Here we will cover a few key concepts and describe the problem that ONNX (Open Neural Network Exchange) solves. This piece will serve as an on-ramp for those new to machine learning and deep learning.
Pre-trained neural networks are everywhere. No matter what problem you're trying to solve, there's a good chance someone else has already trained a neural network to do it. And if someone has already spent the time and money to train a model, why reinvent the wheel?
Collections like the ONNX Model Zoo make it easy to find your next top model. But what if you find a model that's not in the format you'd like to use? What we really need are portable neural networks, and that's exactly what the ONNX format provides. This series will examine how to convert common AI model formats to the ONNX format and then how to use them in your applications. When you're done reading, you'll have a complete understanding of how to use portable neural networks in 2020.
To understand ONNX and the value it provides, you must first understand the problem that it solves. If you have tinkered with machine learning algorithms and created a model that only you consume (from the same project in which the model was created), then you have not come across the machine learning problem. If, on the other hand, you have DevOps responsibilities for your current project and have had to productionize a model created by a team of data scientists, then you have come across the machine learning problem.
The Problem with Models
The machine learning problem is caused by the fact that, today, a model generally has to be served using the same language and framework that were used to create it. For example, if you create a neural network using Python and Keras and are ready to deploy that model to your production environment, the service that runs the model will need to be written in Python, and you will need to have both Python and Keras installed in your production environment.
In a microservices environment, this may not sound like a big deal: create an image with Python and Keras installed within it, deploy the image to a container, and serve your predictions via a RESTful API. However, maybe you don't have any Python experts on your team, or maybe your engineering workflow is optimized for one language and does not have many quality gates (static code analysis, unit tests, end-to-end tests, and security scans) for Python. These two scenarios are very common in larger enterprises that have standardized on either C# or Java. Having engineers who are not familiar with Python or Keras deploy and maintain services that make predictions is not ideal. Also, deploying services without the proper quality checks is likely to introduce failures that can affect the entire application.
An ideal solution would be a common model format that allows models of all types (traditional machine learning models as well as neural networks) to be consumed from any language.
The Problem with Runtimes
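The machine learning problem is not just about the lack of model interoperability between languages. Serving models, whether in batch mode or in real time, can be compute-intensive, so the software that executes a model needs to make efficient use of the hardware it runs on.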
The ideal solution would be to have a runtime that is accessible from every programming language and capable of serving models in a way that is optimized for the underlying hardware.
This article is the first in a series of seven articles in which we will explore the value of ONNX with respect to three popular frameworks and three popular programming languages.
The next three articles will cover the creation of ONNX models from three popular frameworks for building neural networks. Since ONNX does not provide capabilities for training models, I will give a brief overview of each framework so that you can pick the best one for training your models. Each framework requires a different package to convert its models to ONNX, and each article covers how to install it. Finally, each article points out the gotchas that are unique to exporting models from that framework.
The last three articles will demonstrate how to serve ONNX models from Python, C#, and Java. Like the framework articles that precede them, they will explore the issues that are unique to each language.
Each article stands on its own. For instance, if you know that your data science team uses PyTorch and that the trained models will need to be consumed from Java, you can jump straight to the two articles on PyTorch and Java and you will be all set.
What would an article series on ONNX be without a great set of code samples? There is a companion repository on GitHub that contains working copies of all the code shown in this series. I intend to continually improve these demos and to add more demos and samples based on the ONNX Model Zoo, so check back on this repository frequently.