6 Tools from Intel to Speed up Deep Neural Networks

Stephan Ofosuhene

Jan 10, 2019

CPOL

This article takes a look at a variety of tools available from Intel: Intel® Movidius™ Neural Compute Stick, Intel® Distribution for Python, Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), Intel® Data Analytics Acceleration Library, Intel® Distribution of OpenVINO™ toolkit, and BigDL.

Deep neural networks (DNNs) are a powerful and popular choice for building machine learning applications. They are particularly useful for computer vision tasks such as image recognition.

By their nature, DNNs tend to consume a great deal of computing resources. Optimizing their performance is thus important for ensuring that they generate results in the time required, and for minimizing the costs of acquiring and maintaining the infrastructure required to run DNNs.

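Fortunately, several tools are available for DNN optimization. This article takes a look at six tools available from Intel: the Intel Movidius Neural Compute Stick, the Intel Distribution for Python, the Intel Math Kernel Library (including Intel MKL-DNN), the Intel Data Analytics Acceleration Library, the Intel Distribution of OpenVINO toolkit, and BigDL.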

These tools range in nature from hardware to software and are designed to fit into existing workflows by using libraries that are popular with the machine learning community, rather than requiring new ones to be created from scratch.

This article provides an overview of how each tool works, and why you’d want to use it.

Intel Movidius Neural Compute Stick

The Intel Movidius Neural Compute Stick is a USB stick with specialized hardware for running neural networks. The device contains a Vision Processing Unit (VPU) built around Streaming Hybrid Architecture Vector Engine (SHAVE) cores, combined with built-in RAM that reduces the bandwidth required because data can be handled locally. The device is aimed at computer vision and AI tasks, and the same Movidius VPU technology has been used successfully in the Ryze Tello drone.

The typical workflow is to train the model elsewhere, for example in the cloud, and then transfer the trained model onto the device to make predictions. The model can then be tuned using the SDK to adjust its performance. Because predictions and adjustments run locally on the device, they can be done offline, which reduces bandwidth requirements.

The Intel Movidius Neural Compute Stick is ideal for embedded applications due to its low power consumption. Additionally, its ability to work offline makes it reliable in situations where bandwidth is limited. Because it also works with existing libraries such as TensorFlow and Caffe, it fits easily into existing development workflows.

The prerequisite for using the Intel Movidius Neural Compute Stick is a trained neural network model in either TensorFlow or Caffe. The model can then be loaded onto the stick using the Movidius SDK. You can find instructions on how to access the SDK in this video.
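As a rough illustration of the inference step, the sketch below assumes version 1.x of the Movidius SDK Python API (the mvnc.mvncapi module) and a graph file that has already been compiled from a trained model with the SDK's tools. The function names run_inference and top_k are hypothetical helpers, not part of the SDK, and later SDK versions renamed several of the calls used here.

```python
def top_k(scores, k=5):
    """Return the indices of the k largest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def run_inference(graph_path, input_tensor):
    """Run one inference on the first attached stick (assumes NCSDK 1.x).

    input_tensor is expected to be a NumPy float16 array that is already
    shaped and preprocessed the way the network expects.
    """
    # Imported lazily so top_k can be used without the SDK installed.
    from mvnc import mvncapi as mvnc

    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("No Neural Compute Stick found")

    device = mvnc.Device(devices[0])
    device.OpenDevice()
    try:
        # The graph file is the compiled form of the trained model.
        with open(graph_path, "rb") as f:
            graph = device.AllocateGraph(f.read())
        try:
            graph.LoadTensor(input_tensor, "user object")
            output, _user_object = graph.GetResult()
        finally:
            graph.DeallocateGraph()
    finally:
        device.CloseDevice()
    return output
```

With a stick attached, passing the result of run_inference to top_k would give the indices of the most likely classes. The graph path and the preprocessed input are yours to supply.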

Intel® Distribution for Python

Intel® Distribution for Python is an Anaconda-based Python distribution that contains popular libraries used for mathematics and machine learning (ML), such as NumPy, SciPy, and scikit-learn. What distinguishes this distribution from others is that the mathematical libraries (such as NumPy and scikit-learn) are optimized for Intel architectures.

These optimizations are done with Intel’s performance libraries, such as the Intel® Math Kernel Library and Intel® Data Analytics Acceleration Library. These libraries give your code access to the latest vectorization instructions and to multithreading, which helps speed up your Python code.

Using this distribution of Python does not require any change to your existing code because the interfaces of your favorite math and ML libraries remain unchanged. This means you can simply change your interpreter to Intel’s distribution and enjoy the benefits of their optimizations.
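One way to see the effect for yourself is to time the same unmodified script under both interpreters. The sketch below is only an illustration: the helper names best_time and matmul_workload and the matrix size are arbitrary choices, not part of any Intel tooling, and the numbers you get will depend on your hardware.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time, in seconds, over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def matmul_workload(n=1000):
    """Build a zero-argument callable that multiplies two n x n matrices."""
    # Imported lazily so best_time can be reused without NumPy.
    import numpy as np

    rng = np.random.RandomState(0)
    a = rng.rand(n, n)
    b = rng.rand(n, n)
    return lambda: a.dot(b)


if __name__ == "__main__":
    try:
        seconds = best_time(matmul_workload())
        print("1000 x 1000 matrix multiply: %.3f s" % seconds)
    except ImportError:
        print("NumPy is not installed in this interpreter")
```

Running this file once with your current interpreter and once with Intel’s gives a like-for-like comparison, since the code itself does not change.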

The distribution is available as a free download. Alternatively, you can install the individual Intel-optimized libraries using pip. The table below lists the pip package names for Intel’s builds of the mathematical and ML libraries.

Package Name	pip command	Platform Availability
numpy	pip install intel-numpy	Linux, Win, macOS (10.12)
scipy	pip install intel-scipy
scikit-learn	pip install intel-scikit-learn
pydaal	pip install pydaal
tbb4py	pip install tbb4py

Source: https://software.intel.com/en-us/articles/installing-the-intel-distribution-for-python-and-intel-performance-libraries-with-pip-and

You can find more information about installing Intel’s Python packages on their website.
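To check which of the packages from the table are present in the current environment, a small script along these lines may help. It is a sketch that assumes Python 3.8 or later (for importlib.metadata); the function name installed_versions is a hypothetical helper, and the package names are simply those listed in the table above.

```python
from importlib import metadata

# pip package names taken from the table above.
PIP_NAMES = (
    "intel-numpy",
    "intel-scipy",
    "intel-scikit-learn",
    "pydaal",
    "tbb4py",
)


def installed_versions(names=PIP_NAMES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print("%-20s %s" % (name, version or "not installed"))
```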

Intel® Math Kernel Library

As mentioned above, Intel provides the Intel® Math Kernel Library, which is optimized for Intel architectures. It is aimed at speeding up mathematics and data analytics, specifically on Intel processors. Intel has also introduced the Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN) for increasing the performance of DNNs. This library is implemented in C++ and can be integrated directly into your application or used from high-level programming languages such as Python.

Just like Intel’s Python implementation, Intel’s MKL-DNN Library provides a way to speed up DNNs on Intel processors. This is ideal for reducing the time required for DNN experiments.

The library is available as a free download.

Intel® Data Analytics Acceleration Library

Intel’s math libraries also include the Intel Data Analytics Acceleration Library, which is aimed at speeding up data analytics. Just like Intel MKL-DNN, it is implemented in C++, is optimized for Intel processors, and can be called from high-level programming languages. This makes it easy to integrate into your applications, or to simply add it to your data analytics toolbox.

This is a useful library for quickly getting an overview of data that is used for DNNs, and can easily be added to your workflow by using it from your favorite high-level programming language. Additionally, note that this library is integrated in Intel’s Python implementation; therefore, you do not need to use it directly if you are using Python. This library can be downloaded for free from Intel’s website.

Intel® Distribution of OpenVINO™ toolkit

The Intel Open Visual Inference and Neural network Optimization (OpenVINO) toolkit is designed for computer vision. It can be used to emulate human vision using convolutional neural networks (CNNs), which are a type of DNN.

This tool makes it possible to execute your CNNs on a heterogeneous collection of devices, including Intel CPUs, FPGAs, and the Intel Movidius Neural Compute Stick. This means the workload of a CNN can be distributed across several devices of different types, which speeds up inference and makes real-time inference possible.
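In the toolkit, heterogeneous execution is requested through a device string of the form HETERO:FPGA,CPU, where the devices are listed in priority order. The sketch below builds such a string and passes it to the toolkit. It assumes the openvino.runtime Python API from OpenVINO 2022.1 or later, which postdates this article; the helper names hetero_device and compile_for are hypothetical, and the set of device names your installation accepts depends on the release and the hardware present.

```python
def hetero_device(priorities):
    """Build a device string such as 'HETERO:FPGA,CPU'.

    `priorities` lists device names in order of preference. A single
    device is returned unchanged, since no fallback is needed.
    """
    if not priorities:
        raise ValueError("at least one device name is required")
    if len(priorities) == 1:
        return priorities[0]
    return "HETERO:" + ",".join(priorities)


def compile_for(model_xml, priorities):
    """Load a model file and compile it for the given devices."""
    # Imported lazily so hetero_device works without OpenVINO installed.
    # Assumes the openvino.runtime API (OpenVINO 2022.1 or later).
    from openvino.runtime import Core

    core = Core()
    model = core.read_model(model_xml)
    return core.compile_model(model, hetero_device(priorities))
```

Listing the accelerator first and the CPU last means layers the accelerator cannot run fall back to the CPU.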

The key component of the toolkit, the Intel Deep Learning Deployment Toolkit, is specifically designed to ease the process of deploying your trained model across a variety of Intel platforms and accelerators. The toolkit also provides optimized calls to OpenCV and OpenVX. This allows you to interface easily with existing libraries that may already be in your workflow. You can access this tool on Intel’s website.

BigDL: Distributed Deep Learning on Apache Spark

Intel also provides a tool for speeding up deep learning using distributed computing on Apache Spark with its BigDL library. This library allows you to write your DNN applications as standard Spark programs, which can run on existing Spark and Hadoop clusters. The library, which is modeled after Torch, provides rich deep learning support, including numeric computing via Tensor and high-level neural networks. Pre-trained neural networks written in Caffe or Torch can also be loaded onto Spark via the BigDL library.

The BigDL library also uses Intel MKL and multithreaded programming to speed up DNNs. According to the project, this makes BigDL orders of magnitude faster than out-of-the-box Torch or Caffe when running on Intel processors. The library also allows you to efficiently scale out to perform data analytics on Big Data. You can access the BigDL library from this GitHub repository.

Conclusion

As deep neural networks become more and more important within production-level machine-learning applications, finding ways to optimize DNN performance will be crucial. Fortunately, a number of tools are available for making DNNs more efficient, even if you’ve already completed your DNN architecture and code. Changes as simple as using an optimized Python implementation or specialized hardware can significantly improve DNN performance and make DNNs practical to deploy on a large scale.