
Object Recognition using Inference and Single Shot MultiBox Detector (SSD)

25 Jan 2018 (CPOL)

This article is for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

This tutorial will walk you through the basics of using the Deep Learning Deployment Toolkit's Inference Engine (included in the Intel® Computer Vision SDK). Here, inference is the process of using a trained neural network to infer meaning from data (e.g., images). In the code sample that follows, a video is fed frame by frame to the Inference Engine, which runs our trained neural network on each frame and outputs a result (the objects detected in the image). Inference can be done using various neural network architectures (AlexNet*, GoogleNet*, etc.). This example uses a Single Shot MultiBox Detector (SSD) on a GoogleNet model. For an example of how SSD is used, see this article on the Intel® Developer Zone.

The Inference Engine requires that the model be converted to IR (Intermediate Representation) files. This tutorial will walk you through the basics of taking an existing model (GoogleNet) and converting it to IR files using the Model Optimizer.

So what's different about running a neural network on the Inference Engine versus an out-of-the-box framework?

  • The Inference Engine optimizes inference, allowing a user to run deep learning deployments significantly faster on Intel® architecture. For more information on performance on Intel® Processor Graphics, see this article.
  • Inference can run on hardware other than the CPU such as the built-in Intel® GPU or Intel® FPGA accelerator card.

What you’ll Learn

  • How to install the OpenCL™ Runtime Package
  • How to install the Intel® Computer Vision SDK
  • How to generate the .bin and .xml (IR files) needed for the Inference Engine from a Caffe model
  • Run the Inference Engine using the generated IR files in a C++ application
  • Compare the performance of CPU vs GPU

Gather your materials

  • 5th generation or newer Intel® Core™ processor. You can find the product name in Linux* by running the ‘lscpu’ command; the ‘Model name:’ line contains the processor information.

Note: The generation number is embedded in the product name, right after the ‘i3’, ‘i5’, or ‘i7’. For example, the Intel® Core™ i5-5200U and Intel® Core™ i5-5675R processors are both 5th generation, and the Intel® Core™ i5-6600K and Intel® Core™ i5-6360U processors are both 6th generation.
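The rule in the note above can be applied to the ‘Model name:’ string with a few lines of C++. This is a minimal sketch rather than part of the tutorial's code: the helper name coreGeneration is hypothetical, and it assumes the four-digit model numbers of the processors this note covers, where the first digit is the generation (5200U is 5th generation). Later generations changed the numbering scheme and are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the Intel Core generation encoded in a CPU
// model name (the text lscpu prints after "Model name:"), or -1 if no
// four-digit i3/i5/i7 model number is found.
int coreGeneration(const std::string& modelName) {
    for (std::size_t pos = 0; pos + 3 < modelName.size(); ++pos) {
        const char tier = modelName[pos + 1];
        const char sep = modelName[pos + 2];
        const bool isTier = modelName[pos] == 'i' &&
                            (tier == '3' || tier == '5' || tier == '7');
        if (!isTier || (sep != '-' && sep != ' ')) {
            continue;
        }
        // Collect the digits of the model number that follow the separator.
        std::string digits;
        for (std::size_t i = pos + 3;
             i < modelName.size() &&
             std::isdigit(static_cast<unsigned char>(modelName[i]));
             ++i) {
            digits += modelName[i];
        }
        // Assumption: a four-digit model number starts with the generation
        // (5200U -> 5). Other lengths use different naming and are skipped.
        if (digits.size() == 4) {
            return digits[0] - '0';
        }
    }
    return -1;
}
```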

  • Ubuntu* 16.04.3 LTS
  • In order to run inference on the integrated GPU:
    • A processor with Intel® Iris® Pro graphics or HD Graphics
    • No discrete graphics card installed (required by the OpenCL™ platform). If you have one, make sure to disable it in BIOS before going through this installation process.
    • No drivers for other GPUs installed, or libraries built with support for other GPUs


Install OpenCL™ Runtime Package and other dependencies

In order to run inference on the GPU, you need to first install the OpenCL™ Runtime package. These commands install the OpenCL™ Runtime package, as well as some package dependencies required by the Intel® CV SDK.

Note: These steps are for Ubuntu 16.04.3 or later. If you have a version older than 16.04.3, you still need to install the package dependencies below; then skip down to the Install Intel® CV SDK section for instructions on installing the OpenCL™ Driver.

Intel® CV SDK Package dependencies:

sudo apt-get update
sudo apt-get install build-essential ffmpeg cmake checkinstall pkg-config yasm libjpeg-dev curl imagemagick gedit mplayer unzip libpng12-dev libcairo2-dev libpango1.0-dev libgtk2.0-dev libgstreamer0.10-dev libavcodec-dev libavformat-dev

OpenCL™ Runtime package:

unzip -d SRB5.0_linux64
cd SRB5.0_linux64
sudo apt-get install xz-utils
mkdir intel-opencl
tar -C intel-opencl -Jxf intel-opencl-r5.0-63503.x86_64.tar.xz
tar -C intel-opencl -Jxf intel-opencl-devel-r5.0-63503.x86_64.tar.xz
tar -C intel-opencl -Jxf intel-opencl-cpu-r5.0-63503.x86_64.tar.xz
sudo cp -R intel-opencl/* /
sudo ldconfig

Install Intel® CV SDK

  1. Go to
  2. Register, then wait for a confirmation email. It can take several hours to arrive, so take a break and come back once you've received it.

If you've already registered for the Intel® CV SDK before, you should get access to download almost immediately.

  3. From the link in the email, download the Ubuntu* package

  4. Unzip the contents (to a folder in your directory of choice)

tar zxvf intel_cv_sdk_ubuntu_r3_2017.1.163.tgz

Note: If you are running a version older than Ubuntu 16.04.3, then to install the OpenCL™ Driver you need to run the script in the downloaded folder before running the Intel® CV SDK installation. This script can take over half an hour to complete. It rebuilds the kernel with the updated driver, and your computer will restart during the process. Make sure to back up your data before running this script. If you prefer not to go through this kernel rebuild, we recommend installing Ubuntu 16.04.3 or later, which only needs a few files installed (instructions above) and does not need a kernel rebuild.

  5. Change to the CV SDK folder:
cd intel_cv_sdk_ubuntu_r3_2017.1.163/

Enter super user mode

sudo su

Then run the installation wizard


and follow the instructions.

Get the code

Clone this repository

Note: Make sure to exit super user mode before cloning this repository.

git clone

Install Caffe

Caffe is required to convert a Caffe model using the Model Optimizer. This script will install a version of Caffe that is compatible with the Model Optimizer to the /opt/intel/ssdcaffe folder.

Change to the computer-vision-inference-tutorials/1-run-model-optimizer directory:

cd computer-vision-inference-tutorials/1-run-model-optimizer

Enter in a terminal:

sudo su

source /opt/intel/computer_vision_sdk_2017.1.163/bin/


This should take somewhere between 10 and 20 minutes depending on your system.

Generate the .bin and .xml (IR files) for the Inference Engine

The Caffe model consists of two files: SSD_GoogleNetV2_Deploy.prototxt and SSD_GoogleNetV2_Deploy.caffemodel. You will convert them to IR files by running the Model Optimizer using the script.

First download the files:

wget -O SSD_GoogleNetV2_caffe.tgz && mkdir SSD_GoogleNetV2_caffe && tar -xvzf SSD_GoogleNetV2_caffe.tgz -C SSD_GoogleNetV2_caffe

Go back into super user mode and run:

sudo su
source /opt/intel/computer_vision_sdk_2017.1.163/bin/
python -w SSD_GoogleNetV2_caffe/SSD_GoogleNetV2.caffemodel -d SSD_GoogleNetV2_caffe/SSD_GoogleNetV2_Deploy.prototxt

Verify the creation of the IR files

Look in the artifacts/VGG_VOC0712_SSD_300x300_deploy folder:

cd artifacts/VGG_VOC0712_SSD_300x300_deploy

Check that VGG_VOC0712_SSD_300x300_deploy.xml and VGG_VOC0712_SSD_300x300_deploy.bin were created in that directory.
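If you prefer to run this check from code, here is a minimal C++17 sketch using std::filesystem. It relies only on what this tutorial shows: the IR is a .xml file (the network) plus a .bin file (the weights) with the same base name in the same folder. The helper names weightsPathFor and irFilesPresent are hypothetical.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Derives the weights file from the network file by swapping the
// extension, e.g. model.xml -> model.bin in the same directory.
fs::path weightsPathFor(const fs::path& xmlPath) {
    fs::path binPath = xmlPath;
    binPath.replace_extension(".bin");
    return binPath;
}

// Returns true only if both IR files exist as regular files.
// A missing file yields false rather than an exception.
bool irFilesPresent(const fs::path& xmlPath) {
    return fs::is_regular_file(xmlPath) &&
           fs::is_regular_file(weightsPathFor(xmlPath));
}
```

With older GCC releases (before 9), std::filesystem may also need -lstdc++fs at link time.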

Make sure to exit super user mode before continuing:

exit

Run the Inference Engine using the IR files in a C++ application

Make sure to exit super user mode before building the application:

exit

First set the paths:

source /opt/intel/computer_vision_sdk_2017.1.163/bin/

Then build:


Note: If you get an error related to "undefined reference to 'google::FlagRegisterer...'", try uninstalling libgflags-dev: sudo apt-get remove libgflags-dev

Before running, download the test video file to a new videos directory:

wget -P videos/

To run:

./IEobjectdetection -i videos/vtest.avi -fr 200 -m artifacts/VGG_VOC0712_SSD_300x300_deploy/VGG_VOC0712_SSD_300x300_deploy.xml -d CPU -l pascal_voc_classes.txt

You should see a video play with people walking across the scene and red bounding boxes around them.

You should also see the output in the console showing the objects found and the confidence level.

Here is what the flags mean for running the application. This can also be found by running:

./IEobjectdetection -help
-h           Print a usage message
-i <path>    Required. Path to input video file
-fr <path>   Number of frames from stream to process
-m <path>    Required. Path to IR .xml file.
-l <path>    Required. Path to labels file.
-d <device>  Infer target device (CPU or GPU)
-t <type>    Infer type (SSD, etc)
-pc          Enables per-layer performance report
-thresh <val> confidence threshold for bounding boxes 0-1
-b <val>     Batch size
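The application appears to read these flags with gflags (see the FLAGS_m variable and the libgflags-dev note). To show what the "Required" markers and value ranges imply, here is a hand-rolled sketch that validates a subset of the flags. It is a stand-in, not the application's parser: the Options struct, the parseOptions name, and the defaults (device CPU, threshold 0.5, frames 0 meaning "not set") are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option set covering a subset of the flags listed above.
struct Options {
    std::string input;           // -i (required)
    std::string model;           // -m (required)
    std::string labels;          // -l (required)
    std::string device = "CPU";  // -d, assumed default
    int frames = 0;              // -fr, 0 = not set (assumption)
    double thresh = 0.5;         // -thresh, assumed default
    bool perfCounts = false;     // -pc
};

// Returns true if the arguments are valid and fills opts.
// Every flag except -pc takes the next argument as its value.
// -h, -t and -b are not handled and are rejected.
bool parseOptions(const std::vector<std::string>& args, Options& opts) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& flag = args[i];
        if (flag == "-pc") {
            opts.perfCounts = true;
            continue;
        }
        if (i + 1 >= args.size()) return false;  // value missing
        const std::string& value = args[++i];
        if (flag == "-i") opts.input = value;
        else if (flag == "-m") opts.model = value;
        else if (flag == "-l") opts.labels = value;
        else if (flag == "-d") opts.device = value;
        else if (flag == "-fr") opts.frames = std::stoi(value);
        else if (flag == "-thresh") opts.thresh = std::stod(value);
        else return false;  // unknown flag
    }
    const bool deviceOk = opts.device == "CPU" || opts.device == "GPU";
    const bool threshOk = opts.thresh >= 0.0 && opts.thresh <= 1.0;
    return !opts.input.empty() && !opts.model.empty() &&
           !opts.labels.empty() && deviceOk && threshOk && opts.frames >= 0;
}
```

In main(), the argument list can be built with std::vector<std::string>(argv + 1, argv + argc). Note that std::stoi and std::stod throw std::invalid_argument on non-numeric values.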

In this sample video, there are approximately 790 frames, so by setting the number of frames to 790 you'll see the whole video.

-fr 790

If you change the threshold of the confidence level to 0.1, you'll see a lot more bounding boxes around the people, but also many false ones.

-thresh 0.1
./IEobjectdetection -i videos/vtest.avi -fr 200 -m SSD_GoogleNet_v2_fp32.xml -d CPU -l pascal_voc_classes.txt -thresh 0.1
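Changing -thresh only changes which detections survive a confidence check before their boxes are drawn. The sketch below shows that filtering step. It assumes the network ends in the standard Caffe SSD DetectionOutput layer, whose output holds 7 values per detection ([image_id, label, confidence, xmin, ymin, xmax, ymax], with coordinates normalized to 0-1 and a negative image_id ending the list). The Detection struct and parseDetections function are hypothetical, not taken from the application.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Detection {
    int label;
    float confidence;
    int xmin, ymin, xmax, ymax;  // pixel coordinates in the frame
};

// Assumed layout: 7 floats per detection,
// [image_id, label, confidence, xmin, ymin, xmax, ymax], normalized boxes.
std::vector<Detection> parseDetections(const std::vector<float>& output,
                                       float threshold,
                                       int frameWidth, int frameHeight) {
    assert(output.size() % 7 == 0);
    std::vector<Detection> result;
    for (std::size_t i = 0; i + 7 <= output.size(); i += 7) {
        const float* d = &output[i];
        if (d[0] < 0) break;             // end of valid detections
        if (d[2] < threshold) continue;  // below -thresh: drop the box
        // Scale normalized coordinates back to the frame size.
        result.push_back({static_cast<int>(d[1]), d[2],
                          static_cast<int>(d[3] * frameWidth),
                          static_cast<int>(d[4] * frameHeight),
                          static_cast<int>(d[5] * frameWidth),
                          static_cast<int>(d[6] * frameHeight)});
    }
    return result;
}
```

Lowering the threshold from 0.5 to 0.1 in a call like this keeps the low-confidence boxes as well, which is where the extra (and often false) bounding boxes come from.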

Compare the performance of CPU vs GPU


You can enable the output of performance data to the console by using the -pc flag.

./IEobjectdetection -i videos/vtest.avi -fr 200 -m SSD_GoogleNet_v2_fp32.xml -d CPU -l pascal_voc_classes.txt -pc

You'll see the total time it took to run.


Since you installed the OpenCL™ drivers to use the GPU, you can try running inference on the GPU and compare the difference.


Make sure to source your environment variables first:

source /opt/intel/computer_vision_sdk_2017.1.163/bin/

The GPU is selected with this flag:

-d GPU
./IEobjectdetection -i videos/vtest.avi -fr 200 -m SSD_GoogleNet_v2_fp32.xml -d GPU -l pascal_voc_classes.txt -pc

The total times on the CPU and GPU, and the difference between them, will vary depending on your system.
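The -pc report comes from the Inference Engine. If you want an end-to-end figure of your own, for example to compare CPU and GPU runs that use the same -fr value, one option is to time the per-frame work with a monotonic clock and convert the total into frames per second. This sketch uses hypothetical helper names and is not necessarily how the application computes its total time.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Times a callable with a monotonic clock; returns elapsed milliseconds.
double timeMs(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Converts the total time spent on a number of frames into frames/second.
double framesPerSecond(int frames, double totalMs) {
    assert(totalMs > 0.0);
    return frames / (totalMs / 1000.0);
}
```

For example, 200 frames in 4000 ms works out to 50 frames per second; running the same measurement once with -d CPU and once with -d GPU gives two directly comparable numbers.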

How it works

How does the Inference Engine work?

The Inference Engine takes a representation of a neural network model and optimizes it to take advantage of advanced Intel® instruction sets in the CPU, and also makes it compatible with the other hardware accelerators (GPU and FPGA). To do this, the model files (e.g., .caffemodel, .prototxt) are given to the Model Optimizer which then processes the files and outputs two new files: a .bin and .xml. These two files are used instead of the original model files when you run your application. In this example, the .bin and .xml files are provided.

In the above diagram, IR stands for Intermediate Representation, which is just a name for the .xml and .bin files that are inputs to the Inference Engine.

When you want to use another model in the Inference Engine, refer to this tutorial on how to get the necessary files using the Model Optimizer.

If you run into issues, don't hesitate to contact us on the forum.

High level steps in the code:

  • Check arguments
  • Video pre-processing
  • Load model to the Inference Engine
  • Run Inference
  • Parse the results
  • Render the frame
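The steps above can be read as a simple per-frame loop. The sketch below shows only that control flow, assuming argument checking and model loading happen once before the loop; the Stages struct and runPipeline function are hypothetical placeholders, with each stage passed in as a callable so no Inference Engine or OpenCV code is needed.

```cpp
#include <cassert>
#include <functional>

// Hypothetical per-frame stages, one callable per step listed above.
struct Stages {
    std::function<bool()> readFrame;     // false when the video ends
    std::function<void()> preprocess;    // resize + packed-to-planar
    std::function<void()> infer;         // run the network
    std::function<void()> parseResults;  // threshold, extract boxes
    std::function<void()> render;        // draw boxes, show the frame
};

// Runs the loop until -fr frames are processed or the video ends,
// whichever comes first. Returns the number of frames processed.
int runPipeline(const Stages& s, int maxFrames) {
    assert(maxFrames >= 0);
    int processed = 0;
    while (processed < maxFrames && s.readFrame()) {
        s.preprocess();
        s.infer();
        s.parseResults();
        s.render();
        ++processed;
    }
    return processed;
}
```

This is also why -fr 790 shows the whole sample video: the loop ends when either limit is reached.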

Video pre-processing

The video pre-processing step takes the current frame and resizes it to the input size expected by the model, using OpenCV's resize function.
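The resize step can be sketched without OpenCV. This is an illustrative stand-in, not the application's code: OpenCV's resize uses bilinear interpolation by default, while the hypothetical resizeNearest below uses nearest-neighbor sampling to stay short. It works on a packed (interleaved) 8-bit image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-neighbor resize of a packed 8-bit image with `channels`
// values per pixel (e.g. 3 for a color frame).
std::vector<std::uint8_t> resizeNearest(const std::vector<std::uint8_t>& src,
                                        int srcW, int srcH, int channels,
                                        int dstW, int dstH) {
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH * channels);
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(dstW) * dstH * channels);
    for (int y = 0; y < dstH; ++y) {
        const int sy = y * srcH / dstH;  // source row for this output row
        for (int x = 0; x < dstW; ++x) {
            const int sx = x * srcW / dstW;  // source column
            for (int c = 0; c < channels; ++c) {
                dst[(static_cast<std::size_t>(y) * dstW + x) * channels + c] =
                    src[(static_cast<std::size_t>(sy) * srcW + sx) * channels + c];
            }
        }
    }
    return dst;
}
```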

It also converts the frame to a planar format (by default it is in packed format). Packed format is arranged like RGBRGBRGB, while planar is arranged like RRRGGGBBB.
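The packed-to-planar conversion is an index shuffle: channel c of pixel p moves from position p × channels + c to c × (width × height) + p. A minimal sketch follows. The function name is hypothetical, it keeps 8-bit values rather than converting to the input blob's element type, and OpenCV actually delivers frames in BGR rather than RGB order, which this reordering leaves unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts packed (interleaved) pixels, e.g. RGBRGBRGB..., into planar
// order RRR...GGG...BBB..., one full plane per channel.
std::vector<std::uint8_t> packedToPlanar(const std::vector<std::uint8_t>& packed,
                                         int width, int height, int channels) {
    const std::size_t planeSize = static_cast<std::size_t>(width) * height;
    assert(packed.size() == planeSize * channels);
    std::vector<std::uint8_t> planar(packed.size());
    for (std::size_t pixel = 0; pixel < planeSize; ++pixel) {
        for (int c = 0; c < channels; ++c) {
            // Channel c of this pixel goes to plane c at the same offset.
            planar[c * planeSize + pixel] = packed[pixel * channels + c];
        }
    }
    return planar;
}
```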

This format is required by the Inference Engine because that is the format used by the Intel® Math Kernel Library (Intel® MKL). If you want to know more see and

Loading a model into the Inference Engine

There are three steps to loading a model into the Inference Engine: load the plugin, read the network, and load the model into the plugin.

Load the plugin

This is done by constructing an InferenceEnginePluginPtr object named _plugin.

Read the network

The network object is initialized

InferenceEngine::CNNNetReader network;

and then a network is read using the ReadNetwork() function


FLAGS_m holds the value of the -m flag, which in this case is SSD_GoogleNet_v2_fp32.xml.

The weights are then added to the network


Note: The .xml file contains the network, and the .bin file contains the weights.

Load the model into the plugin

Refer to the code for how to set up the input and output blobs.

Run inference

Inference is run using:

_plugin->Infer(inputBlobs, outputBlobs, &dsc)

Note: Every frame is resized to the input size expected by the model (as specified in the model's .xml file) before inference is run.

Next Steps

For more information on how to generate IR files from Caffe and TensorFlow models see:

If you run into issues, don't hesitate to ask on our forum.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By
United States
You may know us for our processors. But we do so much more. Intel invents at the boundaries of technology to make amazing experiences possible for business and society, and for every person on Earth.

Harnessing the capability of the cloud, the ubiquity of the Internet of Things, the latest advances in memory and programmable solutions, and the promise of always-on 5G connectivity, Intel is disrupting industries and solving global challenges. Leading on policy, diversity, inclusion, education and sustainability, we create value for our stockholders, customers and society.