Posted 28 Feb 2019
Licenced CPOL

Using the Model Optimizer to Convert Caffe Models

How to convert a trained Caffe model using the Model Optimizer with both framework-agnostic and Caffe-specific command-line options

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.


The Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environment, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.

The Model Optimizer process assumes you have a network model trained using a supported framework. The scheme below illustrates the typical workflow for deploying a trained deep learning model:

A summary of the steps for optimizing and deploying a model that was trained with Caffe*:

  1. Configure the Model Optimizer for Caffe (Caffe was used to train your model).
  2. Convert a Caffe* model to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weights, and bias values.
  3. Test the model in the Intermediate Representation format using the Inference Engine in the target environment via the provided Inference Engine validation application or sample applications.
  4. Integrate the Inference Engine in your application to deploy the model in the target environment.

Model Optimizer Workflow

The Model Optimizer process assumes you have a network model that was trained with the Caffe* framework. The workflow is:

  1. Configure Model Optimizer for the Caffe framework by running the configuration bash script for Linux* OS or batch file for Windows* OS from the <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites directory:
    • For Linux* OS:
    • For Windows* OS:

    For details on configuring the Model Optimizer, see Configure the Model Optimizer.

  2. Provide as input a trained model that contains a specific topology, described in the .prototxt file, and the adjusted weights and biases, stored in the .caffemodel file.
  3. Convert the Caffe* model to an optimized Intermediate Representation.

The Model Optimizer produces as output an Intermediate Representation (IR) of the network, which can be read, loaded, and inferred with the Inference Engine. The Inference Engine API offers a unified API across a number of supported Intel® platforms. The Intermediate Representation is a pair of files that describe the whole model:

  • .xml: Describes the network topology
  • .bin: Contains the weights and biases binary data
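As a minimal sketch of the output described above, the following Python helper derives the expected pair of IR file paths and checks that both are present. The function names ir_paths and ir_is_complete are hypothetical, and the sketch assumes that the .xml and .bin files share the model name as their base name inside the output directory.

```python
from pathlib import Path


def ir_paths(output_dir, model_name):
    """Return the expected (.xml, .bin) paths of an Intermediate Representation."""
    base = Path(output_dir) / model_name
    # Append the suffix as a string so dots inside the model name survive.
    return Path(str(base) + ".xml"), Path(str(base) + ".bin")


def ir_is_complete(output_dir, model_name):
    """True only if both the topology file and the weights file exist."""
    xml_path, bin_path = ir_paths(output_dir, model_name)
    return xml_path.is_file() and bin_path.is_file()
```

A check like this can be useful before handing the IR to an application, because the two files describe the model only as a pair.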

Supported Topologies

  • Classification models:
    • AlexNet
    • VGG-16, VGG-19
    • SqueezeNet v1.0, SqueezeNet v1.1
    • ResNet-50, ResNet-101, ResNet-152
    • Inception v1, Inception v2, Inception v3, Inception v4
    • CaffeNet
    • MobileNet
    • Squeeze-and-Excitation Networks: SE-BN-Inception, SE-Resnet-101, SE-ResNet-152, SE-ResNet-50, SE-ResNeXt-101, SE-ResNeXt-50
    • ShuffleNet v2
  • Object detection models:
    • SSD300-VGG16, SSD500-VGG16
    • Faster-RCNN
  • Face detection models:
    • VGG Face
  • Semantic segmentation models:
    • FCN8

NOTE: Most Caffe* models require mean and scale values to be specified when they are converted with the Model Optimizer. The exact values should be determined separately for each model. For example, for Caffe* models trained on ImageNet, the mean values usually are 103.939, 116.779, and 123.68 for the blue, green, and red channels respectively. The scale value is usually 127.5. Refer to Use Framework-Agnostic Conversion Parameters for information on how to specify mean and scale values.

Convert a Caffe* Model

To convert a Caffe model:

  1. Go to the <INSTALL_DIR>/deployment_tools/model_optimizer directory.
  2. Use the script to simply convert a model with the path to the input model .caffemodel file:
    python3 --input_model <INPUT_MODEL>.caffemodel

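Two groups of parameters are available to convert your model: framework-agnostic parameters and Caffe*-specific parameters. Each group is described in its own section.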

Use Framework-Agnostic Conversion Parameters

To adjust the conversion process, you can use the general (framework-agnostic) parameters:

	optional arguments:
  -h, --help            show this help message and exit
  --framework {tf,caffe,mxnet,kaldi,onnx}
                        Name of the framework used to train the input model.

Framework-agnostic parameters:
  --input_model INPUT_MODEL
                        Tensorflow*: a file with a pre-trained model (binary
                        or text .pb file after freezing). Caffe*: a model
                        proto file with model weights
  --model_name MODEL_NAME, -n MODEL_NAME
                        Model_name parameter passed to the final create_ir
                        transform. This parameter is used to name a network in
                        a generated IR and output .xml/.bin files.
  --output_dir OUTPUT_DIR, -o OUTPUT_DIR
                        Directory that stores the generated IR. By default, it
                        is the directory from where the Model Optimizer is
                        launched.
  --input_shape INPUT_SHAPE
                        Input shape(s) that should be fed to an input node(s)
                        of the model. Shape is defined as a comma-separated
                        list of integer numbers enclosed in parentheses or
                        square brackets, for example [1,3,227,227] or
                        (1,227,227,3), where the order of dimensions depends
                        on the framework input layout of the model. For
                        example, [N,C,H,W] is used for Caffe* models and
                        [N,H,W,C] for TensorFlow* models. Model Optimizer
                        performs necessary transformations to convert the
                        shape to the layout required by Inference Engine
                        (N,C,H,W). The shape should not contain undefined
                        dimensions (? or -1) and should fit the dimensions
                        defined in the input operation of the graph. If there
                        are multiple inputs in the model, --input_shape should
                        contain definition of shape for each input separated
                        by a comma, for example: [1,3,227,227],[2,4] for a
                        model with two inputs with 4D and 2D shapes.
  --scale SCALE, -s SCALE
                        All input values coming from original network inputs
                        will be divided by this value. When a list of inputs
                        is overridden by the --input parameter, this scale is
                        not applied for any input that does not match with the
                        original input of the model.
  --reverse_input_channels
                        Switch the input channels order from RGB to BGR (or
                        vice versa). Applied to original inputs of the model
                        if and only if a number of channels equals 3. Applied
                        after application of --mean_values and --scale_values
                        options, so numbers in --mean_values and
                        --scale_values go in the order of channels used in the
                        original model.
  --log_level {CRITICAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
                        Logger level
  --input INPUT         The name of the input operation of the given model.
                        Usually this is a name of the input placeholder of the
                        model.
  --output OUTPUT       The name of the output operation of the model. For
                        TensorFlow*, do not add :0 to this name.
  --mean_values MEAN_VALUES, -ms MEAN_VALUES
                        Mean values to be used for the input image per
                        channel. Values to be provided in the (R,G,B) or
                        [R,G,B] format. Can be defined for desired input of
                        the model, for example: "--mean_values
                        data[255,255,255],info[255,255,255]". The exact
                        meaning and order of channels depend on how the
                        original model was trained.
  --scale_values SCALE_VALUES
                        Scale values to be used for the input image per
                        channel. Values are provided in the (R,G,B) or [R,G,B]
                        format. Can be defined for desired input of the model,
                        for example: "--scale_values
                        data[255,255,255],info[255,255,255]". The exact
                        meaning and order of channels depend on how the
                        original model was trained.
  --data_type {FP16,FP32,half,float}
                        Data type for all intermediate tensors and weights. If
                        original model is in FP32 and --data_type=FP16 is
                        specified, all model weights and biases are quantized
                        to FP16.
  --disable_fusing      Turn off fusing of linear operations to Convolution
                        Turn off resnet optimization
  --finegrain_fusing FINEGRAIN_FUSING
                        Regex for layers/operations that won't be fused.
                        Example: --finegrain_fusing Convolution1,.*Scale.*
  --disable_gfusing     Turn off fusing of grouped convolutions
  --move_to_preprocess  Move mean values to IR preprocess section
  --extensions EXTENSIONS
                        Directory or a comma separated list of directories
                        with extensions. To disable all extensions including
                        those that are placed at the default location, pass an
                        empty string.
  --batch BATCH, -b BATCH
                        Input batch size
  --version             Version of Model Optimizer
  --silent              Prevent any output messages except those that
                        correspond to log level equals ERROR, that can be set
                        with the following option: --log_level. By default,
                        log level is already ERROR.
  --freeze_placeholder_with_value FREEZE_PLACEHOLDER_WITH_VALUE
                        Replaces input layer with constant node with provided
                        value, e.g.: "node_name->True"
                        Force to generate legacy/deprecated IR V2 to work with
                        previous versions of the Inference Engine. The
                        resulting IR may or may not be correctly loaded by
                        Inference Engine API (including the most recent and
                        old versions of Inference Engine) and provided as a
                        partially-validated backup option for specific
                        deployment scenarios. Use it at your own discretion.
                        By default, without this option, the Model Optimizer
                        generates IR V3.

NOTE: The Model Optimizer does not reverse input channels from RGB to BGR by default, as it did in the 2017 R3 Beta release. The command-line parameter --reverse_input_channels must be specified manually to perform the reversal. For details, refer to the When to Reverse Input Channels section.

The sections below provide details on using particular parameters and examples of CLI commands.

When to Specify Mean and Scale Values

Neural network models are usually trained with normalized input data. This means that the input data values are converted to be in a specific range, for example, [0, 1] or [-1, 1]. Sometimes the mean values (mean images) are subtracted from the input data values as part of the pre-processing. Input data pre-processing is implemented in one of two ways:

  • The input pre-processing operations are a part of a topology. In this case, the application that uses the framework to infer the topology does not pre-process the input.
  • The input pre-processing operations are not a part of a topology and the pre-processing is performed within the application that feeds the model with input data.

In the first case, the Model Optimizer generates the IR with required pre-processing layers and Inference Engine samples may be used to infer the model.

In the second case, information about mean/scale values should be provided to the Model Optimizer so that it can embed them in the generated IR. The Model Optimizer provides several command-line parameters to specify them: --scale, --scale_values, --mean_values, --mean_file.

If both mean and scale values are specified, the mean is subtracted first and then scale is applied. Input values are divided by the scale value(s).
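The order of operations can be illustrated with a short Python sketch. The function name preprocess and the sample values are hypothetical and chosen only to show the arithmetic: the mean is subtracted first, and the result is then divided by the scale.

```python
def preprocess(pixel, mean_values, scale_values):
    """Subtract the per-channel mean first, then divide by the per-channel scale."""
    if not (len(pixel) == len(mean_values) == len(scale_values)):
        raise ValueError("pixel, mean_values and scale_values must have equal length")
    return [(p - m) / s for p, m, s in zip(pixel, mean_values, scale_values)]


# Illustrative values only: a mean and a scale of 127.5 map [0, 255] to [-1, 1].
print(preprocess([0.0, 127.5, 255.0], [127.5] * 3, [127.5] * 3))  # prints [-1.0, 0.0, 1.0]
```

Swapping the two steps would give a different result, which is why the order matters when the values are copied from a training script.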

There is no universal recipe for determining the mean/scale values for a particular model. The steps below can help to determine them:

  1. Read the model documentation. The documentation usually describes the mean/scale values if pre-processing is required.
  2. Open the example script/application executing the model and track how the input data is read and passed to the framework.
  3. Open the model in a visualization tool and check for layers performing subtraction or multiplication (like Sub, Mul, ScaleShift, Eltwise, etc.) of the input data. If such layers exist, the pre-processing is most likely part of the model.

When to Specify Input Shapes

There are situations when the input data shape for the model is not fixed, as with fully convolutional neural networks. In this case, for example, TensorFlow* models contain -1 values in the shape attribute of the Placeholder operation. The Inference Engine does not support input layers with undefined size, so if the input shapes are not defined in the model, the Model Optimizer fails to convert the model.

The solution is to provide the input shape(s) for all inputs of the model using the --input_shape command-line parameter. Alternatively, if the model contains just one input and only its batch size is undefined, provide the batch size using the -b command-line parameter. In the latter case, the Placeholder shape for a TensorFlow* model looks like this: [-1, 224, 224, 3].
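The rule above can be sketched in Python as follows. This is not the Model Optimizer's own code. The helper names resolve_shape and format_input_shape are hypothetical, and the sketch assumes that undefined dimensions are written as -1 and that a batch size may only fill an undefined first dimension.

```python
def resolve_shape(shape, batch=None):
    """Return a fully defined shape or raise ValueError.

    Undefined dimensions are written as -1. A batch size may only fill
    an undefined first dimension, mirroring the -b case described above.
    """
    dims = list(shape)
    if batch is not None and dims and dims[0] == -1:
        dims[0] = batch
    if any(d <= 0 for d in dims):
        raise ValueError(f"shape {dims} still has undefined dimensions")
    return dims


def format_input_shape(dims):
    """Format dimensions as a bracketed, comma-separated list, e.g. [1,3,227,227]."""
    return "[" + ",".join(str(d) for d in dims) + "]"


print(format_input_shape(resolve_shape([-1, 224, 224, 3], batch=8)))  # prints [8,224,224,3]
```

A shape with more than one undefined dimension still raises an error here, which matches the requirement to pass complete shapes through --input_shape in that case.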

When to Reverse Input Channels

Inference Engine samples load input images in BGR channel order, but the model may have been trained on images loaded in RGB channel order. In this case, inference results using the Inference Engine samples will be incorrect. The solution is to provide the --reverse_input_channels command-line parameter. The Model Optimizer then modifies the weights of the first convolution, or of another channel-dependent operation, so that the output of these operations is the same as if the image had been passed in RGB channel order.
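The principle behind this weight modification can be shown with a toy Python example. It is only a sketch of the idea for a 1x1 convolution with made-up weights, not the Model Optimizer's implementation. The function names are hypothetical.

```python
def conv1x1(pixel, weights):
    """Apply a 1x1 convolution: one dot product per output channel."""
    return [sum(w * p for w, p in zip(row, pixel)) for row in weights]


def reverse_weight_channels(weights):
    """Reverse each filter along its input-channel axis."""
    return [list(reversed(row)) for row in weights]


rgb_pixel = [10.0, 20.0, 30.0]                  # order the model was trained on
bgr_pixel = list(reversed(rgb_pixel))           # order the application supplies
weights = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]   # two output channels, toy values

expected = conv1x1(rgb_pixel, weights)
actual = conv1x1(bgr_pixel, reverse_weight_channels(weights))
assert actual == expected  # same output without touching the application
```

Because the reordering is folded into the weights once at conversion time, the application can keep feeding BGR images with no extra work per frame.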

Command-Line Interface (CLI) Examples with Framework-Agnostic Parameters

  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with debug log level:
    python3 --input_model bvlc_alexnet.caffemodel --log_level DEBUG
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with the output IR called result.* in the specified output_dir:
    python3 --input_model bvlc_alexnet.caffemodel --model_name result --output_dir /../../models/
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with one input with scale values:
    python3 --input_model bvlc_alexnet.caffemodel --scale_values [59,59,59]
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with multiple inputs with scale values:
    python3 --input_model bvlc_alexnet.caffemodel --input data,rois --scale_values [59,59,59],[5,5,5]
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with multiple inputs with scale and mean values specified for the particular nodes:
    python3 --input_model bvlc_alexnet.caffemodel --input data,rois --mean_values data[59,59,59] --scale_values rois[5,5,5]
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with specified input layer, overridden input shape, scale equal to 5, batch equal to 8, and specified name of an output operation:
    python3 --input_model bvlc_alexnet.caffemodel --input data --input_shape [1,3,224,224] --output pool5 -s 5 -b 8
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with disabled fusing for linear operations to Convolution and grouped convolutions:
    python3 --input_model bvlc_alexnet.caffemodel --disable_fusing --disable_gfusing
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with reversed input channels order between RGB and BGR, specified mean values to be used for the input image per channel, and specified data type for input tensor values:
    python3 --input_model bvlc_alexnet.caffemodel --reverse_input_channels --mean_values [255,255,255] --data_type FP16
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with extensions listed in the specified directories and a specified mean image binaryproto file. For more information about extensions, refer to the Extending the Model Optimizer with New Primitives page.
    python3 --input_model bvlc_alexnet.caffemodel --extensions /home/,/some/other/path/ --mean_file /path/to/binaryproto
  • Launch the Model Optimizer for the Caffe bvlc_alexnet model with a placeholder freezing tensor of values. It replaces the placeholder with a constant layer that contains the passed values.

    Here the tensor is represented in square brackets, with values separated by whitespace. If a data type is set in the model, this tensor is reshaped to the placeholder shape and cast to the placeholder data type. Otherwise, it is cast to the data type passed to the --data_type parameter (FP32 by default).

    python3 --input_model FaceNet.pb --freeze_placeholder_with_value "<placeholder_layer_name>->[0.1 1.2 2.3]"
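When such commands are generated from a script, the per-input value syntax is easy to get wrong. The following Python sketch assembles an argument list from dictionaries. The helper names are hypothetical, and the script name passed in the usage line (mo.py) is an assumption, since the script name does not appear in the commands above. Only flags shown in this article are used.

```python
def per_input_values(values_by_input):
    """Format {'data': [59, 59, 59]} as 'data[59,59,59]', comma-joined per input."""
    return ",".join(
        name + "[" + ",".join(str(v) for v in values) + "]"
        for name, values in values_by_input.items()
    )


def build_command(script, input_model, mean_values=None, scale_values=None):
    """Assemble an argument list suitable for subprocess.run()."""
    args = ["python3", script, "--input_model", input_model]
    # Collect input names in first-seen order without duplicates.
    inputs = list(dict.fromkeys([*(mean_values or {}), *(scale_values or {})]))
    if inputs:
        args += ["--input", ",".join(inputs)]
    if mean_values:
        args += ["--mean_values", per_input_values(mean_values)]
    if scale_values:
        args += ["--scale_values", per_input_values(scale_values)]
    return args


# "mo.py" is an assumed script name, used for illustration only.
command = build_command(
    "mo.py",
    "bvlc_alexnet.caffemodel",
    mean_values={"data": [59, 59, 59]},
    scale_values={"rois": [5, 5, 5]},
)
print(" ".join(command))
```

Passing the arguments as a list rather than as one shell string avoids quoting problems with the square brackets and parentheses used in the value syntax.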

Use Caffe*-Specific Conversion Parameters

The following list provides the Caffe*-specific parameters:

Caffe-specific parameters:
  --input_proto INPUT_PROTO, -d INPUT_PROTO
                        Deploy-ready prototxt file that contains a topology
                        structure and layer attributes
  -k K                  Path to CustomLayersMapping.xml to register custom
                        layers
  --mean_file MEAN_FILE, -mf MEAN_FILE
                        Mean image to be used for the input. Should be a
                        binaryproto file
  --mean_file_offsets MEAN_FILE_OFFSETS, -mo MEAN_FILE_OFFSETS
                        Mean image offsets to be used for the input
                        binaryproto file. When the mean image is bigger than
                        the expected input, it is cropped. By default, centers
                        of the input image and the mean image are the same and
                        the mean image is cropped by dimensions of the input
                        image. The format to pass this option is the
                        following: "-mo (x,y)". In this case, the mean file is
                        cropped by dimensions of the input image with offset
                        (x,y) from the upper left corner of the mean image
  --disable_omitting_optional
                        Disable omitting optional attributes to be used for
                        custom layers. Use this option if you want to transfer
                        all attributes of a custom layer to IR. Default
                        behavior is to transfer the attributes with default
                        values and the attributes defined by the user to IR.
  --enable_flattening_nested_params
                        Enable flattening optional params to be used for
                        custom layers. Use this option if you want to transfer
                        attributes of a custom layer to IR with flattened
                        nested parameters. Default behavior is to transfer the
                        attributes without flattening nested parameters.
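The cropping behavior described for --mean_file_offsets can be sketched as a small Python function. This is an illustration under stated assumptions, not the Model Optimizer's code: sizes are given as (width, height), x runs along the width, and an odd size difference is rounded down when centering. The function name mean_crop_window is hypothetical.

```python
def mean_crop_window(mean_size, input_size, offset=None):
    """Return (x0, y0, x1, y1) of the mean-image region used for the input.

    Sizes are (width, height). Without an offset the window is centered;
    with an offset (x, y) it starts that far from the upper left corner.
    """
    mean_w, mean_h = mean_size
    in_w, in_h = input_size
    if in_w > mean_w or in_h > mean_h:
        raise ValueError("input is larger than the mean image")
    if offset is None:
        x0, y0 = (mean_w - in_w) // 2, (mean_h - in_h) // 2
    else:
        x0, y0 = offset
    if x0 < 0 or y0 < 0 or x0 + in_w > mean_w or y0 + in_h > mean_h:
        raise ValueError("crop window falls outside the mean image")
    return x0, y0, x0 + in_w, y0 + in_h


print(mean_crop_window((256, 256), (227, 227)))          # centered crop
print(mean_crop_window((256, 256), (227, 227), (0, 0)))  # crop from the corner
```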

Command-Line Interface (CLI) Examples with Caffe*-Specific Parameters

  • Launch the Model Optimizer for the bvlc_alexnet.caffemodel with a specified prototxt file. This is needed when the names of the Caffe model and the .prototxt file are different or the files are placed in different directories. Otherwise, it is enough to provide only the path to the input .caffemodel file.
    python3 --input_model bvlc_alexnet.caffemodel --input_proto bvlc_alexnet.prototxt
  • Launch the Model Optimizer for the bvlc_alexnet.caffemodel with a specified CustomLayersMapping file. This is the legacy method of quickly enabling model conversion if your model has custom layers. This requires system Caffe* on the computer. To read more about this, see Legacy Mode for Caffe* Custom Layers.

    Optional parameters without default values that are not specified by the user in the .prototxt file are removed from the Intermediate Representation, and nested parameters are flattened:

    python3 --input_model bvlc_alexnet.caffemodel -k CustomLayersMapping.xml --disable_omitting_optional --enable_flattening_nested_params

    This example shows a multi-input model with input layers: data, rois:

    layer {
      name: "data"
      type: "Input"
      top: "data"
      input_param {
        shape { dim: 1 dim: 3 dim: 224 dim: 224 }
      }
    }
    layer {
      name: "rois"
      type: "Input"
      top: "rois"
      input_param {
        shape { dim: 1 dim: 5 dim: 1 dim: 1 }
      }
    }
  • Launch the Model Optimizer for a multi-input model with two inputs, providing a new shape for each input in the order the inputs are passed to the Model Optimizer. In particular, for data, set the shape to 1,3,227,227. For rois, set the shape to 1,6,1,1:
    python3 --input_model /path-to/your-model.caffemodel --input data,rois --input_shape (1,3,227,227),[1,6,1,1]
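The first example in the list above states when --input_proto is required. A hypothetical Python helper, needs_input_proto, can express that rule. It assumes that the .prototxt file is otherwise looked up next to the .caffemodel file under the same base name.

```python
from pathlib import PurePosixPath


def needs_input_proto(caffemodel_path, prototxt_path):
    """True when the .prototxt differs in name or directory from the .caffemodel."""
    model = PurePosixPath(caffemodel_path)
    proto = PurePosixPath(prototxt_path)
    return model.with_suffix(".prototxt") != proto


print(needs_input_proto("bvlc_alexnet.caffemodel", "bvlc_alexnet.prototxt"))  # prints False
print(needs_input_proto("bvlc_alexnet.caffemodel", "deploy.prototxt"))        # prints True
```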

Custom Layer Definition

Internally, when you run the Model Optimizer, it loads the model, goes through the topology, and tries to find each layer type in a list of known layers. Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in this list of known layers, the Model Optimizer classifies them as custom. For more information about custom layers, refer to Caffe Models with Custom Layers.
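The classification step described above amounts to a membership test. The following Python sketch shows the idea with a small, illustrative subset of known layer types. The names KNOWN_LAYERS and find_custom_layers, and the layer type MyCustomLayer, are hypothetical.

```python
# A small subset of known layer types, for illustration only.
KNOWN_LAYERS = {"Input", "Convolution", "Pooling", "ReLU", "InnerProduct", "Softmax"}


def find_custom_layers(layer_types, known_layers=KNOWN_LAYERS):
    """Return the layer types absent from the known list, in first-seen order."""
    custom = []
    for layer_type in layer_types:
        if layer_type not in known_layers and layer_type not in custom:
            custom.append(layer_type)
    return custom


topology = ["Input", "Convolution", "ReLU", "MyCustomLayer", "Pooling",
            "MyCustomLayer", "Softmax"]
print(find_custom_layers(topology))  # prints ['MyCustomLayer']
```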

Supported Caffe* Layers

Layer Number   Layer Name in Caffe   Layer Name in the Intermediate Representation
1              Input                 Input
2              GlobalInput           Input
3              InnerProduct          FullyConnected
4              Dropout               Ignored. Does not appear in the IR
5              Convolution           Convolution
6              Deconvolution         Deconvolution
7              Pooling               Pooling
8              BatchNorm             BatchNormalization
9              LRN                   Norm
10             Power                 Power
11             ReLU                  ReLU
12             Scale                 ScaleShift
13             Concat                Concat
14             Eltwise               Eltwise
15             Flatten               Flatten
16             Reshape               Reshape
17             Slice                 Slice
18             Softmax               SoftMax
19             Permute               Permute
20             ROIPooling            ROIPooling
21             Tile                  Tile
22             ShuffleChannel        Reshape + Split + Permute + Concat
23             Axpy                  ScaleShift + Eltwise
24             BN                    ScaleShift

See the Model Optimizer Developer Guide for information about:

  • The Model Optimizer internal procedure for working with custom layers
  • How to convert a model that has custom layers
  • Custom layer implementation details

Frequently Asked Questions (FAQ)

The Model Optimizer provides explanatory messages if it is unable to run to completion due to issues like typographical errors, incorrectly used options, or other issues. The message describes the potential cause of the problem and gives a link to the Model Optimizer FAQ. The FAQ has instructions on how to resolve most issues. The FAQ also includes links to relevant sections in the Model Optimizer Developer Guide to help you understand what went wrong.


In this document, you learned:

  • Basic information about how the Model Optimizer works with Caffe* models
  • Which Caffe* models are supported
  • How to convert a trained Caffe* model using the Model Optimizer with both framework-agnostic and Caffe-specific command-line options

Legal Information

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at or from the OEM or retailer.

No computer system can be absolutely secure.

Intel, Arria, Core, Movidius, Pentium, Xeon, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used with permission by Khronos.

*Other names and brands may be claimed as the property of others.

Copyright © 2018, Intel Corporation. All rights reserved.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Intel Corporation
United States
You may know us for our processors. But we do so much more. Intel invents at the boundaries of technology to make amazing experiences possible for business and society, and for every person on Earth.

Harnessing the capability of the cloud, the ubiquity of the Internet of Things, the latest advances in memory and programmable solutions, and the promise of always-on 5G connectivity, Intel is disrupting industries and solving global challenges. Leading on policy, diversity, inclusion, education and sustainability, we create value for our stockholders, customers and society.
Article Copyright 2019 by Intel Corporation