- *click to see videos*

Fast hand template matching	Fast traffic sign template matching

Contents

Introduction
Background
Template matching algorithm
Using the code
Detecting custom objects
Results
Conclusion
References

Introduction

NuGet packages: Algorithms.LINE2D
Help: Off-line - Algorithms.LINE2D - unblock after download!

In computer vision applications, a frequent task is object detection and localization. Object detection approaches can be divided into three groups: hand-crafted methods which consist of some predefined rules and heuristics, machine learning based approaches where object information is encoded into classifier, and the third approach is something between - template matching.

Although the machine-learning based approaches are shown to be generally the best, there are some cases where they are not desirable. One reason is that the training procedures usually require many training samples which gathering is not always an easy task.The training procedure usually takes a long time and the trained classifier cannot be tweaked on the spot.

Template matching procedures are simple and can be robust enough to deal with complex scenes and they do not require extensive training. Moreover, superposed template gives the object appearance which is not possible with popular machine-learning sliding window approaches, e.g. Viola-Jones. This ability can be used for accurate object localization as standalone procedure or as post-processing technique (e.g. after Viola-Jones technique). Recently in 2012 a new template matching procedure is proposed Hinterstoisser et al. [1] which is roughly 20 times faster than standard sliding window template matching approaches. This procedure (slightly improved) is now available as part of Accord.NET Extensions Framework and will be explained here briefly along with samples and usage scenarios. Due to the complexity of the procedure, the detailed explanation is omitted but it can be found in the original article.

Background

The standard template matching technique takes each template and compares it with an image part in sliding window manner. The procedure can be compared as convolution process with a big kernel which obviously can take some time. The template can be represented in many ways: as color image, gradient magnitude image, orientation image. It has been shown that gradient orientations are quite robust against illumination changes, thus the gradient orientation image representation is frequently used. The novel procedure appeared in 2012 [1] also uses gradient orientation image as input. The different part is the template matching step where the naive convolution is done by exploiting hardware and CPU structure. The final result is fast and robust algorithm for template matching which can be used in many applications; here: hand detection and traffic sign detection.

Template Matching Algorithm

Template matching procedure can be divided into five steps which will be briefly described below. For full explanation of the procedure, see [1].

Template matching pipeline. Template matching pipeline. The red rectangle denotes the extended part.

Gradient Extraction

The crucial preprocessing step is edge extraction, where an edge orientation image is an input image for the template matching method [1]. One of the main reasons that image orientations are chosen is their robustness to extensive light changes and image noise. Gradient magnitude performs poorly in scenes with background clutter due to many false positives. In [1], a multi-channel Sobel operator has been used where a pixel orientation is taken from the channel that has the largest magnitude. In addition, orientations are filtered in a way that only dominant orientations in a 3x3 neighborhood are retained. The implemented method supports the described approach as well the traditional orientation extraction from the grayscale images for the less computation demanding method, which part is shown below.
```
short* dxPtr, dyPtr = ... //input dX and dY Sobel derivative images
int* magSqrPtr = ... //output gradient magnitude image
int* orientImagePtr = ... //output orientation image
for (int j = 0; j < imgHeight; j++)
{
    for (int i = 0; i < imgWidth; i++)
    {
        int magSqr = dxPtr[0] * dxPtr[0] + dyPtr[0] * dyPtr[0];
        if (magSqr < minValidMagSqr)
            *orientImgPtr = FeatureMap.INVALID_ORIENTATION;
        else
        {
            *orientImgPtr = MathExtensions.Atan2Aprox(*dyPtr, *dxPtr);
            *magSqrImgPtr = magSqr;
        }
        dxPtr += 1; dyPtr += 1;
        orientImgPtr += 1; magSqrImgPtr += 1;
    }
    ...
}
```

Gradient orientation quantization

Each gradient orientation is quantized into maximum 8 directions in order display each orientation as a bit flag. The representation can be easily manipulated. The code below shows that quantization procedure uses pre-calculated lookup table for angle quantization.

int* orientDegImgPtr = ... //orientation image pointer
int imgWidth, imgHeight = ... //image width and height
byte[] AngleQuantizationTable = ... //maps [0-360] -> [...] -> [0-7]
for (int j = 0; j < imgHeight; j++)
{
    for (int i = 0; i < imgWidth; i++)
    {
        int angle = orientDegImgPtr[i];
        qOrinetUnfilteredPtr[i] = AngleQuantizationTable[angle]; 
    }
    
    ...
}

Gradient orientation spreading

Image objects can be deformed due to physical characteristics or due to noise which results in non-perfect edge alignment between object and corresponding template. In order to increase robustness orientations are spread on the local neighborhood. Since each orientation can be written as single bit, the spread orientation is simply bitwise OR between some pixel and its neighbor pixels as shown below.

byte* srcImgPtr = ... //input quantized orientation image
byte* destImgPtr = ... //output spread quantized orientattion image
int neighborhood = ... //spread factor
for (int row = 0; row < neghborhood; row++)
{
    int subImageHeight = imgHeight - row;
    for (int col = 0; col < neghborhood; col++)
    {
       //get shifted image and 
       //do the bitwise OR operation for each pixel
        OrImageBits(&srcImgPtr[col], destImgPtr,
                    imgStride,
                    imgWidth - col, subImageHeight);
    }
    ...
}

Building response maps and linear memories

The template is matched against every location within the image using the operation equivalent to sliding a template over the input image. In contrast to the standard template matching procedure, the input image is preprocessed so that the matching procedure can be executed very fast by adding long arrays - linear memories. See [1] for details. Each linear memory cell contains similarity in range [0..n] where n is maximum user-specified similarity between two quantized orientations. These values are limited by the number of bits in 8-bit value. The linear memory addition can be done very efficiently by using SIMD processor architecture which enables real-time template matching.

//calculate linear response maps
private Image<Gray<byte>>[][,] calculate(Gray<int>[,] orientationDegImg)
{
    //the number of linear memories is equal to the number of quantized orientations
    var linearMaps = new Image<Gray<byte>>[GlobalParameters.NUM_OF_QUNATIZED_ORIENTATIONS][,];

    //first create spread quantized orientation image
    Gray[,] sprededQuantizedOrient = FeatureMap.Calculate(orientationDegImg, this.NeigborhoodSize);
    //for each quantized orientation calculate pre-calculate similarity image 
    //(response map) between the orientation image and specified orientation
    for (int orient = 0; orient < GlobalParameters.NUM_OF_QUNATIZED_ORIENTATIONS; orient++)
    {
        //...and finally make linear memory for each response map 
        //by taking every Tth pixel and putting them into one array
        var responseMap = computeResponseMap(sprededQuantizedOrient, orient);
        linearMaps[orient] = linearizeResponseMap(responseMap);
    }

    return linearMaps;
}

Matching procedure

Each template feature, which consists of a location and its corresponding quantized orientation is used to select linear memory (8-bit vector). The selected memories are added to the 8-bit result vector initialized by zeros. The result vector contains the raw template matching scores in range [0..255] for each location. The values are rescaled to the range [0..100]. The result vector is thresholded in order to extract only strong matches. For details about the linear memory content and creation, see [1]. SIMD array addition instructions are written in a separate project as shown below.

SIMD array instructions. SIMD array addition instructions are in separate C project. The fast memory addition is the crucial step for the real-time template matching procedure.

Functions are exported through C# by using P/Invoke.

//byte-to-byte array addition
[SuppressUnmanagedCodeSecurity] //add some performance by skipping some checks
[DllImport("SIMDArrayInstructions.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true)]
public static extern void AddByteToByteVector(byte* srcAddr, byte* dstAddr, int numOfElemsToAdd);
//byte-to-short array addition
[SuppressUnmanagedCodeSecurity]
[DllImport("SIMDArrayInstructions.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true)]
public static extern void AddByteToShortVector(byte* srcAddr, short* dstAddr, int numOfElemsToAdd);

Feature number increase - extension

The 8-bit result vector dictates the maximum number of features per template because each linear cell contains value which maximum is the maximum similarity between two quantized orientations - n and the number of linear memories being added corresponds to the number of template features. Therefore, the maximum number of features per template, in the original paper, is limited to ⌊255 / n⌋. To overcome this limitation, the linear maps are first added to a temporary buffer - 8 bit vector, and then before the buffer overflows, the buffer is added to the final result array (16-bit), which contains template matching scores. This procedure is repeated for each template feature. The buffer is cleared after its values are accumulated to result array. See the figure below for details.

Linear memory addition.

The result is the increased maximum number of features ⌊65535/n⌋. The performance comparison shows the significant speed-up over traditional template matching approach. n denotes the maximum similarity between two features and the F denotes the number of features.

Pyramidal Extension

This approach can be extended to use pyramidal approach as shown in original paper. The linear memory representation is built for each pyramid level and the matching procedure is executed for the smallest image and for each patch of the lower pyramid levels which contain candidates. This approach provides speedup and more accurate object localization. The implemented code supports this feature but only one level (original level) is used.

Pyramidal search. Template matching procedure is executed for each pyramid level. The image of the higher pyramid level is whole scanned. Here is shown the patch of 5x5 pixels. If the candidate is found on the higher level pyramid image, the patch is rescaled well as template (done off-line through template building) and the matching procedure is repeated but this time for the candidate patches.

Using the Code

Using the code is very simple. The implementation consists of a single method for image preprocessing and of an extension method for template matching. By default, all parameters have their default values so the user does not have to know how the algorithm works. Parallel template matching is enabled by default, which means more CPU cores - faster processing. More options are available through linear memory extension methods.

Gray<byte>[,] grayImage = .... 
List<TemplatePyramid> templatePyramids = ... //off-line made templates

var linPyr = LinearizedMapPyramid.CreatePyramid(grayImage); //prepare linear-pyramid maps
List<Match> matches = linPyr.MatchTemplates(templPyrs); //match templates

foreach(var m in matches) //draw matches
{
   frame.Draw(m.BoundingRect, Bgr8.Blue, 3, true, Bgr8.Red);
}

linPyr.Dispose();

NuGet package

Although the implementation uses the unmanaged libraries, the provided NuGet package is platform abstract - for now (x86 /x64 - Windows). Depending on the running OS architecture, the appropriate unmanaged library is loaded so the developer does not need to handle multiple OS architectures separately. The image below shows the pre-compiled library locations.

Pre-compiled libraries. Unmanaged libraries are pre-compiled and loaded on demand depending on OS architecture.

Detecting Custom Objects

In order to detect custom objects, binary templates which describe object outer contour must be made. Building your own templates does not take much effort either except making a template. The template building process can be divided into three steps and will be explained on open hand sample.

Prepare templates

Each template is presented as binary image. Hand templates were obtained in a more complicated way. Each open hand sample is recorded with a color camera against some uniform background and then KNN algorithm with k=2 is applied in order to remove the uniform background. Simple templates can be made easily by using some drawing application - e.g. Microsoft Paint.

Template samples. Three samples for the open hand template. There are few template variations which are then resized to achieve scale invariance.
Rescale templates

In order to detect objects regardless of a size, each template must be rescaled. Necessary MATLAB (Octave) scripts are enclosed in the sample project. Softening the edges helps in feature extraction process so it is recommended to apply Gaussian snooting on the binary image templates.

Script locations. MATLAB (Octave) Scripts for template rescaling are enclosed with the sample project
Extracting features

The feature extraction process and template creation includes only one function.
```
List<TemplatePyramid> templatePyrs = new List<TemplatePyramid>()
foreach(var preparedBWImage in bwImages)
{
   var tp = TemplatePyramid.CreatePyramidFromPreparedBWImage(preparedBWImage, "OpenHand");
   templatePyrs.Add(tp);
}
```
Feature samples. The extracted features from the template are denoted with the green color. The red dots denote the feature positions which are scattered uniformly.

Results

The result of this algorithm is the significant speed-up over traditional template matching approach.

Speedup over traditional matching procedure. Traditional and speeded-up template matching performance comparison. The number of features per template is 100. All tests are conducted on Intel Core i5-2500K CPU using single core.

The following two video links represents the hand template matching demo which sample is included and the traffic sign shape detection which uses this approach as the base.

Samples - *click to see videos*

Fast hand template matching	Fast traffic sign template matching

Conclusion

This article presents the fast template matching method which can be used for fast and simple object detection. The code contains the complete source as well as open hand detection sample adjustable for other object types. The source and sample code are the part of Accord.NET Extensions Framework, a framework that brings many advanced algorithms primarily for image processing, object detection and tracking, all packed as fluent extensions and simple and intuitive generics, so do not forget to take a peek :).

References

[1]	Hinterstoisser, S.; Cagniart, C.; Ilic, S.; Sturm, P.; Navab, N.; Fua, P.; Lepetit, V., "Gradient Response Maps for Real-Time Detection of Textureless Objects," Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.34, no.5, pp.876,888, May 2012 Free-access version

History

5^th October, 2014 - First version released