Click here to Skip to main content
13,252,065 members (56,084 online)
Click here to Skip to main content
Add your own
alternative version

Stats

83.3K views
102 bookmarked
Posted 8 Oct 2014

Rapid Object Detection in C#

, 24 Nov 2014
Rate this:
Please Sign up or sign in to vote.
Fast object detection by template matching
Video icon. - click to see videos
Fast hand template matching Fast traffic sign template matching
Fast hand template matching
Fast traffic sign template matching

Contents

  1. Introduction
  2. Background
  3. Template matching algorithm
  4. Using the code
  5. Detecting custom objects
  6. Results
  7. Conclusion
  8. References

Introduction

In computer vision applications, a frequent task is object detection and localization. Object detection approaches can be divided into three groups: hand-crafted methods which consist of some predefined rules and heuristics, machine learning based approaches where object information is encoded into classifier, and the third approach is something between - template matching.

Although the machine-learning based approaches are shown to be generally the best, there are some cases where they are not desirable. One reason is that the training procedures usually require many training samples which gathering is not always an easy task.The training procedure usually takes a long time and the trained classifier cannot be tweaked on the spot.

Template matching procedures are simple and can be robust enough to deal with complex scenes and they do not require extensive training. Moreover, superposed template gives the object appearance which is not possible with popular machine-learning sliding window approaches, e.g. Viola-Jones. This ability can be used for accurate object localization as standalone procedure or as post-processing technique (e.g. after Viola-Jones technique). Recently in 2012 a new template matching procedure is proposed Hinterstoisser et al. [1] which is roughly 20 times faster than standard sliding window template matching approaches. This procedure (slightly improved) is now available as part of Accord.NET Extensions Framework and will be explained here briefly along with samples and usage scenarios. Due to the complexity of the procedure, the detailed explanation is omitted but it can be found in the original article.

Background

The standard template matching technique takes each template and compares it with an image part in sliding window manner. The procedure can be compared as convolution process with a big kernel which obviously can take some time. The template can be represented in many ways: as color image, gradient magnitude image, orientation image. It has been shown that gradient orientations are quite robust against illumination changes, thus the gradient orientation image representation is frequently used. The novel procedure appeared in 2012 [1] also uses gradient orientation image as input. The different part is the template matching step where the naive convolution is done by exploiting hardware and CPU structure. The final result is fast and robust algorithm for template matching which can be used in many applications; here: hand detection and traffic sign detection.

Template Matching Algorithm

Template matching procedure can be divided into five steps which will be briefly described below. For full explanation of the procedure, see [1].

Template matching.
Template matching pipeline. Template matching pipeline. The red rectangle denotes the extended part.
  1. Gradient Extraction

    The crucial preprocessing step is edge extraction, where an edge orientation image is an input image for the template matching method [1]. One of the main reasons that image orientations are chosen is their robustness to extensive light changes and image noise. Gradient magnitude performs poorly in scenes with background clutter due to many false positives. In [1], a multi-channel Sobel operator has been used where a pixel orientation is taken from the channel that has the largest magnitude. In addition, orientations are filtered in a way that only dominant orientations in a 3x3 neighborhood are retained. The implemented method supports the described approach as well the traditional orientation extraction from the grayscale images for the less computation demanding method, which part is shown below.

    short* dxPtr, dyPtr = ... //input dX and dY Sobel derivative images
    int* magSqrPtr = ... //output gradient magnitude image
    int* orientImagePtr = ... //output orientation image
    for (int j = 0; j < imgHeight; j++)
    {
        for (int i = 0; i < imgWidth; i++)
        {
            int magSqr = dxPtr[0] * dxPtr[0] + dyPtr[0] * dyPtr[0];
            if (magSqr < minValidMagSqr)
                *orientImgPtr = FeatureMap.INVALID_ORIENTATION;
            else
            {
                *orientImgPtr = MathExtensions.Atan2Aprox(*dyPtr, *dxPtr);
                *magSqrImgPtr = magSqr;
            }
            dxPtr += 1; dyPtr += 1;
            orientImgPtr += 1; magSqrImgPtr += 1;
        }
        ...
    }
  2. Gradient orientation quantization

    Each gradient orientation is quantized into maximum 8 directions in order display each orientation as a bit flag. The representation can be easily manipulated. The code below shows that quantization procedure uses pre-calculated lookup table for angle quantization.

    int* orientDegImgPtr = ... //orientation image pointer
    int imgWidth, imgHeight = ... //image width and height
    byte[] AngleQuantizationTable = ... //maps [0-360] -> [...] -> [0-7]
    for (int j = 0; j < imgHeight; j++)
    {
        for (int i = 0; i < imgWidth; i++)
        {
            int angle = orientDegImgPtr[i];
            qOrinetUnfilteredPtr[i] = AngleQuantizationTable[angle]; 
        }
        
        ...
    }
  3. Gradient orientation spreading

    Image objects can be deformed due to physical characteristics or due to noise which results in non-perfect edge alignment between object and corresponding template. In order to increase robustness orientations are spread on the local neighborhood. Since each orientation can be written as single bit, the spread orientation is simply bitwise OR between some pixel and its neighbor pixels as shown below.

    byte* srcImgPtr = ... //input quantized orientation image
    byte* destImgPtr = ... //output spread quantized orientattion image
    int neighborhood = ... //spread factor
    for (int row = 0; row < neghborhood; row++)
    {
        int subImageHeight = imgHeight - row;
        for (int col = 0; col < neghborhood; col++)
        {
           //get shifted image and 
           //do the bitwise OR operation for each pixel
            OrImageBits(&srcImgPtr[col], destImgPtr,
                        imgStride,
                        imgWidth - col, subImageHeight);
        }
        ...
    }
  4. Building response maps and linear memories

    The template is matched against every location within the image using the operation equivalent to sliding a template over the input image. In contrast to the standard template matching procedure, the input image is preprocessed so that the matching procedure can be executed very fast by adding long arrays - linear memories. See [1] for details. Each linear memory cell contains similarity in range [0..n] where n is maximum user-specified similarity between two quantized orientations. These values are limited by the number of bits in 8-bit value. The linear memory addition can be done very efficiently by using SIMD processor architecture which enables real-time template matching.

    //calculate linear response maps
    private Image<Gray<byte>>[][,] calculate(Gray<int>[,] orientationDegImg)
    {
        //the number of linear memories is equal to the number of quantized orientations
        var linearMaps = new Image<Gray<byte>>[GlobalParameters.NUM_OF_QUNATIZED_ORIENTATIONS][,];
    
        //first create spread quantized orientation image
        Gray<byte>[,] sprededQuantizedOrient = FeatureMap.Calculate(orientationDegImg, this.NeigborhoodSize);
        //for each quantized orientation calculate pre-calculate similarity image 
        //(response map) between the orientation image and specified orientation
        for (int orient = 0; orient < GlobalParameters.NUM_OF_QUNATIZED_ORIENTATIONS; orient++)
        {
            //...and finally make linear memory for each response map 
            //by taking every Tth pixel and putting them into one array
            var responseMap = computeResponseMap(sprededQuantizedOrient, orient);
            linearMaps[orient] = linearizeResponseMap(responseMap);
        }
    
        return linearMaps;
    }</byte>
  5. Matching procedure

    Each template feature, which consists of a location and its corresponding quantized orientation is used to select linear memory (8-bit vector). The selected memories are added to the 8-bit result vector initialized by zeros. The result vector contains the raw template matching scores in range [0..255] for each location. The values are rescaled to the range [0..100]. The result vector is thresholded in order to extract only strong matches. For details about the linear memory content and creation, see [1]. SIMD array addition instructions are written in a separate project as shown below.

    SIMD in project.
    SIMD array instructions. SIMD array addition instructions are in separate C project. The fast memory addition is the crucial step for the real-time template matching procedure.

    Functions are exported through C# by using P/Invoke.

    //byte-to-byte array addition
    [SuppressUnmanagedCodeSecurity] //add some performance by skipping some checks
    [DllImport("SIMDArrayInstructions.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true)]
    public static extern void AddByteToByteVector(byte* srcAddr, byte* dstAddr, int numOfElemsToAdd);
    //byte-to-short array addition
    [SuppressUnmanagedCodeSecurity]
    [DllImport("SIMDArrayInstructions.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true)]
    public static extern void AddByteToShortVector(byte* srcAddr, short* dstAddr, int numOfElemsToAdd);

    Feature number increase - extension

    The 8-bit result vector dictates the maximum number of features per template because each linear cell contains value which maximum is the maximum similarity between two quantized orientations - n and the number of linear memories being added corresponds to the number of template features. Therefore, the maximum number of features per template, in the original paper, is limited to ⌊255 / n⌋. To overcome this limitation, the linear maps are first added to a temporary buffer - 8 bit vector, and then before the buffer overflows, the buffer is added to the final result array (16-bit), which contains template matching scores. This procedure is repeated for each template feature. The buffer is cleared after its values are accumulated to result array. See the figure below for details.

    Linear memory addition.

    The result is the increased maximum number of features ⌊65535/n⌋. The performance comparison shows the significant speed-up over traditional template matching approach. n denotes the maximum similarity between two features and the F denotes the number of features.

Pyramidal Extension

This approach can be extended to use pyramidal approach as shown in original paper. The linear memory representation is built for each pyramid level and the matching procedure is executed for the smallest image and for each patch of the lower pyramid levels which contain candidates. This approach provides speedup and more accurate object localization. The implemented code supports this feature but only one level (original level) is used.

Pyramidal search
Pyramidal search. Template matching procedure is executed for each pyramid level. The image of the higher pyramid level is whole scanned. Here is shown the patch of 5x5 pixels. If the candidate is found on the higher level pyramid image, the patch is rescaled well as template (done off-line through template building) and the matching procedure is repeated but this time for the candidate patches.

Using the Code

Using the code is very simple. The implementation consists of a single method for image preprocessing and of an extension method for template matching. By default, all parameters have their default values so the user does not have to know how the algorithm works. Parallel template matching is enabled by default, which means more CPU cores - faster processing. More options are available through linear memory extension methods.

Gray<byte>[,] grayImage = .... 
List<TemplatePyramid> templatePyramids = ... //off-line made templates

var linPyr = LinearizedMapPyramid.CreatePyramid(grayImage); //prepare linear-pyramid maps
List<Match> matches = linPyr.MatchTemplates(templPyrs); //match templates

foreach(var m in matches) //draw matches
{
   frame.Draw(m.BoundingRect, Bgr8.Blue, 3, true, Bgr8.Red);
}

linPyr.Dispose();

NuGet package

Although the implementation uses the unmanaged libraries, the provided NuGet package is platform abstract - for now (x86 /x64 - Windows). Depending on the running OS architecture, the appropriate unmanaged library is loaded so the developer does not need to handle multiple OS architectures separately. The image below shows the pre-compiled library locations.

Pre-compiled libraries. Unmanaged libraries are pre-compiled and loaded on demand depending on OS architecture.

Detecting Custom Objects

In order to detect custom objects, binary templates which describe object outer contour must be made. Building your own templates does not take much effort either except making a template. The template building process can be divided into three steps and will be explained on open hand sample.

  1. Prepare templates

    Each template is presented as binary image. Hand templates were obtained in a more complicated way. Each open hand sample is recorded with a color camera against some uniform background and then KNN algorithm with k=2 is applied in order to remove the uniform background. Simple templates can be made easily by using some drawing application - e.g. Microsoft Paint.

    Template samples
    Template samples. Three samples for the open hand template. There are few template variations which are then resized to achieve scale invariance.
  2. Rescale templates

    In order to detect objects regardless of a size, each template must be rescaled. Necessary MATLAB (Octave) scripts are enclosed in the sample project. Softening the edges helps in feature extraction process so it is recommended to apply Gaussian snooting on the binary image templates.

    Scripts in project.
    Script locations. MATLAB (Octave) Scripts for template rescaling are enclosed with the sample project
  3. Extracting features

    The feature extraction process and template creation includes only one function.

    List<TemplatePyramid> templatePyrs = new List<TemplatePyramid>()
    foreach(var preparedBWImage in bwImages)
    {
       var tp = TemplatePyramid.CreatePyramidFromPreparedBWImage(preparedBWImage, "OpenHand");
       templatePyrs.Add(tp);
    }
    Template features samples.
    Feature samples. The extracted features from the template are denoted with the green color. The red dots denote the feature positions which are scattered uniformly.

Results

The result of this algorithm is the significant speed-up over traditional template matching approach.

performance compairision
Speedup over traditional matching procedure. Traditional and speeded-up template matching performance comparison. The number of features per template is 100. All tests are conducted on Intel Core i5-2500K CPU using single core.

The following two video links represents the hand template matching demo which sample is included and the traffic sign shape detection which uses this approach as the base.

Samples Video icon. - click to see videos
Fast hand template matching Fast traffic sign template matching
Fast hand template matching
Fast traffic sign template matching

Conclusion

This article presents the fast template matching method which can be used for fast and simple object detection. The code contains the complete source as well as open hand detection sample adjustable for other object types. The source and sample code are the part of Accord.NET Extensions Framework, a framework that brings many advanced algorithms primarily for image processing, object detection and tracking, all packed as fluent extensions and simple and intuitive generics, so do not forget to take a peek :).

References

[1] Hinterstoisser, S.; Cagniart, C.; Ilic, S.; Sturm, P.; Navab, N.; Fua, P.; Lepetit, V., "Gradient Response Maps for Real-Time Detection of Textureless Objects," Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.34, no.5, pp.876,888, May 2012
Free-access version

History

  • 5th October, 2014 - First version released

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)

Share

About the Author

Darko Jurić
Software Developer Faculty of Electrical Engineering and Computing
Croatia Croatia
Darko Jurić received his BSc and MSc degree from the Faculty of Electrical Engineering and Computing in Zagreb. He finished his undergraduate study program in Computing in 2010 and in 2012 his master study program in Computer Science with a specialization in Image processing and analysis. Currently, he is employed at the Faculty of Electrical Engineering and Computing, University of Zagreb, as a research engineer at the Department of Electronic Systems and Information Processing within Image Processing Group, where he is pursuing his PhD degree. His research interests are in the areas of image processing and machine learning which is used for solving problems in computer vision. Author of the Accord.NET Extensions Framework.

You may also be interested in...

Comments and Discussions

 
GeneralMy vote of 5 Pin
Gun Gun Febrianza9-Jan-17 10:23
member Gun Gun Febrianza9-Jan-17 10:23 
QuestionGet opencv error when I change pictures Pin
kmjacky4-Nov-16 10:09
memberkmjacky4-Nov-16 10:09 
AnswerRe: Get opencv error when I change pictures Pin
Darko Jurić27-Nov-16 3:17
memberDarko Jurić27-Nov-16 3:17 
QuestionUsing this method for robotics navigation Pin
Nawang Lama4-Feb-16 21:36
memberNawang Lama4-Feb-16 21:36 
AnswerRe: Using this method for robotics navigation Pin
Darko Jurić15-Feb-16 5:57
memberDarko Jurić15-Feb-16 5:57 
QuestionUsing Accord.Extensions and detect object on Android Pin
Member 1153783414-Sep-15 23:31
memberMember 1153783414-Sep-15 23:31 
AnswerRe: Using Accord.Extensions and detect object on Android Pin
Darko Jurić15-Sep-15 2:19
memberDarko Jurić15-Sep-15 2:19 
GeneralRe: Using Accord.Extensions and detect object on Android Pin
Member 1153783415-Sep-15 19:01
memberMember 1153783415-Sep-15 19:01 
GeneralRe: Using Accord.Extensions and detect object on Android Pin
Darko Jurić16-Sep-15 3:56
memberDarko Jurić16-Sep-15 3:56 
GeneralRe: Using Accord.Extensions and detect object on Android Pin
Member 1153783417-Sep-15 2:54
memberMember 1153783417-Sep-15 2:54 
GeneralRe: Using Accord.Extensions and detect object on Android Pin
Darko Jurić17-Sep-15 5:32
memberDarko Jurić17-Sep-15 5:32 
QuestionHow to best pre-process templates? Pin
Member 1192457920-Aug-15 19:22
memberMember 1192457920-Aug-15 19:22 
AnswerRe: How to best pre-process templates? Pin
Darko Jurić27-Aug-15 10:55
memberDarko Jurić27-Aug-15 10:55 
GeneralRe: How to best pre-process templates? Pin
Dương Bảo24-Feb-16 5:33
memberDương Bảo24-Feb-16 5:33 
GeneralRe: How to best pre-process templates? Pin
Darko Jurić2-Mar-16 4:01
memberDarko Jurić2-Mar-16 4:01 
QuestionWhat about the rotation invariant object detection? Pin
Member 1172172725-Jun-15 16:37
memberMember 1172172725-Jun-15 16:37 
AnswerRe: What about the rotation invariant object detection? Pin
Darko Jurić18-Jul-15 2:02
memberDarko Jurić18-Jul-15 2:02 
QuestionAccord.Net is free to use in commercial application Pin
Tridip Bhattacharjee27-Apr-15 22:08
professionalTridip Bhattacharjee27-Apr-15 22:08 
AnswerRe: Accord.Net is free to use in commercial application Pin
Darko Jurić28-Apr-15 2:16
memberDarko Jurić28-Apr-15 2:16 
GeneralRe: Accord.Net is free to use in commercial application Pin
Tridip Bhattacharjee28-Apr-15 4:20
professionalTridip Bhattacharjee28-Apr-15 4:20 
GeneralRe: Accord.Net is free to use in commercial application Pin
Darko Jurić28-Apr-15 11:10
memberDarko Jurić28-Apr-15 11:10 
GeneralMy vote of 5 Pin
Daniel Udell11-Dec-14 22:35
memberDaniel Udell11-Dec-14 22:35 
GeneralRe: My vote of 5 Pin
Darko Jurić12-Dec-14 2:51
memberDarko Jurić12-Dec-14 2:51 
QuestionUsing Multiple Pyramid Levels Pin
Daniel Udell11-Dec-14 4:21
memberDaniel Udell11-Dec-14 4:21 
AnswerRe: Using Multiple Pyramid Levels Pin
Darko Jurić11-Dec-14 17:32
memberDarko Jurić11-Dec-14 17:32 
GeneralRe: Using Multiple Pyramid Levels Pin
Daniel Udell11-Dec-14 22:33
memberDaniel Udell11-Dec-14 22:33 
GeneralRe: Using Multiple Pyramid Levels Pin
Darko Jurić12-Dec-14 2:48
memberDarko Jurić12-Dec-14 2:48 
QuestionGood Pin
Yves25-Nov-14 13:55
memberYves25-Nov-14 13:55 
AnswerRe: Good Pin
Darko Jurić26-Nov-14 2:00
memberDarko Jurić26-Nov-14 2:00 
SuggestionMentor comments Pin
AJSON29-Oct-14 14:51
mentorAJSON29-Oct-14 14:51 
GeneralRe: Mentor comments Pin
Darko Jurić29-Oct-14 23:20
memberDarko Jurić29-Oct-14 23:20 
AnswerRe: Mentor comments Pin
AJSON30-Oct-14 0:49
mentorAJSON30-Oct-14 0:49 
QuestionMultiple objects detection Pin
DomLee29-Oct-14 0:26
memberDomLee29-Oct-14 0:26 
AnswerRe: Multiple objects detection Pin
Darko Jurić29-Oct-14 0:49
memberDarko Jurić29-Oct-14 0:49 
GeneralMy vote of 5 Pin
DomLee28-Oct-14 23:43
memberDomLee28-Oct-14 23:43 
GeneralRe: My vote of 5 Pin
Darko Jurić29-Oct-14 0:44
memberDarko Jurić29-Oct-14 0:44 
Questionfeasible image size Pin
Member 855555010-Oct-14 5:08
memberMember 855555010-Oct-14 5:08 
AnswerRe: feasible image size Pin
Darko Jurić10-Oct-14 5:20
memberDarko Jurić10-Oct-14 5:20 
GeneralRe: feasible image size Pin
Member 855555010-Oct-14 6:04
memberMember 855555010-Oct-14 6:04 
QuestionWhat about CUDA or GPU process in this library Pin
Cadenza8-Oct-14 23:50
memberCadenza8-Oct-14 23:50 
AnswerRe: What about CUDA or GPU process in this library Pin
Darko Jurić9-Oct-14 0:21
memberDarko Jurić9-Oct-14 0:21 
GeneralRe: What about CUDA or GPU process in this library Pin
Cadenza9-Oct-14 1:35
memberCadenza9-Oct-14 1:35 
GeneralRe: What about CUDA or GPU process in this library Pin
Darko Jurić9-Oct-14 1:42
memberDarko Jurić9-Oct-14 1:42 
QuestionVery good !!! Pin
Nguyen.H.H.Dang8-Oct-14 16:16
professionalNguyen.H.H.Dang8-Oct-14 16:16 
AnswerRe: Very good !!! Pin
Darko Jurić8-Oct-14 22:00
memberDarko Jurić8-Oct-14 22:00 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.171114.1 | Last Updated 24 Nov 2014
Article Copyright 2014 by Darko Jurić
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid