Click here to Skip to main content
Click here to Skip to main content

Rapid Object Detection in .NET

, 25 Jan 2014
Rate this:
Please Sign up or sign in to vote.
Rapid object detection without a third pary library requirement in .Net.
  1. Project Homepage 
  2. HaarCascadeClassifier.dll and examples 

Introduction 

The most popular and the fastest implementation of Viola-Jones object detection algorithm is undoubtedly the implementation of OpenCV. But OpenCV requires wrapper classes to be usable with .NET languages and Bitmap objects of .NET have to be converted to IplImage format before it is used with OpenCV. On the other hand, programs that use OpenCV are dependent on all OpenCV libraries and their wrappers. These are not problems when functions of OpenCV are used. But if we need only to detect objects on a Bitmap, it isn't worth to make our programs dependent on all OpenCV libraries and wrappers...

I have written a library (HaarCascadeClassifier.dll) that makes object detection possible in .NET without any other library requirement. It is an Open Source project that contains implementation of the Viola-Jones object detection algorithm. The library uses haar cascades generated by OpenCV (XML files) to detect particular objects such as faces.

It can be used for object detection purposes or only to understand the algorithm and how parameters affect the result or the speed.

Background 

In fact, my purpose is to share HaarCascadeClassifier.dll and its usage. So I will try to summarize the algorithm.

Algorithm of Viola and Jones doesn't use pixels directly to detect objects. It uses rectangular features that are called "haar-like features". These features can be represented using 2, 3, or 4 rectangles.  

Haar-like Features

The value of a haar-like feature is calculated as the difference between the average of the pixels under the rectangle that covers the feature and the average of the pixels under black rectangles. Each rectangle is represented with 5 values; x, y, width, height, and weight. x and y are coordinates where the rectangle will be placed on the searching window. Width and height are the size of the rectangle. Weight is multiplied by the sum of all the pixels insinde the rectangle. (Therefore we can say that weight values of black rectangles are negative.)

Cascades and Stages

A cascade of weak classifiers are used to determine if a subwindow on an image contains the searched object. The structure of a cascade can be represented programmatically as follows:

' Feature rectangle
Public Structure FeatureRect
   Public Rectangle As Rectangle
   Public Weight As Single
End Structure

' Binary tree nodes
Public Structure Node
   ' Feature rectangles
   Public FeatureRects As List(Of FeatureRect)
   
   ' Threshold for determining what to select (left value/right value) or where to go on
   ' binary tree (left or right)
   Public Threshold As Single
   
   ' Left value
   Public LeftVal As Single

   ' Right value
   Public RightVal As Single
   
   ' Does this node have a left node? (Checking a boolean takes less time then to control
   ' if left node is null or not.)              
   Public HasLNode As Boolean
   
   ' Left node. If the current node doesn't have a left node, this will be null.
   Public LeftNode As Integer
   
   ' Does this node have a right node?
   Public HasRNode As Boolean
   
   ' Right node. If the current node doesn't have a right node, this will be null.
   Public RightNode As Integer
End Structure

' Will be used as a binary tree
Public Structure Tree
   Public Nodes As List(Of Node)
End Structure

' Stages
Public Structure Stage
   ' Trees in the stage.
   Public Trees As List(Of Tree)
   
   ' Threshold of the stage.
   Public Threshold As Single
End Structure

' Stages of the cascade
Public Stages As List(Of Stage)

Features are applied to a subimage until the subimage is rejected by one of them or passed from all.

The value of a node is calculated as the sum of all the features inside the node and it is used to determine the next node or the value that will be added to the stage value. If the current node is a leaf (it means it has no nodes) and the calculated node value is less than the threshold, then the left value (else right value) is added to the stage value (RightVal, LeftVal). If the current node is not a leaf, features on the next node (if the feature value is smaller than the threshold LeftNode, else RightNode) that is selected according to the value of the current node have to be applied to the same subimage. After it is done for each tree in the current stage, the calculated stage value is compared with the stage threshold. If it is less, it means the subimage doesn't contain the searched object and it is eliminated directly without applying the remaining stages. 

Integral Image 

Integral image is an approach that speeds up the algorithm. Integral image at a location is the sum of all pixels above and on the left of the location. (CalculateCumSums32bpp(), CalculateCumSums24bpp(), CalculateCumSums8bpp()). It is used to calculate the sum of pixels within a rectangle very rapidly. 

For example, the sum of pixels under the hatched rectangle can be calculated as IntegralImage(D) - IntegralImage(B) - IntegralImage(C) + IntegralImage(A).

Variance Normalization

During the training process, variance of each image is normalized to reduce the effect of different lighting conditions. Therefore, normalization must be done for all subwindows during detection. Also thresholds can be normalized instead of normalizing image variance (this is faster). Thresholds are multiplied by standard deviation of pixels within the current subwindow for normalization.

Scanning Process  

The scanning process is an important part of the algorithm. Allmost all parameters required by the Detect() function of the library are used to optimize this process.

During the scanning process, classifiers are applied to all subimages using a scaled searching window. (Features are also scaled.)

While Scaler < MaxScale
   .... 
   Scaler = Scaler * ScaleMult
End While 

The step size of the searching window is calculated by multiplying SlidingRatio and the current searching window width. It means each step of the searching window is equal to StepSize pixels.

For i = 0 To ImageWidth - SearchingWinWidth - 1 Step StepSize
   For j = 0 To ImageWidth - SearchingWinHeight - 1 Step StepSize
      ...
   Next
Next

Average of Nested Rectangles

Objects can be detected at different scales more than once. This causes the rectangles that represent locations of objects to be nested. 

The average of these rectangles must be calculated and all the nested rectangles must be eliminated from results that will be returned. 

If distances between edges of rectangles are short enough, they are considered to be nested. The destination threshold is obtained by multiplying sizeMultForNesRectCon by the size of one of these rectangles (AreTheyNested()). 

Using the Library  

The library is easy to use in all .NET languages. An example is given below in C#. More examples can be downloaded with compiled executables from here

Parameters

Parameter Description
MaxDetCount The detector stops searching when number of detected objects reaches this value.
MinNRectCount Minimum neighbour areas that have to be passed from the detector to verify existence of searched object.
FirstScale First scaler of searching window.
MaxScale Maximum scaler of searching window.
ScaleMult ScaleMult (Scale Multiplier) and the current scaler are multiplied to make an increment on the current scaler.
SizeMultForNesRectCon SizeMultForNesRectCon (Size Multiplier For Nested Object Control) and size of each rectangle are multiplied separately to obtain maximum acceptable horizontal and vertical distances between the current rectangle and the others. Maximum distances are used to check if rectangles are nested or not.
SlidingRatio The ratio of step size to scaled searching window width. (CurrentStepSize = ScaledWindowWidth * SlidingRatio)
Pen A "Pen" object to draw rectangles on given bitmap. If it is given as null, nothing will be drawn.

An Example

Preparing the required parameters:

int maxDetCount = Int32.MaxValue;
int minNRectCount = 0;
float firstScale = detector.Size2Scale(100);
float maxScale = detector.Size2Scale(400);
float scaleMult = 1.1f;
float sizeMultForNesRectCon = 0.3f;
float slidingRatio = 0.2f;
Pen pen = new Pen(Brushes.Red, 4);
HaarCascadeClassifer.HaarDetector.DetectionParams detectorParameters;
detectorParameters = 
    new HaarCascadeClassifer.HaarDetector.DetectionParams(maxDetCount, minNRectCount, 
    firstScale, maxScale, scaleMult, sizeMultForNesRectCon, slidingRatio, pen);

Loading an embedded XML haar cascade:

XmlDocument xmlDoc=new XmlDocument();
xmlDoc.LoadXml(HaarCascadeClassifer.My.Resources.Resources.haarcascade_frontalface_alt);
HaarDetector detector = new HaarDetector(xmlDoc);

Loading a haar cascade from an external XML file:

XmlDocument xmlDoc=new XmlDocument();
xmlDoc.Load("haarcascade_frontalface_alt.xml");
HaarDetector detector = new HaarDetector(xmlDoc);

Detection:

Bitmap bmp = new Bitmap("C:\bitmap.bmp");
HaarCascadeClassifer.HaarDetector.DResults results = detector.Detect(ref bmp, detectorParameters);

"results" contains the following: 

results.NOfObjects               // Number of detected objects
results.DetectedOLocs            // Locations of detected objects (Rectangles)
results.SearchedSubRegionCount   // Number of searched subregions

License

This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0

Share

About the Author

Huseyin Atasoy
Engineer
Turkey Turkey
I am interested in artifical intelligence, signal processing, electronics, robotics and internet/network technologies.
 
A few of my works:
PAKEV
Real-time object recognition
Real-time facial expression recognition
Real-time face recognition
Javascript encrypter
 
My blog

Comments and Discussions

 
QuestionHow to train for other detection tasks? Pinmembercwienands2-Oct-12 6:58 
Hello Huseyin,
 
I always love scientific stuff in .Net Smile | :)
 
Question: You state "the library uses haar cascades generated by OpenCV (XML files)". From what I know about Haar classifiers, you have to train them. Is there an OpenCV-based tool available that you used to train for this .Net tool? Or did you build something yourself? Because if I want to detect anything other than faces, that's the missing puzzle piece.
 
Thanks for the great work,
 
Christoph
AnswerRe: How to train for other detection tasks? PinmemberHuseyin Atasoy2-Oct-12 8:35 
GeneralRe: How to train for other detection tasks? Pinmembercwienands2-Oct-12 17:40 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.140827.1 | Last Updated 25 Jan 2014
Article Copyright 2012 by Huseyin Atasoy
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid