
Computer Vision Application with C# - Part III

9 May 2009, CPOL, 9 min read
Meanshift tracking implementation


Introduction

Moving on from Part II, this article presents a real tracking algorithm. The Centroid tracker presented in Part II was not actually a tracking algorithm but a fancy trick: all the "tracker" did was find the centroid of the backprojected image and place a bounding box around it, which is not tracking by definition. This article presents a very well known tracking algorithm in the Computer Vision community called Meanshift tracking. It is based on a series of research papers by Comaniciu et al. [1], [2], [3].

Visual tracking can be described as identifying an object in a video frame and then following that object in subsequent frames. It does not include detecting the object: tracking assumes the object has been selected either manually or automatically, and in this article we will use manual selection. Once the object is selected, a model is created from one or more of the object's attributes, and that model is then used to locate the object in subsequent frames. Meanshift, as described in the papers above, uses colour as the attribute and a histogram to model it. We have already covered these prerequisites in Part I.

Meanshift Tracking - High Level View

Given an object, we create a colour histogram - let's call this target histogram. When we get the next frame, we try to find an object with a similar colour histogram in the vicinity of the object in the previous frame. Once we find this object, we stop looking for it and wait to get another frame. This process is repeated over and over again which results in tracking. At this point, there are two questions one must ask:

  1. How do we know that we have found our object?
  2. How do we search the vicinity of the object?

The answer to the first question is that we use some kind of histogram matching, such as the Bhattacharyya coefficient (see Part I). We create the histogram of the candidate object and match it against the target histogram. If the match is close enough, we assume it is the same object; if not, we move within the vicinity, create a new histogram, and match again until we find a close enough object.
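The sample application is written in C#, but the matching step is easy to sketch. Here is a minimal, illustrative Python version of the Bhattacharyya coefficient between two normalised histograms (the function name is mine):

```python
import math

def bhattacharyya(p, q):
    """Similarity of two normalised histograms: 1.0 for identical
    distributions, 0.0 for distributions with no overlapping bins."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))
```

Identical histograms give a coefficient of 1.0, so "close enough" becomes a threshold on how near the coefficient is to 1.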

The way of searching the vicinity is the technique from which the algorithm gets its name. Meanshift is a mode-seeking algorithm: given a probability distribution (a normalised histogram, in our case), Meanshift seeks the nearest peak, or mode, of that distribution.

Basic Meanshift Tracking

We first pick the attribute(s) of the object to model the object. We are using colour as the attribute and the normalised histogram as our model. So in the video frame, we select the target object by placing a bounding box, a.k.a. ROI (Region of Interest), around it. Assuming everything, including the background, in the bounding box belongs to the target, we create the normalised colour histogram.
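As an illustrative sketch (in Python/NumPy, with names of my own choosing, not the article's C# code), a normalised colour histogram with the default 4x4x4 binning could be built like this:

```python
import numpy as np

def colour_histogram(roi, bins_per_channel=4):
    """Build a normalised colour histogram of an ROI.

    roi: H x W x 3 uint8 array of RGB pixels.
    Returns a flat array of bins_per_channel**3 values summing to 1.
    """
    # Map each 0-255 channel value to one of `bins_per_channel` bins.
    step = 256 // bins_per_channel
    idx = roi.astype(np.int32) // step
    # Combine the three channel indices into a single flat bin number.
    flat = (idx[..., 0] * bins_per_channel + idx[..., 1]) * bins_per_channel + idx[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()  # normalise so the bins sum to 1
```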

Now we grab the next frame and place our bounding box at the same location as the previous frame and create a candidate normalised colour histogram.

Next comes the Meanshift bit. A Meanshift iteration consists of steps 3 to 5 below:

  1. Create a target model q = {q_u}, u = 1…m, in the ROI in the form of a histogram based on its feature - most commonly it is colour. Normalise the target histogram so that Σ_u q_u = 1.
  2. In the next image frame, create a candidate model p(y0) = {p_u(y0)} at the location y0 of the target in the previous image. Normalise the candidate histogram.
  3. Each pixel x_i in the ROI around y0 provides a weighted sample. If its colour falls in bin u of the histogram, the weight contributed by the pixel is w_i = √(q_u / p_u(y0)).
  4. Update the Meanshift vector, i.e., compute the new location: y1 = (Σ_i x_i·w_i) / (Σ_i w_i).
  5. Update the current position, y0 ← y1, and iterate.
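The weighting and update steps above can be sketched compactly. This is an illustrative Python/NumPy version, not the article's C# code, and the function name is mine:

```python
import numpy as np

def meanshift_step(coords, bins, q, p):
    """One Meanshift update (steps 3-5).

    coords: N x 2 array of (x, y) pixel positions in the ROI,
    bins:   N array of histogram bin indices b(x_i),
    q, p:   normalised target and candidate histograms.
    """
    # Step 3: weight each pixel by sqrt(q_u / p_u) for its colour bin u.
    w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12))
    # Steps 4-5: the new location is the weighted mean of the coordinates.
    return (coords * w[:, None]).sum(axis=0) / w.sum()
```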

The effect of the above steps is that an object is tracked through a sequence of frames. The weights are important: without them the samples are simply the pixel locations and therefore uniformly distributed. The Meanshift algorithm works because the weight ratio √(q_u / p_u) is biased towards the mode: colours are more likely to match in the direction of the true target, so the weights are, on average, greater closer to the true location. Don't be put off by the mathematics.

The table below shows the Meanshift calculations. X and Y are the pixel coordinates, RGB is the pixel colour value, Bin is the histogram index corresponding to that RGB value, q_u is the value of the target histogram bin, and p_u is the value of the candidate histogram bin:

[Table: worked Meanshift calculation, one row per pixel in the ROI]

The new centroid is the weighted mean of the pixel coordinates, y1 = (Σ_i x_i·w_i) / (Σ_i w_i). We keep moving the centroid until we find the likely target, i.e., using the Bhattacharyya coefficient to match the histograms.

An improvement to this basic Meanshift tracking proposed by Comaniciu et al. is that once we find the new centroid, we create a histogram of the image under the ROI and compute the Bhattacharyya coefficient (let's call it BC1). We then move the centroid in small increments, creating a histogram at each step, and as long as the Bhattacharyya coefficient at the moved centroid is less than BC1, we keep inching the centroid towards the target. Furthermore, we also keep track of how far the centroid has moved from its previous position (the Euclidean distance between the previous and current centroids) and the number of Meanshift iterations. We keep running the Meanshift iterations while the centroid movement exceeds a small threshold and the iteration count stays below a maximum. These thresholds are important for practical purposes because if the tracker gets lost, we don't want to be stuck in an infinite loop!
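Putting the pieces together, here is a toy, self-contained sketch of the outer loop with the displacement threshold and iteration cap. It works on a single-channel image whose pixel values already are bin indices; all names are mine, and this is Python rather than the article's C# implementation:

```python
import numpy as np

def track(img, q, cx, cy, h, max_iters=20, eps=0.5):
    """Run Meanshift iterations until the centroid moves less than
    `eps` pixels or `max_iters` iterations have been spent."""
    for _ in range(max_iters):
        # Crop the ROI around the current centroid (clamped to the image).
        ys, xs = np.mgrid[max(cy - h, 0):min(cy + h + 1, img.shape[0]),
                          max(cx - h, 0):min(cx + h + 1, img.shape[1])]
        bins = img[ys, xs].ravel()
        p = np.bincount(bins, minlength=len(q)) / bins.size  # candidate model
        w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12))    # pixel weights
        if w.sum() == 0:
            break                 # no target colour in sight: tracker lost
        nx = (xs.ravel() * w).sum() / w.sum()                # weighted centroid
        ny = (ys.ravel() * w).sum() / w.sum()
        if (nx - cx) ** 2 + (ny - cy) ** 2 < eps ** 2:
            break                 # converged: displacement below threshold
        cx, cy = int(round(nx)), int(round(ny))
    return cx, cy
```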

Kernel-based Meanshift Tracking

In kernel-based Meanshift, a kernel profile is overlaid on the ROI and the model is created based on the kernel. The tracker proposed by Comaniciu et al. [3] uses the Epanechnikov kernel shown below. Because of the shape of this kernel, pixels towards the centre of the ROI carry a greater value than pixels towards the edges, which gives the tracker the ability to reject interfering backgrounds.

K_E(x) = ½·c_d⁻¹·(d + 2)·(1 − ‖x‖²) if ‖x‖ ≤ 1, and 0 otherwise, where c_d is the volume of the unit d-dimensional sphere.

There is one key difference between kernel-based Meanshift and basic Meanshift. When the target or candidate model is created in a kernel-based Meanshift tracker, a suitable kernel is laid over the ROI, and the value of the kernel at each pixel's location is used to build the histogram instead of a simple count. The process of creating the model, i.e., the histogram, is the same as usual, except that instead of incrementing the corresponding bin by 1, the value of the kernel at that location is added to the bin. For example, if a pixel's colour value falls in bin 3 and the kernel value at that pixel is 0.843, we add 0.843 to bin 3 instead of incrementing the bin value. Using the Epanechnikov kernel, the bin update will look something like the steps below (the kernel values come from the Epanechnikov profile):

[Figure: bin-update steps using the Epanechnikov profile]

A 5×5-pixel ROI weighted with the Epanechnikov kernel might look like the grid below. We can see that the pixels near the centre have a larger value than the pixels at the edges:

[Figure: Epanechnikov kernel values over a 5×5 ROI]
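As an illustrative sketch (Python/NumPy, with hypothetical names; I use the profile k(r) = max(0, 1 − r), where r is the squared distance normalised by the ROI half-size), the kernel weights and the kernel-weighted histogram could look like this:

```python
import numpy as np

def epanechnikov_weights(h):
    """Kernel value for every pixel of a (2h+1) x (2h+1) ROI:
    1.0 at the centre, falling to 0 towards the edges."""
    ys, xs = np.mgrid[-h:h + 1, -h:h + 1]
    r2 = (xs / h) ** 2 + (ys / h) ** 2   # normalised squared distance
    return np.clip(1.0 - r2, 0.0, None)  # Epanechnikov profile k(r) = 1 - r

def kernel_histogram(roi_bins, h, nbins):
    """Add the kernel value (instead of 1) to each pixel's colour bin."""
    k = epanechnikov_weights(h).ravel()
    hist = np.zeros(nbins)
    np.add.at(hist, roi_bins.ravel(), k)  # accumulate kernel values per bin
    return hist / hist.sum()              # normalise
```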

Background Weighted Meanshift Tracking

The background is always a concern in visual tracking. When the target and background colours are similar, there is a high probability that the tracker will get lost. Comaniciu et al. [3] have proposed a variation of the Meanshift algorithm in which the histogram of the background area around the ROI is also calculated and fused with the histogram of the ROI to create a more reliable model. A histogram is created for an area around the ROI window; an appropriate area can be selected experimentally. If {o_u} is the normalised background histogram and o* is its smallest non-zero entry, then a weight is derived for each bin according to:

v_u = min(o* / o_u, 1)

The weights reduce the importance of those features that are prominent in the background, i.e., those with low v_u. The procedure is given below:

  1. Get the smallest non-zero value o* from the background histogram
  2. For each bin u in the target histogram:
    1. Find v_u = min(o* / o_u, 1)
    2. Apply the weight to the target model: q_u ← v_u · q_u (and renormalise the histogram afterwards)
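A minimal sketch of this weighting step (Python/NumPy; the function name is mine and the article's code is C#):

```python
import numpy as np

def background_weighted_model(q, o):
    """Down-weight target-histogram bins that are prominent in the
    background.  q: normalised target histogram,
    o: normalised background histogram."""
    q, o = np.asarray(q, float), np.asarray(o, float)
    o_min = o[o > 0].min()            # smallest non-zero background entry
    v = np.ones_like(o)
    nz = o > 0
    v[nz] = np.minimum(o_min / o[nz], 1.0)  # v_u = min(o*/o_u, 1)
    weighted = v * q                        # q_u <- v_u * q_u
    return weighted / weighted.sum()        # renormalise
```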

Using the Code

Step 1: Select a histogram size - the default is 4x4x4.

Step 2: Select a tracker type.

Step 3: Click the "Load Sequence" button and select the folder containing JPG or BMP files.

Step 4: Select an area in the image with the mouse: right-click and drag. This creates a colour histogram of the selected area. The target histogram is shown in red and does not change during tracking. The red ROI marks the target and the yellow ROI marks the background.

Step 5: Click the "Start Tracking" button to see the tracking in action.

For Meanshift, the candidate histogram is shown in blue and changes during tracking, as it is the histogram of the area the tracker "thinks" is the target. The centroid and the Bhattacharyya coefficient are displayed on the screen. For the Centroid tracker, only the centroid position is displayed.

Moving Forward

I have not implemented the scale adaptation described in the paper. Scale adaptation means the ROI should shrink and expand as the object moves away from or towards the camera. The paper suggests evaluating the Bhattacharyya coefficient at a scale slightly smaller and slightly larger than the current ROI and setting the ROI size to whichever gives the maximum coefficient. I would also encourage readers to implement the CAMShift tracker [4]. The series of articles so far has all the information required to implement CAMShift, plus the paper is easy to follow - good luck!
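The scale-adaptation idea can be sketched in a few lines (Python; `bc_at_scale` is a hypothetical callable that builds the candidate histogram at a given ROI half-size and returns its Bhattacharyya coefficient against the target):

```python
def adapt_scale(bc_at_scale, h, step=0.1):
    """Evaluate the match at the current ROI size and at +/-10%,
    and keep whichever size matches the target best."""
    candidates = [h * (1 - step), h, h * (1 + step)]
    return max(candidates, key=bc_at_scale)
```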

Acknowledgements

For any third party code or material used, please refer to "Points of Interest" in Part I.

History

  • Version 1.0

References

  • [1] D. Comaniciu and P. Meer, “Mean Shift Analysis and Applications”, Proc. Seventh International Conference on Computer Vision, pp. 1197–1203, September 1999.
  • [2] D. Comaniciu, V. Ramesh, and P. Meer, “Real-Time Tracking of Non-Rigid Objects Using Mean Shift”, Proc. 2000 IEEE Conference on Computer Vision and Pattern Recognition, vol. II, pp. 142–149, June 2000.
  • [3] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-Based Object Tracking”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, May 2003.
  • [4] G. Bradski, “Computer Vision Face Tracking for Use in a Perceptual User Interface”, Proc. IEEE Workshop on Applications of Computer Vision, pp. 214–219, October 1998.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


