Bag-of-Features Descriptor on SIFT Features with OpenCV (BoF-SIFT)

22 Apr 2014, CPOL
An implementation of a Bag-of-Features descriptor based on SIFT features, using OpenCV and C++, for content based image retrieval applications.


Content based image retrieval (CBIR) is still an active research field. A number of approaches are available to retrieve visual data from large databases, but almost all of them require an image digestion step at the start. Image digestion means describing an image using low-level features such as color, shape, and texture while discarding unimportant details. Color histograms, color moments, dominant color, scalable color, shape contour, shape region, homogeneous texture, texture browsing, and edge histogram are some of the popular descriptors used in CBIR applications. Bag-of-Features (BoF) is another kind of visual descriptor that can be used in CBIR applications. In order to obtain a BoF descriptor, we first need to extract a local feature from the image. This feature can be anything such as SIFT (Scale Invariant Feature Transform), SURF (Speeded Up Robust Features), or LBP (Local Binary Patterns).

This article gives a brief description of BoF and SIFT, and shows how to obtain BoF from SIFT features (BoF-SIFT), with source code. BoF-SIFT has been implemented using OpenCV 2.4 and Visual C++ (VS2008), but you can easily modify the code to work with any flavor of C++. You could write the same code yourself after going through a few OpenCV tutorials.

If you are a developer of CBIR applications or a researcher in visual content analysis, you may use this code in your application or for comparison with your own visual descriptor. Further, you can modify this code to obtain other BoF descriptors such as BoF-SURF or BoF-LBP.


BoF and SIFT are completely independent algorithms. The next sections describe SIFT first and then BoF.

SIFT - Scale Invariant Feature Transform

Point-like features are very popular in many fields, including 3D reconstruction and image registration. A good point feature should be invariant to geometric transformations and illumination. A point feature can be a blob or a corner. SIFT is one of the most popular feature extraction and description algorithms. It extracts blob-like feature points and describes them with a scale, illumination, and rotation invariant descriptor.

The above image shows how a SIFT point is described using a histogram of gradient magnitudes and directions around the feature point. I'm not going to explain the whole SIFT algorithm in this article, but you can find the theoretical background of SIFT on Wikipedia or read David Lowe's original article on SIFT. I recommend this blog post for those with less interest in mathematics.

Unlike a color histogram descriptor or LBP-like descriptors, the SIFT algorithm does not give an overall impression of the image. Instead, it detects blob-like features in the image and describes each point with a descriptor that contains 128 numbers. As output, it gives an array of point descriptors.

CBIR needs a global descriptor in order to match against visual data in a database or to retrieve the semantic concept of the visual content. We can use the array of point descriptors produced by the SIFT algorithm to obtain a global descriptor that gives an overall impression of the visual data for CBIR applications. Several methods are available to obtain such a global descriptor from SIFT feature point descriptors, and BoF is one general method that can be used for the task.

Bag-Of-Feature (BoF) Descriptor

BoF is one of the popular visual descriptors used for visual data classification. It is inspired by the Bag of Words concept used in document classification. A bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is likewise a sparse vector of occurrence counts over a vocabulary of local image features.

BoF typically involves two main steps. The first step is obtaining the set of bags of features. This step is an offline process: we can build the set of bags for a particular feature type once and then reuse it when creating BoF descriptors. In the second step, we assign each of the given features to the nearest bag created in the first step and then build a histogram, taking the bags as the bins. This histogram can be used to classify the image or video frame.

Bag-of-Features with SIFT

Let's see how we can build BoF with SIFT features.

  • 1. Obtain the set of bags of features.
    1. Select a large set of images.
    2. Extract the SIFT feature points of all the images in the set and obtain the SIFT descriptor for each feature point that is extracted from each image.
    3. Cluster the set of feature descriptors into the number of bags we defined and train the bags with the clustered feature descriptors (we can use the K-Means algorithm).
    4. Obtain the visual vocabulary.
  • 2. Obtain the BoF descriptor for a given image/video frame.
    1. Extract the SIFT feature points of the given image.
    2. Obtain the SIFT descriptor for each feature point.
    3. Match the feature descriptors with the vocabulary we created in the first step.
    4. Build the histogram.

The following image shows the above two steps clearly.

Using the code

With OpenCV we can implement BoF-SIFT with just a few lines of code. Make sure that you have installed OpenCV 2.3 or higher and Visual Studio 2008 or higher. The OpenCV version requirement is a must, but you may use other C++ compilers without any problems.

The code has two separate regions that are compiled and run independently. The first region is for obtaining the set of bags of features, and the other region is for obtaining the BoF descriptor for a given image/video frame. You need to run the first region of the code only once. After creating the vocabulary, you can use it with the second region of code at any time. Modifying the code line below switches between the two regions of code.

#define DICTIONARY_BUILD 1 // set DICTIONARY_BUILD to 1 for Step 1. 0 for step 2 

Setting the DICTIONARY_BUILD constant to 1 will activate the following code region.

//Step 1 - Obtain the set of bags of features.

//to store the input file names
char * filename = new char[100];        
//to store the current input image
Mat input;    

//To store the keypoints that will be extracted by SIFT
vector<KeyPoint> keypoints;
//To store the SIFT descriptor of the current image
Mat descriptor;
//To store all the descriptors that are extracted from all the images
Mat featuresUnclustered;
//The SIFT feature extractor and descriptor
SiftDescriptorExtractor detector;    

//I select 20 (=1000/50) images from a set of 1000 images to extract
//feature descriptors and build the vocabulary
for(int f=0;f<999;f+=50){        
    //create the file name of an image (change the path to your own image set)
    sprintf(filename,"images\\%i.jpg",f);
    //open the file
    input = imread(filename, CV_LOAD_IMAGE_GRAYSCALE); //Load as grayscale                
    //detect feature points
    detector.detect(input, keypoints);
    //compute the descriptors for each keypoint
    detector.compute(input, keypoints,descriptor);        
    //put all the feature descriptors in a single Mat object 
    featuresUnclustered.push_back(descriptor);
    //print the percentage
    printf("%i percent done\n",f/10);
}

//Construct BOWKMeansTrainer
//the number of bags
int dictionarySize=200;
//define Term Criteria
TermCriteria tc(CV_TERMCRIT_ITER,100,0.001);
//retries number
int retries=1;
//necessary flags
int flags=KMEANS_PP_CENTERS;
//Create the BoW (or BoF) trainer
BOWKMeansTrainer bowTrainer(dictionarySize,tc,retries,flags);
//cluster the feature vectors
Mat dictionary=bowTrainer.cluster(featuresUnclustered);    
//store the vocabulary
FileStorage fs("dictionary.yml", FileStorage::WRITE);
fs << "vocabulary" << dictionary;
//release the file storage
fs.release();

You can find out what each line of code does by going through the comments above it. In summary, this part of the code simply reads a set of images from the hard disk, extracts SIFT features and descriptors, concatenates them, clusters them into a number of bags (dictionarySize), and then produces a vocabulary by training the bags with the clustered feature descriptors. You can modify the path to the images and give your own set of images to build the vocabulary.

After running this code, you will see a file called dictionary.yml in your project directory. I suggest you open it with Notepad and see how the vocabulary appears. It may not make much sense to you, but you can get an idea of the structure of the file, which will be important if you work with OpenCV in the future.
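For reference, the vocabulary is stored in OpenCV's FileStorage YML format, which looks roughly like this (the numbers below are made up; with SIFT and dictionarySize=200 the matrix has 200 rows and 128 columns):

```yaml
%YAML:1.0
vocabulary: !!opencv-matrix
   rows: 200
   cols: 128
   dt: f
   data: [ 12.5, 3.75, 0.25, ... ]
```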

If you run this code successfully, you can activate the next section by setting DICTIONARY_BUILD to 0. From here onwards we don't need the first part of the code, since we have already obtained a vocabulary and saved it in a file.

The following part is the next code section which achieves the second step.

    //Step 2 - Obtain the BoF descriptor for a given image/video frame. 

    //prepare the BOW descriptor extractor from the dictionary    
    Mat dictionary; 
    FileStorage fs("dictionary.yml", FileStorage::READ);
    fs["vocabulary"] >> dictionary;
    fs.release();    
    //create a nearest neighbor matcher
    Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
    //create SIFT feature point extractor
    Ptr<FeatureDetector> detector(new SiftFeatureDetector());
    //create SIFT descriptor extractor
    Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);    
    //create the BoF (or BoW) descriptor extractor
    BOWImgDescriptorExtractor bowDE(extractor,matcher);
    //Set the dictionary with the vocabulary we created in the first step
    bowDE.setVocabulary(dictionary);
    //To store the image file name
    char * filename = new char[100];
    //To store the image tag name - only to save the descriptor in a file
    char * imageTag = new char[10];
    //open the file to write the resultant descriptor
    FileStorage fs1("descriptor.yml", FileStorage::WRITE);    
    //the image file with the location; change it according to your image file location
    sprintf(filename,"images\\1.jpg");        
    //read the image
    Mat img=imread(filename,CV_LOAD_IMAGE_GRAYSCALE);        
    //To store the keypoints that will be extracted by SIFT
    vector<KeyPoint> keypoints;        
    //Detect SIFT keypoints (or feature points)
    detector->detect(img,keypoints);
    //To store the BoW (or BoF) representation of the image
    Mat bowDescriptor;        
    //extract the BoW (or BoF) descriptor from the given image
    bowDE.compute(img,keypoints,bowDescriptor);
    //prepare the tag for the yml (somewhat similar to xml) file
    sprintf(imageTag,"img1");        
    //write the new BoF descriptor to the file
    fs1 << imageTag << bowDescriptor;        
    //You may use this descriptor for classifying the image.
    //release the file storage
    fs1.release();

In this section, SIFT features and descriptors are calculated for a particular image, and each feature descriptor is matched against the vocabulary we created before.

Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher); 

This line of code creates a matcher that matches descriptors using the Fast Library for Approximate Nearest Neighbors (FLANN). There are other types of matchers available, so you can explore them yourself. In general, approximate nearest neighbor matching works well.

Finally, the code outputs the Bag-of-Features descriptor and saves it to a file with the following line.

fs1 << imageTag << bowDescriptor;

This descriptor can be used to classify the image into one of several classes. You may use an SVM or any other classifier to check the discriminative power and robustness of this descriptor. Alternatively, you can directly match the BoF descriptors of different images in order to measure similarity.

Points of Interest

I found that this code can easily be converted into a BoF implementation of any other feature, such as BoF-SURF, BoF-ORB, BoF-Opponent-SURF, and BoF-Opponent-SIFT.

You can find the C++ and OpenCV source code of both the BoF-SURF and BoF-ORB implementations at the following link.

Download Bag-of-Features Descriptor on SURF and ORB Features (BoF-SURF and BoF-ORB)

Changing the lines below lets you obtain the BoF descriptor with any other type of feature.

SiftDescriptorExtractor detector;
Ptr<FeatureDetector> detector(new SiftFeatureDetector());
Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);

The latest versions of OpenCV include many feature detection and description algorithms, so you can try those algorithms by modifying this code and determine the best method for your CBIR application or research.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Ravimal Bandara
Student University of Moratuwa
Sri Lanka
PhD student specializing in video content analysis. Interested in Image processing, HCI and Digital music production. Computer and technology enthusiast. Love coding and sharing my knowledge.



Comments and Discussions

Question: minHessian, octaves, layers & number of bags for BoF-SURF (Mara Rufino, 9-Jul-14)
Answer (Ravimal Bandara, 10-Jul-14):
First of all, it will be really useful if you read the original papers on SIFT (by Lowe) and SURF.
For your first question: SURF features are detected by thresholding the determinant of the Hessian matrix of image patches. In simple words, we first calculate the determinant of the Hessian for each patch in the image and then threshold it to find robust feature points. minHessian controls this threshold, so if you increase it you will get fewer feature points, and if you decrease it you will get more. One of the most important properties of a feature is its repeatability (the tendency to re-detect the same feature in another image of the same scene taken from a different camera angle). If you set the threshold too low, you will get a lot of weak feature points which have low repeatability. If you set it too high, there will not be enough features to describe the image. You can keep 400 for minHessian, as it gives a sufficient number of feature points for natural images. In special cases, such as the medical domain, you need to fine-tune this value experimentally.

For the second question: an octave represents a series of filter response maps obtained by convolving the same input image with filters of increasing size. Unlike other algorithms, in SURF we don't need to rescale the image to detect features of different sizes; we can use filters of different sizes instead. If we say 4 octaves and 2 octave layers, it means:
first, we filter the image with sizes 9x9 and 15x15 (the two octave layers of the first octave);
second, we filter the image with sizes 15x15 and 27x27 (the two octave layers of the second octave);
third, we filter the image with sizes 27x27 and 51x51 (the two octave layers of the third octave);
finally, we filter the image with sizes 51x51 and 99x99 (the two octave layers of the fourth octave).

You can see that from octave to octave the filter size step doubles:
9 + (6*1) = 15
15 + (6*2) = 27
27 + (6*4) = 51
51 + (6*8) = 99

The value 6 is chosen because it guarantees that the filter size stays odd, so the filter always has a central pixel.

Finally, it selects features from the 2x4 response maps.
Increasing the number of octaves gives you the ability to detect both smaller and larger features in the image. Increasing the number of octave layers gives you the ability to detect features at many different sizes between the smallest and the largest. For example, assume that in your image there is a cat, an elephant, a human, and a pig. The following table shows how we detect features with different values of the parameters.
Octaves | Octave Layers | Who is detected
1       | 1             | cat
2       | 1             | cat, pig
1       | 2             | cat, pig
2       | 2             | cat, pig, human
3       | 1             | cat, pig, human
3       | 2             | cat, pig, human, elephant

The downside is that increasing the number of octaves or octave layers also increases the running time of the algorithm.

The number of bags should be determined experimentally. There is a publication reporting that 200 bags performed well. If you are doing research, you have to find the best number of bags by assessing the retrieval performance while varying the number of bags.

For the third question: it is easiest if you push all the features into one Mat object, because then you can directly use the OpenCV function to cluster them. Otherwise you have to cluster them manually and find the cluster centers to use as the vocabulary.

Article Copyright 2013 by Ravimal Bandara
Everything else Copyright © CodeProject, 1999-2016