
Bag-of-Features Descriptor on SIFT Features with OpenCV (BoF-SIFT)

Ravimal Bandara, 22 Apr 2014
An implementation of the Bag-of-Features descriptor based on SIFT features, using OpenCV and C++, for content-based image retrieval applications.

Introduction

Content-based image retrieval (CBIR) is still an active research field. There are a number of approaches for retrieving visual data from large databases, but almost all of them require an image digestion step at the start. Image digestion means describing an image using low-level features such as color, shape, and texture while removing unimportant details. Color histograms, color moments, dominant color, scalable color, shape contour, shape region, homogeneous texture, texture browsing, and edge histogram are some of the popular descriptors used in CBIR applications. Bag-of-Features (BoF) is another kind of visual feature descriptor that can be used in CBIR applications. To obtain a BoF descriptor we first need to extract features from the image. These can be any local feature, such as SIFT (Scale Invariant Feature Transform), SURF (Speeded Up Robust Features), or LBP (Local Binary Patterns).

You can find a brief description of BoF, SIFT, and how to obtain BoF from SIFT features (BoF-SIFT), along with the source code, in this article. BoF-SIFT has been implemented using OpenCV 2.4 and Visual C++ (VS2008), but you can easily modify the code to work with any flavor of C++. You could write the same code yourself after going through a few OpenCV tutorials.

If you are a developer of CBIR applications or a researcher in visual content analysis, you may use this code in your application or as a baseline for comparing with your own visual descriptor. You can also modify this code to obtain other BoF descriptors such as BoF-SURF or BoF-LBP.

Background

BoF and SIFT are totally independent algorithms. The next sections describe SIFT and then BoF.

SIFT - Scale Invariant Feature Transform

Point-like features are very popular in many fields, including 3D reconstruction and image registration. A good point feature should be invariant to geometrical transformations and illumination changes. A point feature can be a blob or a corner. SIFT is one of the most popular feature extraction and description algorithms. It extracts blob-like feature points and describes each of them with a descriptor that is invariant to scale, illumination, and rotation.

The above image shows how a SIFT point is described using a histogram of gradient magnitude and direction around the feature point. I'm not going to explain the whole SIFT algorithm in this article, but you can find the theoretical background of SIFT on Wikipedia or read David Lowe's original article on SIFT. For those with less interest in mathematics, I recommend reading this blog post.

Unlike a color histogram descriptor or LBP-like descriptors, the SIFT algorithm does not give an overall impression of the image. Instead, it detects blob-like features in the image and describes each point with a descriptor that contains 128 numbers. As output, it gives an array of point descriptors, one per detected feature point.

CBIR needs a global descriptor in order to match against visual data in a database or to retrieve the semantic concept of some visual content. We can use the array of point descriptors produced by the SIFT algorithm to obtain a global descriptor that gives an overall impression of the visual data. There are several methods for obtaining such a global descriptor from SIFT point descriptors, and BoF is one general method for the task.

Bag-of-Features (BoF) Descriptor

BoF is one of the popular visual descriptors used for visual data classification. BoF is inspired by a concept called Bag of Words that is used in document classification. A bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is likewise a sparse vector of occurrence counts over a vocabulary of local image features.
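To make the document analogy concrete, here is a minimal sketch (not part of the downloadable code) that counts how often each word of a fixed vocabulary occurs in a text. The function name bagOfWords and the sample vocabulary are my own assumptions for illustration, and the sketch assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: histogram of vocabulary-word occurrences in a
// whitespace-separated text. Words outside the vocabulary are ignored.
std::vector<int> bagOfWords(const std::string& text,
                            const std::vector<std::string>& vocabulary) {
    // Map each vocabulary word to the index of its histogram bin.
    std::map<std::string, std::size_t> binOf;
    for (std::size_t i = 0; i < vocabulary.size(); ++i) {
        binOf[vocabulary[i]] = i;
    }

    std::vector<int> histogram(vocabulary.size(), 0);
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        auto it = binOf.find(word);
        if (it != binOf.end()) {
            ++histogram[it->second];  // one more occurrence of this word
        }
    }
    return histogram;
}
```

For the vocabulary {cat, dog, fish}, the text "dog cat dog bird" gives the histogram [1, 2, 0]; "bird" is ignored because it is not in the vocabulary. BoF does the same thing with visual words in place of text words.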

BoF typically involves two main steps. The first step is obtaining the set of bags of features (the visual vocabulary). This is an offline process: we obtain the set of bags for a particular feature type once and then reuse it for creating BoF descriptors. The second step is assigning each feature of a given image to the nearest bag from the first step and then building a histogram that takes the bags as its bins. This histogram can be used to classify the image or video frame.

Bag-of-Features with SIFT

Let's see how we can build BoF with SIFT features.

  1. Obtain the set of bags of features.
    1. Select a large set of images.
    2. Extract the SIFT feature points of all the images in the set and obtain the SIFT descriptor for each feature point that is extracted from each image.
    3. Cluster the set of feature descriptors into the number of bags we defined (we can use the K-Means algorithm); each cluster center becomes one bag.
    4. Obtain the visual vocabulary, which is the set of cluster centers.
  2. Obtain the BoF descriptor for a given image/video frame.
    1. Extract the SIFT feature points of the given image.
    2. Obtain the SIFT descriptor for each feature point.
    3. Match the feature descriptors with the vocabulary we created in the first step.
    4. Build the histogram.

The following image shows the above two steps clearly. (Image taken from http://www.sccs.swarthmore.edu/users/09/btomasi1/tagging-products.html)
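Step 1 boils down to clustering descriptors. The sketch below shows plain Lloyd-style K-Means on small float vectors without OpenCV, only to illustrate the idea; it is not the code from the download, which relies on OpenCV's BOWKMeansTrainer. The names Descriptor, nearestCenter, and buildVocabulary are hypothetical, and seeding with the first k descriptors is a simplification I assume here to keep the example deterministic.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Index of the center closest to the given descriptor.
std::size_t nearestCenter(const Descriptor& d,
                          const std::vector<Descriptor>& centers) {
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < centers.size(); ++c) {
        const float dist = squaredDistance(d, centers[c]);
        if (dist < bestDist) {
            bestDist = dist;
            best = c;
        }
    }
    return best;
}

// Lloyd's K-Means with a fixed number of iterations.
// Assumes 0 < k <= data.size() and that all descriptors have the same length.
std::vector<Descriptor> buildVocabulary(const std::vector<Descriptor>& data,
                                        std::size_t k, int iterations) {
    // Simplified seeding: the first k descriptors become the initial centers.
    std::vector<Descriptor> centers(data.begin(), data.begin() + k);
    const std::size_t dim = data[0].size();

    for (int it = 0; it < iterations; ++it) {
        std::vector<Descriptor> sums(k, Descriptor(dim, 0.0f));
        std::vector<std::size_t> counts(k, 0);

        // Assignment step: add each descriptor to its nearest center's sum.
        for (std::size_t n = 0; n < data.size(); ++n) {
            const std::size_t c = nearestCenter(data[n], centers);
            for (std::size_t i = 0; i < dim; ++i) {
                sums[c][i] += data[n][i];
            }
            ++counts[c];
        }

        // Update step: move each center to the mean of its members.
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;  // empty cluster: keep the old center
            for (std::size_t i = 0; i < dim; ++i) {
                centers[c][i] = sums[c][i] / static_cast<float>(counts[c]);
            }
        }
    }
    return centers;  // the cluster centers are the visual words
}
```

With the four 2-D points (0,0), (10,10), (0,2), (10,12) and k = 2, the centers settle at (0,1) and (10,11). Real SIFT descriptors have 128 dimensions, but the procedure is the same.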

Using the code

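With OpenCV we can implement BoF-SIFT with just a few lines of code. Make sure that you have installed OpenCV 2.3 or a later 2.x release and Visual Studio 2008 or higher. The OpenCV requirement is mandatory, but you may use another C++ compiler without any problems. Note that in OpenCV 2.4 the SIFT classes live in the nonfree module (header opencv2/nonfree/features2d.hpp), and OpenCV 3 and later moved and renamed them, so the code would need adapting there.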

The code has two separate regions that are compiled and run independently. The first region obtains the set of bags of features and the second obtains the BoF descriptor for a given image/video frame. You need to run the first region only once. After creating the vocabulary you can use it with the second region at any time. Changing the line below switches between the two regions.

#define DICTIONARY_BUILD 1 // set DICTIONARY_BUILD to 1 for Step 1. 0 for step 2 

Setting the DICTIONARY_BUILD constant to 1 will activate the following code region.

#if DICTIONARY_BUILD == 1
 
//Step 1 - Obtain the set of bags of features.

//to store the input file names (fixed-size buffer, nothing to delete later)
char filename[100];
//to store the current input image
Mat input;    

//To store the keypoints that will be extracted by SIFT
vector<KeyPoint> keypoints;
//To store the SIFT descriptor of current image
Mat descriptor;
//To store all the descriptors that are extracted from all the images.
Mat featuresUnclustered;
//The SIFT feature extractor and descriptor
SiftDescriptorExtractor detector;    

//I select 20 (1000/50) images from 1000 images to extract
//feature descriptors and build the vocabulary
for(int f=0;f<999;f+=50){        
    //create the file name of an image
    sprintf(filename,"G:\\testimages\\image\\%i.jpg",f);
    //open the file
    input = imread(filename, CV_LOAD_IMAGE_GRAYSCALE); //Load as grayscale
    //skip files that are missing or cannot be decoded
    if(input.empty()) continue;
    //detect feature points
    detector.detect(input, keypoints);
    //compute the descriptors for each keypoint
    detector.compute(input, keypoints,descriptor);        
    //put the all feature descriptors in a single Mat object 
    featuresUnclustered.push_back(descriptor);        
    //print the percentage
    printf("%i percent done\n",f/10);
}    


//Construct BOWKMeansTrainer
//the number of bags
int dictionarySize=200;
//define Term Criteria
TermCriteria tc(CV_TERMCRIT_ITER,100,0.001);
//retries number
int retries=1;
//necessary flags
int flags=KMEANS_PP_CENTERS;
//Create the BoW (or BoF) trainer
BOWKMeansTrainer bowTrainer(dictionarySize,tc,retries,flags);
//cluster the feature vectors
Mat dictionary=bowTrainer.cluster(featuresUnclustered);    
//store the vocabulary
FileStorage fs("dictionary.yml", FileStorage::WRITE);
fs << "vocabulary" << dictionary;
fs.release();

You can find what each line of code does by reading the comment above it. In summary, this part of the code reads a set of images from my hard disk, extracts SIFT feature points and their descriptors, concatenates the descriptors, and clusters them into a number of bags (dictionarySize); the resulting cluster centers form the vocabulary. You can modify the path to the images and supply your own set of images to build the vocabulary.
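If you adapt the loop to your own image folder, it may help to build the sampled file names separately from the feature extraction. The following is a small sketch under my own assumptions: the helper name sampledImagePaths is hypothetical, the files are assumed to be named 0.jpg, 1.jpg, and so on as in the loop above, and step must be greater than zero.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns "<dir><index>.jpg" for index = 0, step,
// 2*step, ... while index < total. dir must end with a path separator.
std::vector<std::string> sampledImagePaths(const std::string& dir,
                                           int total, int step) {
    std::vector<std::string> paths;
    for (int f = 0; f < total; f += step) {
        std::ostringstream name;
        name << dir << f << ".jpg";  // no fixed-size buffer to overflow
        paths.push_back(name.str());
    }
    return paths;
}
```

Calling sampledImagePaths("images/", 1000, 50) yields 20 paths, from images/0.jpg to images/950.jpg, which matches the 20 images selected by the loop in the article. Each path can then be passed to imread via its c_str().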

After running this code, you will find a file called dictionary.yml in your project directory. I suggest you open it with Notepad and see how the vocabulary appears. It may not make much sense to you, but you can get an idea of the structure of the file, which will be useful if you work with OpenCV in the future.

If this code ran successfully, you can activate the next section by setting DICTIONARY_BUILD to 0. From here onwards we don't need the first part of the code, since we have already obtained a vocabulary and saved it in a file.

The following code section carries out the second step.

#else
    //Step 2 - Obtain the BoF descriptor for given image/video frame. 

    //prepare BOW descriptor extractor from the dictionary    
    Mat dictionary; 
    FileStorage fs("dictionary.yml", FileStorage::READ);
    fs["vocabulary"] >> dictionary;
    fs.release();    
    
    //create a nearest neighbor matcher
    Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
    //create Sift feature point extracter
    Ptr<FeatureDetector> detector(new SiftFeatureDetector());
    //create Sift descriptor extractor
    Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);    
    //create BoF (or BoW) descriptor extractor
    BOWImgDescriptorExtractor bowDE(extractor,matcher);
    //Set the dictionary with the vocabulary we created in the first step
    bowDE.setVocabulary(dictionary);
 
    //To store the image file name (fixed-size buffer, nothing to delete later)
    char filename[100];
    //To store the image tag name - only used for saving the descriptor in a file
    char imageTag[10];
 
    //open the file to write the resultant descriptor
    FileStorage fs1("descriptor.yml", FileStorage::WRITE);    
    
    //the image file with the location. change it according to your image file location
    sprintf(filename,"G:\\testimages\\image\\1.jpg");        
    //read the image
    Mat img=imread(filename,CV_LOAD_IMAGE_GRAYSCALE);        
    //To store the keypoints that will be extracted by SIFT
    vector<KeyPoint> keypoints;        
    //Detect SIFT keypoints (or feature points)
    detector->detect(img,keypoints);
    //To store the BoW (or BoF) representation of the image
    Mat bowDescriptor;        
    //extract BoW (or BoF) descriptor from given image
    bowDE.compute(img,keypoints,bowDescriptor);
 
    //prepare the yml (some what similar to xml) file
    sprintf(imageTag,"img1");            
    //write the new BoF descriptor to the file
    fs1 << imageTag << bowDescriptor;        
 
    //You may use this descriptor for classifying the image.
            
    //release the file storage
    fs1.release();
#endif   

In this section the SIFT feature points and descriptors are calculated for a particular image, and each feature descriptor is matched against the vocabulary we created before.
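To show what this matching and counting amounts to, here is a minimal sketch without OpenCV. It uses an exhaustive nearest-neighbor search rather than an approximate one, and it divides the counts by the number of descriptors so that images with different numbers of feature points stay comparable. The names Descriptor and bofHistogram are hypothetical, and the normalization is an assumption of this sketch, not a statement about what the library does internally.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor;

// Hypothetical helper: normalized histogram of visual-word occurrences.
// Assumes a non-empty vocabulary whose words have the same length as the
// descriptors.
std::vector<float> bofHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary) {
    std::vector<float> histogram(vocabulary.size(), 0.0f);

    for (std::size_t n = 0; n < descriptors.size(); ++n) {
        // Exhaustive search for the closest visual word.
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t w = 0; w < vocabulary.size(); ++w) {
            float dist = 0.0f;
            for (std::size_t i = 0; i < descriptors[n].size(); ++i) {
                const float d = descriptors[n][i] - vocabulary[w][i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = w;
            }
        }
        histogram[best] += 1.0f;  // one more feature falls into this bag
    }

    // Turn the raw counts into fractions of all descriptors.
    if (!descriptors.empty()) {
        for (std::size_t w = 0; w < histogram.size(); ++w) {
            histogram[w] /= static_cast<float>(descriptors.size());
        }
    }
    return histogram;
}
```

With the vocabulary {(0,0), (10,10)} and the descriptors (1,0), (9,9), (0,1), (1,1), three descriptors fall into the first bag and one into the second, so the result is [0.75, 0.25].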

Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher); 

This line of code creates a matcher that finds the closest vocabulary entry for each descriptor using the Fast Library for Approximate Nearest Neighbors (FLANN). Other types of matchers are available, such as the brute-force matcher, so you can explore them yourself. In general, approximate nearest neighbor matching works well.

Finally, the code outputs the Bag-of-Features descriptor and saves it in a file with the following line.

fs1 << imageTag << bowDescriptor;

This descriptor can be used to classify the image into one of several classes. You may use an SVM or any other classifier to check the discriminative power and robustness of this descriptor. Alternatively, you can directly compare the BoF descriptors of different images in order to measure their similarity.
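The article does not prescribe a particular similarity measure. As one common choice, the sketch below computes the histogram intersection of two BoF descriptors; for normalized histograms it ranges from 0 (no shared visual words) to 1 (identical histograms). The function name is hypothetical, and other measures such as Euclidean or chi-square distance could be used instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: histogram intersection, the sum of bin-wise minima.
// Both histograms are assumed to come from the same vocabulary; if their
// lengths differ, only the common prefix is compared.
float histogramIntersection(const std::vector<float>& a,
                            const std::vector<float>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::min(a[i], b[i]);
    }
    return sum;
}
```

For example, the histograms [0.5, 0.5, 0] and [0.25, 0.25, 0.5] have an intersection of 0.5, while any normalized histogram compared with itself scores 1.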

Points of Interest

I found that this code can easily be converted into a BoF implementation for other features, such as BoF-SURF, BoF-ORB, BoF-Opponent-SURF, and BoF-Opponent-SIFT.

You can find C++ and OpenCV source code for implementations of both BoF-SURF and BoF-ORB at the following link.

Download Bag-of-Features Descriptor on SURF and ORB Features (BoF-SURF and BoF-ORB)

Changing the lines below yields a BoF descriptor built on any other type of feature.

SiftDescriptorExtractor detector;
Ptr<FeatureDetector> detector(new SiftFeatureDetector());
Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);

The latest versions of OpenCV include many feature detection and description algorithms, so you can apply them by modifying this code and determine the best method for your CBIR application or research.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


About the Author

Ravimal Bandara
Student, University of Moratuwa
Sri Lanka
Postgraduate research student specializing in video content analysis. Interested in Image processing, HCI and Digital music production. Computer and technology enthusiast. I love coding and sharing my knowledge.

Comments and Discussions

 
Re: how to measure similarity between two videos (Ravimal Bandara, 21-Apr-14 19:03)
The simplest way to find the TF is to count the number of occurrences of each visual word per image. If you are using this code you don't have to count the occurrences, because the raw (before normalizing) BoF-SIFT histogram can serve as the TF vector. But you have to calculate the IDF for each visual word manually with the following steps.
1. For each visual word, count the number of images in which the frequency of the visual word is not zero. After this step you will have an array whose index is the visual word number and whose value is the number of images in which the visual word is present.
2. Divide the total number of images by each element of the array plus one (the added one avoids a division by zero) and store the result back in the same location in the array.
idf[i] = number_of_images_in_dataset/(1+idf[i]);
3. For each element in the array, calculate the log value and store it in the same location in the array.
idf[i] = log(idf[i]);
 
Now the array contains the IDF of every visual word, so the array itself is the IDF vector. You can now get TF*IDF by multiplying each element in the TF vector of an image by the corresponding element of the IDF vector of the image set.
 
TF(image1) = [number_of_visual_word1_in_image1, number_of_visual_word2_in_image1, number_of_visual_word3_in_image1...]
TF(image2) = ...
...
 
IDF(image_set) = [IDF_of_visual_word1, IDF_of_visual_word2, IDF_of_visual_word3...]
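The three steps can be put together as in the sketch below. It follows the formula given in this reply, idf[i] = log(N / (1 + df[i])), where df[i] is the number of images containing visual word i. The function names and the use of double are my own assumptions, and the TF vectors are assumed to be raw counts of equal length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: IDF vector for a set of images.
// tf[image][word] holds the raw count of a visual word in an image.
std::vector<double> inverseDocumentFrequency(
        const std::vector<std::vector<double>>& tf) {
    const std::size_t words = tf.empty() ? 0 : tf[0].size();
    const double imageCount = static_cast<double>(tf.size());
    std::vector<double> idf(words, 0.0);

    for (std::size_t w = 0; w < words; ++w) {
        // Step 1: number of images in which the word occurs at least once.
        double df = 0.0;
        for (std::size_t img = 0; img < tf.size(); ++img) {
            if (tf[img][w] != 0.0) {
                df += 1.0;
            }
        }
        // Steps 2 and 3: smoothed ratio, then the logarithm.
        idf[w] = std::log(imageCount / (1.0 + df));
    }
    return idf;
}

// Hypothetical helper: element-wise TF*IDF for one image.
std::vector<double> tfIdf(const std::vector<double>& tf,
                          const std::vector<double>& idf) {
    std::vector<double> weighted(tf.size(), 0.0);
    for (std::size_t w = 0; w < tf.size() && w < idf.size(); ++w) {
        weighted[w] = tf[w] * idf[w];
    }
    return weighted;
}
```

One consequence of the added one in the denominator is that a visual word present in every image gets a slightly negative weight, log(N / (N + 1)), instead of zero.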


Article Copyright 2013 by Ravimal Bandara