Click here to Skip to main content
Licence CPOL
First Posted 8 Sep 2010
Views 15,555
Downloads 419
Bookmarked 26 times

Hash-container and Red-Black Tree Face-off (STL Benchmark)

By | 29 Dec 2010 | Article
Benchmark of hash and non-hash container

Introduction

This article presents a benchmark application which pits the red-black binary tree containers(map, set, etc.) against hash containers. This article is not a tutorial on the data structures. For more information on how a red-black binary tree works, you can refer to this Wikipedia link. For hash containers, you can read this excellent Codeguru article by Marius Bancila.

BenchmarkApp.png

Red-black binary tree is a very fast search tree data structure: it can find any item in 4 billion items in 32 steps or less. Its O notation is O(log2n) while hash container's average case is O(1) and its worst case is O(Total number of elements). The only time the hash container's time-taken-to-search is not O(1) is due to the searched item sharing the same hash with other different item(s) and the hash container has to compare them one-by-one to find an exact match. For more information on the performance comparison, look at this Boost library link.

Benchmark

For the map and multimap benchmark, the key in key-value pair is a word(string) in English dictionary while the value in key-value pair is a random number. This is okay because the value is not used in the search. For the set and multiset benchmark, the element is a word(string) in English dictionary. Set is a data structure similar to map without the value pair. The multimap and multiset, unlike map and set, allow the same key/element to be insert more than once and doesn't enforce the uniqueness. The reason I use std::wstring for the key/element is because hash containers have a predefined hash function written for std::wstring. If I use my own custom data types, I may have to write my own custom hash function.

For map, multimap, set and multiset insertion, I called the insert method. For multiset and multimap insertion, I insert the same item twice. For map and set search, I called the find method. For multiset and multimap search, I used equal_range method.

For the benchmark, we will benchmark STL, legacy hash container(more on this later), Visual C++ 10 unordered containers and Boost 1.44 unordered containers. Legacy hash containers which I have just mentioned exist in the stdext namespace since Visual Studio 2003. We will benchmark them as well to see how they fare against the C++0x unordered containers. By the way, in case you are not aware, C++0x and Boost hash containers are called unordered_xxx(eg, unordered_map). The benchmark application will populate the containers before running the benchmark if it hasn't been populated but population time is not included in the benchmark results. The reason I did not benchmark the container population is that I found the population to be quite fast.

The benchmark used a random number to index into the std::vector for an item(string) to search in the above containers. You can change the random number generation seed in the benchmark application.

Map Benchmark

The benchmark is done over 5 million searches. The lower the score, the better it is.

MapBenchmark.png

We can see that the hash containers performs 2 times better than STL map. The timing for the hash containers are roughly the same.

MultiMap Benchmark

The benchmark is also done over 5 million searches. The lower the score, the better it is.

MultiMapBenchmark.png

We can see Boost unordered_multimap timing is 4 times better than STL multimap.

Set Benchmark

SetBenchmark.png

We can see that the set benchmark results are similar to the earlier map results.

MultiSet Benchmark

MultiSetBenchmark.png

We can see that multiset benchmark results are similar to the earlier multimap results. As we can see, the hash containers consistently outperform their red-black tree equivalents, we may be tempted to use the hash containers in our code. For most cases, the hash containers are a drop-in replacement for their red-black tree equivalent but some methods may be missing and some method behaviour may change. For example, when you get their iterators, they are not ordered(unordered) unlike the red-black tree versions. You have to investigate their difference yourself.

Building Benchmark Application

Please build the application in release mode. The debug build application runs too slowly. The readers have to download Boost 1.44 and point your Visual C++ 10 include path to Boost 1.44 if you haven't had Boost library on your development machine.

Benchmark Framework

The application is a benchmark framework; Each benchmark test suite is a self-contained Win32 DLL. The benchmark application will load all DLLs with certain function signatures on startup. For some reasons, you want to add your own benchmark DLL (for example, you want to benchmark SGI hash container) to the application folder, the benchmark application will pick it up as well. There is virtually no limit on the number of DLLs you can add. Of course, the more you add, the benchmark time gets longer, and the graph's bars get thinner. To write your own benchmark DLL, just create a new Win32 DLL project and copy source code from any of the existing DLLs. You could just change the included headers to the new container headers which you want to benchmark and you may have to change the container class names as well. There you go, you have a new benchmark DLL. The sequence in which the DLLs are benchmarked is according to GetIndex().

  • STL DLL GetIndex() returns 5.
  • Legacy Hash DLL GetIndex() returns 10.
  • VC10 unordered DLL GetIndex() returns 15.
  • Boost unordered DLL GetIndex() returns 20.

So to insert your own DLL in front of other DLLs, your GetIndex() should return between 1 and 4.

Conclusion

I welcome any feedback(good or bad) as to how I am doing this benchmark correct or wrong and how I could improve it. If my benchmark is not a good representation of how the data search is used in the real-world scenario, please let me know as well. I wrote this article not to teach but to learn from my readers who are far better programmers than I am. Thank you for reading!

History

  • 8th September, 2010: Initial post
  • 28th December, 2010: Source code and binary (attached) are updated to fix an integer (calculation) overflow bug for progressbar if the user specifies > 50+ million searches for the benchmark

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Wong Shao Voon

Software Developer

Singapore Singapore

Member

I guess I'll write here what I does in my free time, than to write an accolade of skills which I currently possess. I believe the things I does in my free time, say more about me.
 
When I am not working, I like to watch Japanese anime. I am also writing some movie script, hoping to see my own movie on the big screen one day.
 
I like to jog because it makes me feel good, having done something meaningful in the morning before the day starts.
 
I also writes articles for Codeproject; I have a few ideas to write about but never get around writing because of hectic schedule.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board. (secure sign-in)
 
Search this forum  
 FAQ
    Noise  Layout  Per page   
  Refresh
GeneralMy vote of 3 PinmemberPierfabio3:38 4 Jan '11  
GeneralRe: My vote of 3 PinmemberWong Shao Voon21:05 6 Jan '11  
GeneralATL CRBMap and CString is much faster ~ PinmemberMember 424910114:59 3 Jan '11  
GeneralRe: ATL CRBMap and CString is much faster ~ PinmemberWong Shao Voon21:02 6 Jan '11  
GeneralAwsome! Pinmembered welch13:53 11 Sep '10  
Questioncomparison with the QHash class from Qt? Pinmemberfvandenb4:12 9 Sep '10  
AnswerRe: comparison with the QHash class from Qt? PinmemberPedroMC3:15 4 Jan '11  
GeneralThat's only half of the story PinPopularmemberpeterchen3:57 9 Sep '10  
GeneralRe: That's only half of the story PinmemberGlen Walker2:17 15 Sep '10  
GeneralRe: That's only half of the story PinmemberWong Shao Voon21:42 19 Sep '10  

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Mobile
Web01 | 2.5.120517.1 | Last Updated 29 Dec 2010
Article Copyright 2010 by Wong Shao Voon
Everything else Copyright © CodeProject, 1999-2012
Terms of Use
Layout: fixed | fluid