Click here to Skip to main content
Click here to Skip to main content

A Generic Open-address Hash Table Implementation with an STL-like Interface

By , 6 Jul 2009
 

Introduction

Hash tables are very common data structures. They provide efficient key based operations to insert and search for data in containers. Like many other things in Computer Science, there are trade offs associated to the use of hash tables. They are not good choices when there is a need for sort and select operations.

There are two main issues regarding the implementation of hash based containers: the hash function and the collision resolution mechanism. The hash function is responsible for the arithmetic operation that transforms a particular key into a particular table address. The collision resolution mechanism is responsible for dealing with keys that hash to the same address.

Although the C++ STL (Standard Template Library) does not support hash based containers, most compilers like GCC or Visual C++ have their own implementation - it is worth noting that TR1 (Technical Report 1) already offers unordered map and set structures. Also, a good alternative is STLport [1], which provides the following class templates: 

  • hash_set
  • hash_map
  • hash_multiset
  • hash_multimap

Interfaces of hash table implementations from different compilers may vary. Besides, not all of them support TR1. Integrating STLport into your code might not be that easy (unless you are already using it, of course). Finally, most of these hash based containers use separate chaining as the collision resolution mechanism, which is a straightforward method, but not the only one.

Open addressing is a method for handling collisions through sequential probes in the hash table. It can be very useful when there is enough contiguous memory and knowledge of the approximate number of elements in the table is available. Two well known open addressing strategies are linear probing and double hashing, which I briefly discuss in the next section.

In this article, I present a generic standalone STL-like implementation of a hash table that uses either linear probing or double hashing as the collision resolution mechanism. It serves as the underlying implementation of the four class templates mentioned above, and it is constructed with many C++ techniques applied in STLport. The code follows very closely to the STL and extended SGI concepts. Precisely, only one operation is not implemented (the reason is explained later). In addition, the code also works as an interesting introduction to generic programming. This is a brief article, and I do not provide neither theoretical discussion about hash tables nor all the implementation details. I am more interested in providing the source code so others can inspect, give suggestions, and hopefully benefit from it.

Hash Table Parameterization

The following class template is the underlying implementation of hash_set, hash_map, hash_multiset, and hash_multimap.

template <
  class key_t, 
  class value_t, 
  class hash_fcn_t,
  class increment_t,
  class equal_key_t,
  class get_key_t,
  class alloc_t> 
class hash_table__
{
  //...
};

Most of the template parameters above have meaningful names and a straightforward role. The first one, key_t, is the type of the hash table key. The second, value_t, is the type of the value. (In a set, the value is the actual the key; in a map, the value is a pair containing the key and value.) The hash function is represented by the third parameter. Skipping the fourth parameter, which I will explain in the next paragraph, the fifth parameter, equal_key_t, is a binary predicate used to determine whether two keys are equal. The parameter get_key_t has an interesting responsibility: it is a unary function used to obtain the key of a value stored in the hash table. This parameter must be set in accordance to the type of the value. The last template parameter is the allocator type.

There is only one significant difference between linear probing and double hashing. With linear probing, when a collision is detected, the algorithm looks for the position right next to the current table address to check if it is available. It keeps doing that until it finds an empty spot (once it reaches the end of the table, it starts over). Naturally, it is important that the table is not full. (This implementation assumes a maximum load factor of 5/10.) With double hashing, a second hash function is used as the increment from the current table address to the other positions the algorithm looks for. In this particular case, there is an extra concern of writing a second hash function that will not lead to an infinite loop. A good usual choice is an increment that is prime to the size of the table. I already provide two template specializations (for integral types) to use as the argument for the parameter increment_t.

//For linear probing.
template <class key_t>
struct unit_increment
{
  std::size_t operator()(const key_t&){ return 1; }
};

//For double hashing.
template <class key_t>
struct hash_increment{};

//Specialization for integral types are just like this one.
template <> 
struct hash_increment<short>
{
  std::size_t operator()(short x){return (x % 97) + 1;}
};

The behaviours of linear probing and double hashing are very similar. However, double hashing is less subjected to the formation of clusters, since keys that hash to the same address are likely to be spread throughout the table. On the other hand, implementation of linear probing might be a little bit simpler because of issues related to the erase operation. See [2] for more details.

As I mentioned before, hash_table__ is the underlying implementation of the map and set class templates. This is done through composition as shown in the code below for the case of hash_map.

template <
  class key_t, 
  class value_t, 
  class hash_fcn_t = hash<key_t>, 
  class increment_t = unit_increment<key_t>,
  class equal_key_t = std::equal_to<key_t>, 
  class alloc_t = std::allocator<std::pair<key_t, value_t> > >
class hash_multimap 
{
private:
  typedef std::pair<key_t, value_t> Map_pair;
  typedef hash_table__<
    key_t,
    Map_pair,
    hash_fcn_t,
    increment_t,
    equal_key_t,
    select1st<Map_pair>,
    alloc_t> HT; 
  HT underlying_;  
  //...
};

Iterators

Under the hood, the hash table is implemented with std::vector. Therefore, implementation of an iterator is relatively simple. Basically, it is necessary to have a pointer to the vector, an integral type to indicate the current position (or index), and some intelligence for the iteration process. There is, though, one point that is worth talking about.

When an iterator is dereferenced, a reference to an instance is provided. However, when a const_iterator is dereferenced, a constant reference is provided. Since this is just a mater of type difference, instead of writing two implementations for the iterators, we could use an auxiliary traits class template to do the job.

One of the template parameters of the iterator implementation is the hash table type. The other is the traits type. For the definition of the type iterator, the class template non_const_traits is used. For the definition of the type const_iterator, the class template const_traits is used. The following code shows the idea:

template <class value_t>
struct const_traits
{
  typedef const value_t* pointer;
  typedef const value_t& reference;
};

template <class value_t>
struct non_const_traits
{
  typedef value_t* pointer;
  typedef value_t& reference;
};

//Iterator implementation.
template <
  class hash_container_t,
  class constness_traits_t>  
struct hash_table_iterator__
{  
  //...
  typedef typename constness_traits_t::pointer pointer;
  typedef typename constness_traits_t::reference reference;
  reference operator*()const {return (*this->container_)[this->current_].value_;}
  pointer operator->()const {return &(operator*());}
};

A final issue is that conversion between an iterator and a const_iterator should be handled properly. Please refer to the source code for examples.

Using the Code

The usage of the code is just like for any STL container. Here is an example:

#include "hash_map.h"
#include <iostream>

int main()
{
  using namespace hashcol;
  
  typedef hash_map<int,double> Map;
  typedef Map::value_type value_type;
  typedef Map::iterator iterator;

  Map map;  
  map.insert(value_type(8, 8.888));
  map.insert(value_type(1, 1.111));
  map.insert(value_type(12, 12.12));
  map.insert(value_type(3, 3.3));
  map.insert(value_type(122, 122.122));
  
  std::cout << "\nSize is: " << map.size();
  std::cout << "\nElements are:";
  for (iterator it = map.begin(); it != map.end(); ++it)
  {
    std::cout << "\n\tKey = " << it->first 
              << "   Value = " << it->second;
  }
  
  return 0;
}  

References

  • [1] STLport - http://www.stlport.org/
  • [2] Sedgewick R. Algorithms in C++ - Fundamentals, Data Structures, Sorting and Searching (3rd edn). Addison-Wesley, 1998

License

This article, along with any associated source code and files, is licensed under The BSD License

About the Author

Leandro T C Melo
Software Developer
Germany Germany
Member
No Biography provided

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralVector boundary overflow bug [modified]memberwei-sun-ding24 Mar '11 - 2:22 
Hi Leandro
 
Thanks for the STL implementation of the open address hashing. I've fixed a bug, would you please verify it?
 
Original code in hash_table.h
 
 
size_type find_position(const key_type& k)
  {
    size_type hx = this->hash_(k) % this->TABLE_SIZE_;
    Element* current = &this->container_[hx];
    while (!current->is_null())
    {
      if (current->is_available() &&
          this->key_equals_(k, this->get_key_(current->value_)))
      {
        return hx;
      }
      hx = hx + this->increment_(k);
      current = &this->container_[hx]; // Possible overflow here if "hx >= container_.size()" now
    }
    return this->container_.size();
  }
 

 
Fix
 
size_type find_position(const key_type& k)
  {
    size_type hx = this->hash_(k) % this->TABLE_SIZE_;
    Element* current = &this->container_[hx];
    while (!current->is_null())
    {
      if (current->is_available() &&
          this->key_equals_(k, this->get_key_(current->value_)))
      {
        return hx;
      }
      hx = (hx + this->increment_(k)) % TABLE_SIZE_;
      current = &this->container_[hx]; 
    }
    return this->container_.size();
modified on Friday, March 25, 2011 4:10 AM

GeneralCompiler errorsmemberQwertie21 Jul '09 - 8:55 
I tried using hashcol::hash_map as a drop-in replacement for stdext::hash_map on Windows CE, to see if performance would improve, but I ran into the following problems:
 
Problem #1:
hash_table.h(300): hash_ was not recognized as a function taking 1 argument
    size_type hx = this->hash_(xkey) % this->TABLE_SIZE_;
stdext::hash_map allows any class that has an operator size_t() to be used as a key (actually, it's more generous than that: it uses an explicit, rather than implicit, cast to size_t). To support this I changed the definition of hash from
template <class key_t> 
struct hash{};
to
template <class key_t> 
struct hash
{
  std::size_t operator()(const key_t& x){return (size_t)x;}
};
Problem #2:
hash_table.h(378): const violation in find().
As a workaround, I changed this line to
    return const_iterator(&this->container_, const_cast<Self*>(this)->find_position(k));
At this point the code compiled. Unfortunately I found the performance of your implementation was distinctly worse than both std::map and stdext::hash_map in my company's mobile map application. There is also some kind of behavioral difference that prevented my usual benchmark program from working, but I didn't investigate further. Thank you for enriching the world of open source, though.
GeneralRe: Compiler errorsmemberLeandro T C Melo22 Jul '09 - 8:29 
Qwertie,
 
just a question. Did you initialize the table with the approximate number of elements? The constructor provides a default argument of size 100. However, if this is not sufficient in your case, you should supply a larger value in order to avoid vector reallocations. This might have a significant impact on performance.
 
I'll check the errors when I have time (they don't happen in my environment). Notice that stdext::hash_map is not standard. By the way, the operator() should be a const member function. I'll fix that in my code too.
 
Thanks for the comments.
GeneralRe: Compiler errorsmemberQwertie24 Jul '09 - 11:21 
Oh, no I didn't try that. However it should still be a fair comparison, as I didn't reserve space beforehand in the hash_map either. I would have liked to try the more standard std::tr1::unordered_map, but it's not available in WinCE.
GeneralRe: Compiler errorsmemberLeandro T C Melo24 Jul '09 - 16:56 
Qwertie wrote:
Oh, no I didn't try that. However it should still be a fair comparison, as I didn't reserve space beforehand in the hash_map either

 
Not really. This is an implementation detail. Also, the hash_map you're using is probably closed-addressing using separate chains. Most of the time you'll get a single allocation for a new element. With my closed-addressing hash table you can end up with many (really many) unnecessary reallocations if you don't pass a "good" argument to the constructor.
 
Anyway, if you happen to benchmark the code again in the future, please let me know.
GeneralRe: Compiler errorsmemberQwertie19 Oct '11 - 10:11 
I ended up writing my own hashtable eventually, which I included in the source code of my benchmarking article[^], in case you're curious.
GeneralProblems with MFCmembermalfaro13 Mar '09 - 9:11 
I tried to compile the code in a MFC application and there were a lot of errors
Does it work for MFC applicationns???
 
Mauricio Alfaro

GeneralRe: Problems with MFCmemberLeandro T C Melo13 Mar '09 - 10:55 
malfaro wrote:
Does it work for MFC applicationns???

 
The code is pure C++ and therefore should work in any compiler (standard compliant) regardless of other libraries.
 
Depending on the headers you're including it's possible that conflicts are happening due to the names hash_map, hash_set, etc, which might also be available in the STL implementation of visual studio (however, in recent version they're under an extension namespace). What are the errors and the version of visual studio you are using?
 
In any case and if you're using
 
using namespace hashcol;
typedef hash_map<int,double> Map;

 
try removing it. Instead, use the qualifier on a per name basis.
 
typedef hashcol::hash_map<int,double> Map;

GeneralRe: Problems with MFCmembermalfaro13 Mar '09 - 13:28 
I'm using Microsoft Visual Studio V.6.0
I just copy the files create a new class y test your sample...
It appears that your code doesn't recognize std::size_t and other stl methods...
and not shure if the MFC recognizes ANSI templates...
The project was compiled in MFC App not Console App...
 
The errors look like this...
 
d:\project aipi vs6\aipi\stlhash\hash_function.h(70) : error C2039: 'size_t' : is not a member of 'std'
d:\project aipi vs6\aipi\stlhash\hash_table.h(142) : error C2995: '==' : template function has already been defined
d:\project aipi vs6\aipi\stlhash\hash_table.h(130) : see declaration of '=='
d:\project aipi vs6\aipi\stlhash\hash_table.h(158) : error C2995: '!=' : template function has already been defined
d:\project aipi vs6\aipi\stlhash\hash_table.h(146) : see declaration of '!='
d:\project aipi vs6\aipi\stlhash\hash_table.h(195) : error C2903: 'rebind' : symbol is neither a class template nor a function template
d:\project aipi vs6\aipi\stlhash\hash_table.h(423) : see reference to class template instantiation 'hashcol::hash_table__<key_t,value_t,hash_fcn_t,increment_t,equal_key_t,get_key_t,alloc_t>' being compiled
d:\project aipi vs6\aipi\stlhash\hash_table.h(195) : error C2059: syntax error : '<'
d:\project aipi vs6\aipi\stlhash\hash_table.h(423) : see reference to class template instantiation 'hashcol::hash_table__<key_t,value_t,hash_fcn_t,increment_t,equal_key_t,get_key_t,alloc_t>' being compiled
d:\project aipi vs6\aipi\stlhash\hash_table.h(423) : see reference to class template instantiation 'hashcol::hash_table__<key_t,value_t,hash_fcn_t,increment_t,equal_key_t,get_key_t,alloc_t>' being compiled
d:\project aipi vs6\aipi\stlhash\hash_table.h(210) : error C2146: syntax error : missing ';' before identifier 'pointer'
d:\project aipi vs6\aipi\stlhash\hash_table.h(423) : see reference to class template instantiation 'hashcol::hash_table__<key_t,value_t,hash_fcn_t,increment_t,equal_key_t,get_key_t,alloc_t>' being compiled
d:\project aipi vs6\aipi\stlhash\hash_table.h(210) : error C2868: 'pointer' : illegal syntax for using-declaration; expected qualified-name
d:\project aipi vs6\aipi\stlhash\hash_table.h(423) : see reference to class template instantiation 'hashcol::hash_table__<key_t,value_t,hash_fcn_t,increment_t,equal_key_t,get_key_t,alloc_t>' being compiled
d:\project aipi vs6\aipi\stlhash\hash_table.h(433) : error C2059: syntax error : ''template<''
d:\project aipi vs6\aipi\stlhash\hash_table.h(436) : error C2065: 'key_t' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(437) : error C2065: 'value_t' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(438) : error C2065: 'hash_fcn_t' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(439) : error C2065: 'increment_t' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(440) : error C2065: 'equal_key_t' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(441) : error C2065: 'get_key_t' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(442) : error C2065: 'alloc_t' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(458) : error C2065: 'K' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(458) : error C2065: 'V' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(458) : error C2065: 'H' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(458) : error C2065: 'I' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(458) : error C2065: 'E' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(458) : error C2065: 'G' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(458) : error C2065: 'A' : undeclared identifier
d:\project aipi vs6\aipi\stlhash\hash_table.h(460) : error C2143: syntax error : missing ';' before '{'
d:\project aipi vs6\aipi\stlhash\hash_table.h(460) : error C2447: missing function header (old-style formal list?)
.....etc.......
.....etc.......
 
P.S. Thanks for the quick response...
 
Mauricio Alfaro

GeneralRe: Problems with MFCmemberLeandro T C Melo14 Mar '09 - 3:21 
Mauricio,
 
I didn't test it with VC 6. This is an old version of the MS compiler/IDE. It has some problems with templates. Even if you consider VC 7 (2002) it still doesn't support member function templates.
 
If I had VC 6 around I could test and try to figure out possible changes that would make the code work. But I don't. Sorry...
 
If you can, I'd suggest you to upgrade to a newer version of the compiler. For example, VC++ 9 (2008). They're really nice and provide much better conformance with the C++ standard. There are express editions distributed for free.
 
Thanks for the interest.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web03 | 2.6.130523.1 | Last Updated 6 Jul 2009
Article Copyright 2008 by Leandro T C Melo
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid