|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
Note: This is an unedited contribution. If this article is inappropriate,
needs attention or copies someone else's work without reference then please
Report This Article
IntroductionHash tables are very common data structures. They provide efficient key based operations to insert and search for data in containers. Like many other things in computer science, there are trade offs associated to the use of hash tables. They are not good choices when there is a need for sort and select operations. There are two main issues regarding the implementation of hash based containers: the hash function and the collision resolution mechanism. The hash function is responsible for the arithmetic operation that transforms a particular key into a particular table address. The collision resolution mechanism is responsible for dealing with keys that hash to the same address. Although the C++ STL (Standard Template Library) does not support hash based containers, most compilers like GCC or Visual C++ have their own implementation - it is worth to note that TR1 (Technical Report 1) already offers unordered map and set structures. Also, a good alternative is STLport [1], which provides the following class templates: hash_set; hash_map; hash_multiset; hash_multimap.
Interfaces of hash tables implementations from different compilers may vary. Besides, not all of them support TR1. Integrating STLport into your code might not be that easy (unless you are already using it, of course). Finally, most of these hash based containers use separate chaining as the collision resolution mechanism, which is a straightforward method, but not the only one. Open addressing is a method for handling collisions through sequential probes in the hash table. It can be very useful when there is enough contiguous memory and knowledge of the approximate number of elements in the table is available. Two well known open addressing strategies are linear probing and double hashing, which I briefly discuss in the next section. In this article I present a generic standalone STL-like implementation of a hash table that uses either linear probing or double hashing as the collision resolution mechanism. It serves as the underlying implementation of the four class templates mentioned above and it is constructed with many C++ techniques applied in STLport. The code follows very closely to the STL and extended SGI concepts. Precisely, only one operation is not implemented (the reason is explained later). In addition, the code also works as an interesting introduction to generic programming. This is a brief article and I do not provide neither theoretical discussion about hash tables nor all the implementation details. I am more interested in providing the source code so others can inspect, give sugestions, and hopefully benefit from it. Hash Table ParametrizationThe following class template is the underlying implementation of template <
class key_t,
class value_t,
class hash_fcn_t,
class increment_t,
class equal_key_t,
class get_key_t,
class alloc_t>
class hash_table__
{
//...
};
Most of the template parameters above have meaningful names and a straightforward role. The first one, There is only one significant difference between linear probing and double hashing. With linear probing, when a collision is detected the algorithm looks for the position right next to the current table address to check if it is available. It keeps doing that until it finds an empty spot (once it reaches the end of the table it starts over). Naturally, it is important that the table is not full. (This implementation assumes a maximum load factor of 5/10.) With double hashing, a second hash function is used as the increment from the current table address to the other positions the algorithm look for. In this particular case, there is an extra concern of writing a second hash function that will not lead to an infinite loop. A good usual choice is an increment that is prime to the size of the table. I already provide two template specializations (for integral types) to use as the argument for parameter //For linear probing.
template <class key_t>
struct unit_increment
{
std::size_t operator()(const key_t&){ return 1; }
};
//For double hashing.
template <class key_t>
struct hash_increment{};
//Specialization for integral types are just like this one.
template <>
struct hash_increment<short>
{
std::size_t operator()(short x){return (x % 97) + 1;}
};
The behaviour of linear probing and double hashing are very similar. However, double hashing is less subjected to the formation of clusters, since keys that hash to the same address are likely to be spread throughout the table. On the other hand, implementation of linear probing might be a little bit simpler because of issues related to the erase operation. See [2] for more details. As I mentioned before, template <
class key_t,
class value_t,
class hash_fcn_t = hash<key_t>,
class increment_t = unit_increment<key_t>,
class equal_key_t = std::equal_to<key_t>,
class alloc_t = std::allocator<std::pair<key_t, value_t> > >
class hash_multimap
{
private:
typedef std::pair<key_t, value_t> Map_pair;
typedef hash_table__<
key_t,
Map_pair,
hash_fcn_t,
increment_t,
equal_key_t,
select1st<Map_pair>,
alloc_t> HT;
HT underlying_;
//...
};
IteratorsUnder the hood, the hash table is implemented with When an One of the template parameters of the iterator implementation is the hash table type. The other is the traits type. For the definition of type template <class value_t>
struct const_traits
{
typedef const value_t* pointer;
typedef const value_t& reference;
};
template <class value_t>
struct non_const_traits
{
typedef value_t* pointer;
typedef value_t& reference;
};
//Iterator implementation.
template <
class hash_container_t,
class constness_traits_t>
struct hash_table_iterator__
{
//...
typedef typename constness_traits_t::pointer pointer;
typedef typename constness_traits_t::reference reference;
reference operator*()const {return (*this->container_)[this->current_].value_;}
pointer operator->()const {return &(operator*());}
};
A final issue is that conversion between an Using the codeUsage of the code is just like for any STL container. Here is an example. #include "hash_map.h"
#include <iostream>
int main()
{
using namespace hashcol;
typedef hash_map<int,double> Map;
typedef Map::value_type value_type;
typedef Map::iterator iterator;
Map map;
map.insert(value_type(8, 8.888));
map.insert(value_type(1, 1.111));
map.insert(value_type(12, 12.12));
map.insert(value_type(3, 3.3));
map.insert(value_type(122, 122.122));
std::cout << "\nSize is: " << map.size();
std::cout << "\nElements are:";
for (iterator it = map.begin(); it != map.end(); ++it)
{
std::cout << "\n\tKey = " << it->first << " Value = " << it->second;
}
return 0;
}
NotesThis implementation does not satisfy a particular requirement of an Associative Container. It is inherent to the model of both linear probing and double hashing that equal keys are not necessarily stored adjacent to each other. That is why expression References[1] STLport - http://www.stlport.org/. [2] Sedgewick R. Algorithms in C++ - Fundamentals, Data Structures, Sorting and Searching (3rd edn). Addison-Wesley, 1998.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||