![]() |
Languages »
C / C++ Language »
Templates
Intermediate
License: The Code Project Open License (CPOL)
AHash: A Set of Simple C++ Hash TemplatesBy AngusEmA set of lightweight hash table templates that are easy to understand and implement |
C++, Dev
|
|
Advanced Search |
|
|
|
||||||||||||||||
I was working on a project where I had to process about 1,000 records. Eventually this set of records grew to several thousand. In some cases, the time complexity of routines that processed this data was O(n2). Using an array to randomly access these records was no longer feasible. It was time to go back to my computer science course notes to work with a more exciting data structure. You are probably here because you are in the same situation. (I'm no longer in that situation. Now I have to process several hundred thousand records!)
In my search for work already done that I could fall back on, the only serious C++ hash template class I could find was in SGI's extension to STL. Like much of STL, I found it difficult to understand. Unlike STL, I found it difficult to use as well. If you have a lightweight task to perform, you have to go through a relatively cumbersome process of defining this, implementing that, etc. I wanted something that could be quickly defined and instantiated -- I would have to write my own.
There are some things that other hash tables can do that this one cannot:
hash_multiset is free from this restriction. If you try adding an element which has a key that already exists in the table, then only the first key will ever be referenced in a random access lookup. This article assumes that you know how a hash table works. If you don't, then you should familiarize yourself and consider the possibility that a hash table is not even an appropriate solution. Since you must also supply the hash function, you must know what those are all about. You don't have to write your own. There are plenty available on the Internet, but you'll need to incorporate them, understand their principles, and judge if they are appropriate to your data.
I've written these templates to be friendly to all compilers, environments and flavours of C++. However, I have only actually tried the code in GNU C++ and Visual C++. The only environmental requirement is that they must have access to the STL vector template class. There is also an include <assert.h>, which is part of the ANSI standard. If, for whatever reason, that causes a compiler error, you can easily comment out the line and define your own assert macro to keep the compiler happy.
There are 3 hash table templates that you can use. The Doxygen-generated documentation is included with the source code. However, let's get you started on a "Hello world," which you can use as an outline to hash table applications.
First, you'll already have your class defined, to which you will need fast random access, called T in the template code (or the "data class"). The declaration to such a class might look like this:
class Student {
public:
Student();
string m_sName;//this will be the unique key to the hash table entry
int m_nGrade;
int m_nID;
};
Next, we will declare your hash class:
#include <ahashp.h>
class StudentHash : public _HashP<Student> {
public:
StudentHash();
private:
virtual bool COMPAREREFERENCESPREATTRIBUTES CompareReferences(
const Student &ref1, const char *ref2) const COMPAREREFERENCESPOSTATTRIBUTES;
virtual bucket_array_unit HASHREFERENCEPREATTRIBUTES GetHashReference(
const char *ref) const HASHREFERENCEPOSTATTRIBUTES;
virtual bucket_array_unit HASHREFERENCEPREATTRIBUTES GetHashReference(
const Student &ref) const HASHREFERENCEPOSTATTRIBUTES;
};
Note that you can dispense with those *ATTRIBUTES macros if you expect never to use them. However, if you do use them, there's a chance that you might forget that you'll need to update your old code and forget about this class. So, if you are a stickler for correctness, make sure you leave them in.
Also note that there are actually two template parameters being passed to _HashP. The second one defaults to const char *. You'll also notice that those template parameters are the types of the parameters being passed to the virtual methods. This is not a coincidence. The template parameters decide the types of the parameters passed to the virtuals. Finally, define your class.
CompareReferences() is used so that when _HashP needs to match a key with a particular element in the hash table, it can. The STL way of doing things might require that Student have a comparison operator. This way, instead of making changes to your data class, you can just implement a pure virtual in your hash class.
bool StudentHash::CompareReferences(const Student &ref1, const char *ref2) const {
return ref1.m_sName == ref2;
}
Now for the hash function. These return an index to the bucket array. Notice that the modulus operation with GetBucketSize() will usually be necessary:
StudentHash::bucket_array_unit StudentHash::GetHashReference(const char *ref) const {
sdbm(ref)%GetBucketSize();
}
StudentHash::bucket_array_unit StudentHash::GetHashReference(const Student &ref) const {
return GetHashReference(ref.m_sName.c_str());
//just call GetHashReference(const char *)
}
It is up to you to choose which hash function to use. Sdbm() is an attractive choice for keys which are strings, with good performance and good randomization, so I recommend that one. And there: your hash class is implemented! All you need to do is use it, and for that you should read about the Add(), Get() and SetBucketSize() methods.
I've made use of 4 macros whose names are of the form *ATTRIBUTES. You can define these as compiler-specific attributes, which may make the calling of the pure virtuals slightly more efficient. One such implementation might be of the GNU attribute __attribute__((fastcall)) or the Visual C++ __fastcall. Use of attributes can be very dangerous in GNU, and probably to a lesser extent in Visual C++, so you should be very careful that you know what you are doing when using those. So far, I've been using __attribute__((fastcall)) quite happily without incident, so due diligence should yield safe usage.
This hash table is of the chaining variety. The chains are maintained by linked list. This construct is ideal for roles where the number of lookups on the table do not greatly exceed the insertions and deletions on the table.
If you'd like to make improvements to my code, I'll give you some tips on understanding it.
Hungarian notation is a favourite of mine, but I prefer an improvement that I call Canadian notation.
I've tried to clean up the code. However, I have left in some debug code that I was reluctant to trim, since I've been able to make use of it at one time or another. Unlike all the other methods, it's not very well documented and, in some cases, usage might not be very intuitive. So, keep in mind that you can ignore it without losing any key points in understanding the code.
One can iterate through a hash table using the "hasherators," which borrow from the STL iterators. I found the iterator concept difficult to digest at first, but it is probably the best way to sequentially traverse such a random access data structure efficiently, should it be necessary.
The C++ culture seems to believe in only making public pure virtuals. This would probably be why, if you've enabled such a warning, GNU C++ will always give you a warning about a class with virtual methods not having a virtual destructor, even if the virtual methods are not public or the destructor is not public. So far, I have not discovered a reason why pure virtuals should not be private, so that the abstract class can call them, rather than having the owner call them through what is probably a pointer to the instance. This philosophy makes pure virtuals like callbacks, and still takes full advantage of the strengths of abstract classes.
One might notice that I used the STL array template vector, but did not use the linked list. I found the linked list frustrating at the time and was getting a lot of debug errors about the improper use of iterators. So, I just dumped it and created my own linked list. Also, I'm not entirely certain of the efficiency of STL's linked list, but I am of the one I wrote.
| You must Sign In to use this message board. | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||
General
News
Question
Answer
Joke
Rant
Admin
|
PermaLink |
Privacy |
Terms of Use
Last Updated: 19 Mar 2008 Editor: Deeksha Shenoy |
Copyright 2008 by AngusEm Everything else Copyright © CodeProject, 1999-2009 Web13 | Advertise on the Code Project |