Hashtables are a convenient choice when it comes to searching, particularly because in the collection, an element is stored depending upon the information in the element itself. This information is extracted in an (almost) exclusive way by use of hash-functions. While hashtables are a useful data structure, they do lack one thing - the order in which elements are stored in a hashtable is not preserved. That means, given a
Hashtable, I can get an
Iterator on the set of
Hashtable keys, (by
keySet() method) or I can get an
keys() method). But none of the two is guaranteed to retrieve elements in the same order as they were put in. This is plausible because
Hashtables have generally been used more for their hashing capability rather than anything else.
In our project, we had a situation where the results returned by an SQL query was stored in a
Hashtable (so that it could be searched easily later). But at the same time, we needed to preserve the ordering returned by SQL, since ordering at database side is much faster than doing it programmatically. Moreover, database can handle multiple and complex sort orders quite well (using the
order by clause). Here was a situation where we needed to implement ordering capabilities into a
The obvious place to look was into the source code of the Java programming language, which gave us
Hashtable.java. The solution came up right in the form of writing another class -
OrderedHashtable.java which uses some techniques like inner classes.
The algo was simple: Take a
Hashtable and a
Vector. Whenever you put the objects into the
Hashtable, put the corresponding keys in the
Vector is ordered,
Iterators on it are guaranteed to traverse the keys in the same order as they were put in. So if later you need to iterate through all the objects in the
Hashtable (in the order they were put in), just take hold of the
Iterator of the
Vector and get the corresponding objects from the
Hashtable. There you have an
OrderedHashtable! Of course it still has the hashing capabilities of a
Hashtable, and you can of course retrieve an object directly if you have its key (same way as in a
Hashtable). Only new thing is that,
Enumeration) is guaranteed to be ordered.
Some Issues to be Careful of!
Of course, there are some important things to take care of, like synchronization. Since items can be added as well as removed from
OrderedHashtable, we need to make sure that the internal
Hashtable of the class remains thread-safe. If two threads access it at the same time and one adds and other deletes the same object, it could leave the
Hashtable in an arbitrary, indeterminate state, where either the
Vector or the
Hashtable or the internal
COUNT of number of elements (or all of the three) may be incorrect. This calls for certain sections of code to be synchronized.
Similarly, there is another important issue of overflow. Both
Vector are resizable, but the way their size increases is different. In
Hashtable, we have a
loadfactor which specifies how full the
Hashtable is allowed to become before the size increases (by calling the
rehash() method). Whenever the number of entries in a
Hashtable exceeds the product of
current capacity, the capacity is increased by twice the
current capacity. Default values of
initial capacity and
Hashtable are 10 and 0.75 respectively. So in the default case, when the 9th object is put in, the
Hashtable is doubled to 20.
Vector however operates differently. In a
Vector, you can specify a
capacityIncrement value. Whenever the
Vector becomes full, its capacity is incremented by
capacityIncrement (or doubled in case the
capacityIncrement is 0). Default values of
capacityIncrement for a
Vector are 10 and 0. Here in the default case, the size grows only when the 11th object is put in and the size grows to 20. Again when 21st object comes in, size is doubled to 40 and so on. One would need to be careful in
get() methods to handle this correctly. Other issues like deletion etc, can also be handled easily with slight care.
Now comes the magic of
Enumerations are implemented as inner classes. (Obviously, how else?). When either
enumerateKeys() method) or an
iterateKeys() method) is called, it is this inner class which iterates on a private
Vector and returns the corresponding object from the
Hashtable and makes sure to keep it within error limits.
A final thought that comes to mind immediately after this is - "What is the performance overhead of implementing this ordering?" Since we are using a
Vector, which is known to be one of the slow performing classes among all Java classes, we do expect our
OrderedHashtable to be slower than
Hashtable. I have included two small test files apart from the source code of
OrderedHashtable, in the download package. The code is well commented, and I hope it is easily understood. While one of them (Test.java) establishes the fact that ordering is properly implemented (after all that is what this article is about!) the other file (Performance.java) does some basic performance testing. It populates a
Hashtable and an
OrderedHashtable with one million objects and compares the time spent in accessing them. While Test.java prints the items retrieved to the standard output stream to show that they are indeed in the same order as they were inserted, Performance.java doesn't print the objects themselves. There are two reasons for this - one, I don't want to be flooding the screen with two million lines, and second (more importantly) the
System.out.println() will itself take some time, whereas we just want to compare the time spent in accessing the elements of the
On my system, a number of trial runs indicate that for very large values (like 1 million), the time required by both are almost same (31 milliseconds). This is surprising because I had expected the
Vector to slow down the
OrderedHashtable quite a bit. Yes, once in a while, the
OrderedHashtable takes twice the time as that of
Hashtable, but such instances are rare. You are welcome to test-run it on your system. Your mileage may vary because of the system, the system settings and the current load on it. If there are any issues, comments, suggestions, or new findings, you are most welcome to enlighten me and other fellow programmers at CP.
The source code presented here implements most of the methods of
java.util.Hashtable. But there are a few methods which have been left out, some because they are not in line with the concept of ordering (e.g. those which return a
Set), and some because of their being declared as
private in the underlying
Hashtable class (e.g.
rehash()). However, enough methods have been implemented to provide the full functionality of a
Hashable (for almost all situations) while maintaining ordering at the same time. But if you feel that it should be having some other methods also, which could improve the functionality/usefulness of the
OrderedHashtable, then you are most welcome to provide any feedback and discuss it with the CP community.
Thanks for reading the article. Have a great day!