Implementing Ordering in a Hashtable

Animesh Srivastava

4.25/5 (14 votes)

Jun 21, 2002

6 min read

117149

296

A Hashtable provides a key-value lookup functionality. Hashtable in java.util.Hashtable is not guaranteed to Enumerate (or Iterate) keys in the same order as they were put in. This article explains how to implement ordering in a Hashtable.

Download source files - 3.11 Kb

Introduction
Why Ordering
The Algorithm
Some Issues to be Careful of!
Implementation
Performance
Final Notes

Introduction

Hashtables are a convenient choice when it comes to searching, particularly because in the collection, an element is stored depending upon the information in the element itself. This information is extracted in an (almost) exclusive way by use of hash-functions. While hashtables are a useful data structure, they do lack one thing - the order in which elements are stored in a hashtable is not preserved. That means, given a Hashtable, I can get an Iterator on the set of Hashtable keys, (by keySet() method) or I can get an Enumeration (by keys() method). But none of the two is guaranteed to retrieve elements in the same order as they were put in. This is plausible because Hashtables have generally been used more for their hashing capability rather than anything else.

Why Ordering

In our project, we had a situation where the results returned by an SQL query was stored in a Hashtable (so that it could be searched easily later). But at the same time, we needed to preserve the ordering returned by SQL, since ordering at database side is much faster than doing it programmatically. Moreover, database can handle multiple and complex sort orders quite well (using the order by clause). Here was a situation where we needed to implement ordering capabilities into a Hashtable.

The Algorithm

The obvious place to look was into the source code of the Java programming language, which gave us Hashtable.java. The solution came up right in the form of writing another class - OrderedHashtable.java which uses some techniques like inner classes.

The algo was simple: Take a Hashtable and a Vector. Whenever you put the objects into the Hashtable, put the corresponding keys in the Vector. Since Vector is ordered, Enumerations or Iterators on it are guaranteed to traverse the keys in the same order as they were put in. So if later you need to iterate through all the objects in the Hashtable (in the order they were put in), just take hold of the Iterator of the Vector and get the corresponding objects from the Hashtable. There you have an OrderedHashtable! Of course it still has the hashing capabilities of a Hashtable, and you can of course retrieve an object directly if you have its key (same way as in a Hashtable). Only new thing is that, Iterator (or Enumeration) is guaranteed to be ordered.

Some Issues to be Careful of!

Of course, there are some important things to take care of, like synchronization. Since items can be added as well as removed from OrderedHashtable, we need to make sure that the internal Vector and Hashtable of the class remains thread-safe. If two threads access it at the same time and one adds and other deletes the same object, it could leave the Hashtable in an arbitrary, indeterminate state, where either the Vector or the Hashtable or the internal COUNT of number of elements (or all of the three) may be incorrect. This calls for certain sections of code to be synchronized.

Similarly, there is another important issue of overflow. Both Hashtable and Vector are resizable, but the way their size increases is different. In Hashtable, we have a loadfactor which specifies how full the Hashtable is allowed to become before the size increases (by calling the rehash() method). Whenever the number of entries in a Hashtable exceeds the product of loadfactor and current capacity, the capacity is increased by twice the current capacity. Default values of initial capacity and loadfactor for Hashtable are 10 and 0.75 respectively. So in the default case, when the 9th object is put in, the Hashtable is doubled to 20.

The Vector however operates differently. In a Vector, you can specify a capacityIncrement value. Whenever the Vector becomes full, its capacity is incremented by capacityIncrement (or doubled in case the capacityIncrement is 0). Default values of initialcapacity and capacityIncrement for a Vector are 10 and 0. Here in the default case, the size grows only when the 11th object is put in and the size grows to 20. Again when 21st object comes in, size is doubled to 40 and so on. One would need to be careful in put() and get() methods to handle this correctly. Other issues like deletion etc, can also be handled easily with slight care.

Implementation

Now comes the magic of inner classes. Iterators and Enumerations are implemented as inner classes. (Obviously, how else?). When either Enumeration (enumerateKeys() method) or an Iterator (iterateKeys() method) is called, it is this inner class which iterates on a private Vector and returns the corresponding object from the Hashtable and makes sure to keep it within error limits.

Performance

A final thought that comes to mind immediately after this is - "What is the performance overhead of implementing this ordering?" Since we are using a Vector, which is known to be one of the slow performing classes among all Java classes, we do expect our OrderedHashtable to be slower than Hashtable. I have included two small test files apart from the source code of OrderedHashtable, in the download package. The code is well commented, and I hope it is easily understood. While one of them (Test.java) establishes the fact that ordering is properly implemented (after all that is what this article is about!) the other file (Performance.java) does some basic performance testing. It populates a Hashtable and an OrderedHashtable with one million objects and compares the time spent in accessing them. While Test.java prints the items retrieved to the standard output stream to show that they are indeed in the same order as they were inserted, Performance.java doesn't print the objects themselves. There are two reasons for this - one, I don't want to be flooding the screen with two million lines, and second (more importantly) the System.out.println() will itself take some time, whereas we just want to compare the time spent in accessing the elements of the Hashtables.

On my system, a number of trial runs indicate that for very large values (like 1 million), the time required by both are almost same (31 milliseconds). This is surprising because I had expected the Vector to slow down the OrderedHashtable quite a bit. Yes, once in a while, the OrderedHashtable takes twice the time as that of Hashtable, but such instances are rare. You are welcome to test-run it on your system. Your mileage may vary because of the system, the system settings and the current load on it. If there are any issues, comments, suggestions, or new findings, you are most welcome to enlighten me and other fellow programmers at CP.

Final Notes

The source code presented here implements most of the methods of java.util.Hashtable. But there are a few methods which have been left out, some because they are not in line with the concept of ordering (e.g. those which return a Set), and some because of their being declared as protected or private in the underlying Hashtable class (e.g. rehash()). However, enough methods have been implemented to provide the full functionality of a Hashable (for almost all situations) while maintaining ordering at the same time. But if you feel that it should be having some other methods also, which could improve the functionality/usefulness of the OrderedHashtable, then you are most welcome to provide any feedback and discuss it with the CP community. :-)

Thanks for reading the article. Have a great day!