Dynamic Open Hash Table (DOHT) using CPtrList class

Arash Sabet

2.92/5 (9 votes)

Jun 3, 2000

115641

1020

Making assorted hash table of strings and/or other data types.

Download source files - 4 Kb

Introduction

There are several data structure types in the computer programming world that can be very useful for word processing, tokens, compilers and/or storing data for the purpose of fast seeking and even categorizing.

I decided to submit some very useful code that I used in one of my projects for arranging data in memory as a Dynamic Open Hash Table (DOHT). In that project, I needed to have a data structure of the subscribers' information of a company so that they were categorized on the basis of the first letter in their name and were sorted in ascending order. The subscribers with the same first letter in their name belong to a specified bucket, and buckets are chained sequentially. After you read the code and its explanations, you will find out that it can be useful for writing a word dictionary application, among other things.

How does it work?

1) Stored data

CData is the class which contains the stored data. Assume that the stored data in our code is a CString object called m_pivot, used to manipulate data in our DOHT, and in this state our DOHT will be a CString-based table. We can even change our m_pivot object to any other desired data type to create a different type-based DOHT. It's quite clear that we can have some other member variables in CData class to store other information, but only one member variable will play the roll of m_pivot in the following class.

class CData  
{
public:
    CString m_pivot;   // This is the pivot element we 

    CData();           // manipulate our table on.

    virtual ~CData();
};

2) Buckets

Now, we must look for a trend to store CData objects as a chain so that all these objects' pivot elemets (m_pivot) have the same beginning letter such as 'A' or 'B', and so on. I mean all of the objects with the beginning letter 'A' will be chained in the bucket of letter 'A' and the ones with the beginning letter 'B' will be chained in the bucket of letter 'B', and so on. In this section, I'm going to explain what the CBucket class is and does.

class CBucket
{
public:
    unsigned long GetIndex(void);         // Returns the Index of the 

                                          // current bucket

    POSITION Exists(CData* pData);        // Checks the existence of the 

                                          // specified data

    BOOL RemoveItem(CData* pItem);        // Removes the specified data 

                                          // from the current bucket

    CPtrList* GetDataList(void);          // Returns data list of the 

                                          // current bucket

    void SetIndex(unsigned long nIndex);  // Sets the Index for the new bucket

    void InsertSorted(CData* pData);      // Inserts an item ascending

    CBucket();
    virtual ~CBucket();

protected:
    CPtrList* m_pDataList;    // The list of data

    unsigned long m_nIndex;   // Bucket index

private:
};

Each bucket has its own 'index' of type unsigned long called m_nIndex for its header. Later, I will explain how this value is initialized. Please note that m_nIndex is the corresponding number of the 1st letter of m_pivot. Also, CBucket::m_pDataList is an object of type CPtrList by which we can chain the stored data sequentially. The number of inserted data is unlimited as long as the computer has available memory space.

The function CBucket::InsertSorted(CData* pData) inserts data in ascending order. By changing the comparison operator '>' in the following code you will be able to insert items in descending order.

void CBucket::InsertSorted(CData *pData)
{
    if (pData)
    {
        POSITION pos = m_pDataList->GetHeadPosition();

        if (pos == NULL)
        {
            m_pDataList->AddHead(pData);
            return;
        }

        CData* data = (CData*)m_pDataList->GetAt(pos);

        while (pos && pData->m_pivot >= data->m_pivot)
            m_pDataList->GetNext(pos);

        if (pos == NULL)
            m_pDataList->AddTail(pData);
        else
            m_pDataList->InsertBefore(pos,pData);
    }
}

Since there are several chained items in each bucket, we must develop a destructor CBucket::~CBucket() so that all of the existing CData objects are removed and destroyed, then CBucket::m_pDataList is deleted and the occupied memory is released.

CBucket::~CBucket()
{
    if (!m_pDataList)
        return;

    POSITION position = m_pDataList->GetHeadPosition();

    while (position)
    {
        POSITION temp = position;
        m_pDataList->GetNext(position);
        CData* pData = (CData*)m_pDataList->GetAt(temp);
        m_pDataList->RemoveAt(temp);
        delete pData;
    }
    
    delete m_pDataList;
}

3) Hash table

CHashTable is the container for our CBucket buckets. As a matter of fact, a CHashTable object is an object that stores buckets as a CPtrList object linked list ascending sequentially. The ascention order is due to the existence of an index that's corresponding to the pivot element. In this class, the number of chained buckets is unlimited as long as the computer has available memory space or until we reach the maximum range of an unsigned long. Here, CHashTable::m_pBuckets stores a list of buckets.

class CHashTable
{
public:
    CPtrList* GetBuckets(void);                //Returns the list of 

                                                // buckets in the hash table

    unsigned long GetNumOfBuckets(void);        //Returns the number of buckets

    CBucket* SeekBucket(unsigned long nIndex);  //Seeks and returns the 

                                                // nIndexth bucket of buckets list

    BOOL Exists(CData* pData,POSITION& pos, unsigned long& nIndex);  //Checks the 

                                                // existance of desired data

    BOOL DeleteItem(CData* pData);              //Removes the existed data from 

                                                // hash table

    void InsertSorted(CData* pData);            //Inserts data ascending

    CHashTable();
    virtual ~CHashTable();

private:
    unsigned long m_nNumOfBuckets;  //The number of created buckets

    CPtrList* m_pBuckets;           //The list of buckets

};

The function CHashTable::InsertSorted(CData* pData) inserts a new item into our DOHT. According to the function CHashTable::InsertSorted(CData* pData), the inserted CData object's m_pivot member variable is checked to get its corresponding number of the 1st letter of m_pivot as an index, used to find out whether its corresponding CBucket object exists in the list of buckets or not:

...
unsigned long nIndex = (unsigned long)pData->m_pivot[0];
...

If its corresponding object exists, pData will be inserted by calling CBucket::InsertSorted(CData* pData), otherwise, a new bucket is created and pData is inserted in it by calling the same member function of CBucket. This is the final code for inserting a new item in DOHT:

void CHashTable::InsertSorted(CData *pData)
{
    if (!pData)
        return;

    unsigned long nIndex = (unsigned long)pData->m_pivot[0];
    POSITION pos = m_pBuckets->GetHeadPosition();

    if (pos == NULL)
    {
        CBucket* pBucket = new CBucket;
        ASSERT(pBucket);
        pBucket->SetIndex(nIndex);
        pBucket->InsertSorted(pData);
        m_pBuckets->AddHead(pBucket);
        m_nNumOfBuckets++;
        return;
    }

    CBucket* pBucket = (CBucket*)m_pBuckets->GetHead();
    unsigned long index = pBucket->GetIndex();

    while (pos && nIndex > index)
    {
        m_pBuckets->GetNext(pos);

        if (!pos)
            break;

        pBucket = (CBucket*)m_pBuckets->GetAt(pos);

        index = pBucket->GetIndex();

    }

    if (nIndex == index)
    {
        pBucket->InsertSorted(pData);
        return;
    }

    CBucket* pNewBucket = new CBucket;
    ASSERT(pNewBucket);
    pNewBucket->SetIndex(nIndex);
    pNewBucket->InsertSorted(pData);

    if (pos == NULL)
        m_pBuckets->AddTail(pNewBucket);
    else
        m_pBuckets->InsertBefore(pos,pNewBucket);

    m_nNumOfBuckets++;
}

Note that CHashTable::Exists(CData* pData,POSITION& pos, unsigned long& nIndex) returns not only a Boolean value to check whether an item exists in our DOHT but also the index of the corresponding bucket and the position of the desired item's corresponding bucket in CBucket::m_pDataList.

The explanation of destructor CHashTable::~CHashTable() is the same as the explanation of the destructor of CBucket.

CHashTable::~CHashTable()
{
    if (!m_pBuckets)
        return;

    POSITION pos = m_pBuckets->GetHeadPosition();
    while (pos)
    {
        POSITION temp = pos;
        CBucket* pBucket = (CBucket*)m_pBuckets->GetNext(pos);
        m_pBuckets->RemoveAt(temp);
        delete pBucket;
    }
    delete m_pBuckets;
}