Dynamic Open Hash Table (DOHT) using CPtrList class

Arash Sabet

Rate me:

2.92/5 (9 votes)

2 Jun 2000

114.9K

Making assorted hash table of strings and/or other data types.

Download source files - 4 Kb

Introduction

There are several data structure types in the computer programming world that can be very useful for word processing, tokens, compilers and/or storing data for the purpose of fast seeking and even categorizing.

I decided to submit some very useful code that I used in one of my projects for arranging data in memory as a Dynamic Open Hash Table (DOHT). In that project, I needed to have a data structure of the subscribers' information of a company so that they were categorized on the basis of the first letter in their name and were sorted in ascending order. The subscribers with the same first letter in their name belong to a specified bucket, and buckets are chained sequentially. After you read the code and its explanations, you will find out that it can be useful for writing a word dictionary application, among other things.

How does it work?

1) Stored data

CData is the class which contains the stored data. Assume that the stored data in our code is a CString object called m_pivot, used to manipulate data in our DOHT, and in this state our DOHT will be a CString-based table. We can even change our m_pivot object to any other desired data type to create a different type-based DOHT. It's quite clear that we can have some other member variables in CData class to store other information, but only one member variable will play the roll of m_pivot in the following class.

class CData  
{
public:
    CString m_pivot;   // This is the pivot element we 
    CData();           // manipulate our table on.
    virtual ~CData();
};

2) Buckets

Now, we must look for a trend to store CData objects as a chain so that all these objects' pivot elemets (m_pivot) have the same beginning letter such as 'A' or 'B', and so on. I mean all of the objects with the beginning letter 'A' will be chained in the bucket of letter 'A' and the ones with the beginning letter 'B' will be chained in the bucket of letter 'B', and so on. In this section, I'm going to explain what the CBucket class is and does.

class CBucket
{
public:
    unsigned long GetIndex(void);         // Returns the Index of the 
                                          // current bucket
    POSITION Exists(CData* pData);        // Checks the existence of the 
                                          // specified data
    BOOL RemoveItem(CData* pItem);        // Removes the specified data 
                                          // from the current bucket
    CPtrList* GetDataList(void);          // Returns data list of the 
                                          // current bucket
    void SetIndex(unsigned long nIndex);  // Sets the Index for the new bucket
    void InsertSorted(CData* pData);      // Inserts an item ascending
    CBucket();
    virtual ~CBucket();

protected:
    CPtrList* m_pDataList;    // The list of data
    unsigned long m_nIndex;   // Bucket index
private:
};

Each bucket has its own 'index' of type unsigned long called m_nIndex for its header. Later, I will explain how this value is initialized. Please note that m_nIndex is the corresponding number of the 1st letter of m_pivot. Also, CBucket::m_pDataList is an object of type CPtrList by which we can chain the stored data sequentially. The number of inserted data is unlimited as long as the computer has available memory space.

The function CBucket::InsertSorted(CData* pData) inserts data in ascending order. By changing the comparison operator '>' in the following code you will be able to insert items in descending order.

void CBucket::InsertSorted(CData *pData)
{
    if (pData)
    {
        POSITION pos = m_pDataList->GetHeadPosition();

        if (pos == NULL)
        {
            m_pDataList->AddHead(pData);
            return;
        }

        CData* data = (CData*)m_pDataList->GetAt(pos);

        while (pos && pData->m_pivot >= data->m_pivot)
            m_pDataList->GetNext(pos);

        if (pos == NULL)
            m_pDataList->AddTail(pData);
        else
            m_pDataList->InsertBefore(pos,pData);
    }
}

Since there are several chained items in each bucket, we must develop a destructor CBucket::~CBucket() so that all of the existing CData objects are removed and destroyed, then CBucket::m_pDataList is deleted and the occupied memory is released.

CBucket::~CBucket()
{
    if (!m_pDataList)
        return;

    POSITION position = m_pDataList->GetHeadPosition();

    while (position)
    {
        POSITION temp = position;
        m_pDataList->GetNext(position);
        CData* pData = (CData*)m_pDataList->GetAt(temp);
        m_pDataList->RemoveAt(temp);
        delete pData;
    }
    
    delete m_pDataList;
}

3) Hash table

CHashTable is the container for our CBucket buckets. As a matter of fact, a CHashTable object is an object that stores buckets as a CPtrList object linked list ascending sequentially. The ascention order is due to the existence of an index that's corresponding to the pivot element. In this class, the number of chained buckets is unlimited as long as the computer has available memory space or until we reach the maximum range of an unsigned long. Here, CHashTable::m_pBuckets stores a list of buckets.

class CHashTable
{
public:
    CPtrList* GetBuckets(void);                //Returns the list of 
                                                // buckets in the hash table
    unsigned long GetNumOfBuckets(void);        //Returns the number of buckets
    CBucket* SeekBucket(unsigned long nIndex);  //Seeks and returns the 
                                                // nIndexth bucket of buckets list
    BOOL Exists(CData* pData,POSITION& pos, unsigned long& nIndex);  //Checks the 
                                                // existance of desired data
    BOOL DeleteItem(CData* pData);              //Removes the existed data from 
                                                // hash table
    void InsertSorted(CData* pData);            //Inserts data ascending
    CHashTable();
    virtual ~CHashTable();

private:
    unsigned long m_nNumOfBuckets;  //The number of created buckets
    CPtrList* m_pBuckets;           //The list of buckets
};

The function CHashTable::InsertSorted(CData* pData) inserts a new item into our DOHT. According to the function CHashTable::InsertSorted(CData* pData), the inserted CData object's m_pivot member variable is checked to get its corresponding number of the 1st letter of m_pivot as an index, used to find out whether its corresponding CBucket object exists in the list of buckets or not:

...
unsigned long nIndex = (unsigned long)pData->m_pivot[0];
...

If its corresponding object exists, pData will be inserted by calling CBucket::InsertSorted(CData* pData), otherwise, a new bucket is created and pData is inserted in it by calling the same member function of CBucket. This is the final code for inserting a new item in DOHT:

void CHashTable::InsertSorted(CData *pData)
{
    if (!pData)
        return;

    unsigned long nIndex = (unsigned long)pData->m_pivot[0];
    POSITION pos = m_pBuckets->GetHeadPosition();

    if (pos == NULL)
    {
        CBucket* pBucket = new CBucket;
        ASSERT(pBucket);
        pBucket->SetIndex(nIndex);
        pBucket->InsertSorted(pData);
        m_pBuckets->AddHead(pBucket);
        m_nNumOfBuckets++;
        return;
    }

    CBucket* pBucket = (CBucket*)m_pBuckets->GetHead();
    unsigned long index = pBucket->GetIndex();

    while (pos && nIndex > index)
    {
        m_pBuckets->GetNext(pos);

        if (!pos)
            break;

        pBucket = (CBucket*)m_pBuckets->GetAt(pos);

        index = pBucket->GetIndex();

    }

    if (nIndex == index)
    {
        pBucket->InsertSorted(pData);
        return;
    }

    CBucket* pNewBucket = new CBucket;
    ASSERT(pNewBucket);
    pNewBucket->SetIndex(nIndex);
    pNewBucket->InsertSorted(pData);

    if (pos == NULL)
        m_pBuckets->AddTail(pNewBucket);
    else
        m_pBuckets->InsertBefore(pos,pNewBucket);

    m_nNumOfBuckets++;
}

Note that CHashTable::Exists(CData* pData,POSITION& pos, unsigned long& nIndex) returns not only a Boolean value to check whether an item exists in our DOHT but also the index of the corresponding bucket and the position of the desired item's corresponding bucket in CBucket::m_pDataList.

The explanation of destructor CHashTable::~CHashTable() is the same as the explanation of the destructor of CBucket.

CHashTable::~CHashTable()
{
    if (!m_pBuckets)
        return;

    POSITION pos = m_pBuckets->GetHeadPosition();
    while (pos)
    {
        POSITION temp = pos;
        CBucket* pBucket = (CBucket*)m_pBuckets->GetNext(pos);
        m_pBuckets->RemoveAt(temp);
        delete pBucket;
    }
    delete m_pBuckets;
}

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Written By

Arash Sabet

Web Developer

Canada

- B.S. degree in Computer Engineering.
- 10+ years experience in Turbo C, Borland C , Visual C++ and Managed C++.
- Obssessed in OOP style design and programming.
- Desigining and developing Network security tools.
- Desigining and developing a client/server appliation for sharing files among users in a way other than FTP protocol.
- Desigining and developing ISP Subscribers account management by binding to Cisco tools (NtTac+), Reporting the results on the web by ISAPI method.
- Designing and writing code to build a search engine (Web crawler) by SQL Server 7.0 and VC++.

- On-board programming of non-boundary scan memory devices like flash memories by boundary scan (IEEE 1149.1) protocol in C# and J#.

- Designing and implementing GSM gateway applications and bulk messaging.

The summary of my skills:
C#, J#, Managed C++ code, VC++, MFC, Turbo Pascal, PL/I, SQL Server, MS Access, Windows NT administration, Web site developing, Macromedia tools, Webmastering, Cisco Routers.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.