Introduction
There are several data structure types in the computer programming world that can be very useful for word processing, tokens, compilers and/or storing data for the purpose of fast seeking and even categorizing.
I decided to submit some very useful code that I used in one of my projects for arranging data in memory as a Dynamic Open Hash Table (DOHT). In that project, I needed to have a data structure of the subscribers' information of a company so that they were categorized on the basis of the first letter in their name and were sorted in ascending order. The subscribers with the same first letter in their name belong to a specified bucket, and buckets are chained sequentially. After you read the code and its explanations, you will find out that it can be useful for writing a word dictionary application, among other things.
How does it work?
1) Stored data
CData
is the class which contains the stored data. Assume that the stored data in our code is a CString
object called m_pivot
, used to manipulate data in our DOHT, and in this state our DOHT will be a CString
-based table. We can even change our m_pivot
object to any other desired data type to create a different type-based DOHT. It's quite clear that we can have some other member variables in CData
class to store other information, but only one member variable will play the roll of m_pivot
in the following class.
class CData
{
public:
CString m_pivot;
CData();
virtual ~CData();
};
2) Buckets
Now, we must look for a trend to store CData
objects as a chain so that all these objects' pivot elemets (m_pivot
) have the same beginning letter such as 'A' or 'B', and so on. I mean all of the objects with the beginning letter 'A' will be chained in the bucket of letter 'A' and the ones with the beginning letter 'B' will be chained in the bucket of letter 'B', and so on. In this section, I'm going to explain what the CBucket
class is and does.
class CBucket
{
public:
unsigned long GetIndex(void);
POSITION Exists(CData* pData);
BOOL RemoveItem(CData* pItem);
CPtrList* GetDataList(void);
void SetIndex(unsigned long nIndex);
void InsertSorted(CData* pData);
CBucket();
virtual ~CBucket();
protected:
CPtrList* m_pDataList;
unsigned long m_nIndex;
private:
};
Each bucket has its own 'index' of type unsigned long
called m_nIndex
for its header. Later, I will explain how this value is initialized. Please note that m_nIndex
is the corresponding number of the 1st letter of m_pivot
. Also, CBucket::m_pDataList
is an object of type CPtrList
by which we can chain the stored data sequentially. The number of inserted data is unlimited as long as the computer has available memory space.
The function CBucket::InsertSorted(CData* pData)
inserts data in ascending order. By changing the comparison operator '>' in the following code you will be able to insert items in descending order.
void CBucket::InsertSorted(CData *pData)
{
if (pData)
{
POSITION pos = m_pDataList->GetHeadPosition();
if (pos == NULL)
{
m_pDataList->AddHead(pData);
return;
}
CData* data = (CData*)m_pDataList->GetAt(pos);
while (pos && pData->m_pivot >= data->m_pivot)
m_pDataList->GetNext(pos);
if (pos == NULL)
m_pDataList->AddTail(pData);
else
m_pDataList->InsertBefore(pos,pData);
}
}
Since there are several chained items in each bucket, we must develop a destructor CBucket::~CBucket()
so that all of the existing CData
objects are removed and destroyed, then CBucket::m_pDataList
is deleted and the occupied memory is released.
CBucket::~CBucket()
{
if (!m_pDataList)
return;
POSITION position = m_pDataList->GetHeadPosition();
while (position)
{
POSITION temp = position;
m_pDataList->GetNext(position);
CData* pData = (CData*)m_pDataList->GetAt(temp);
m_pDataList->RemoveAt(temp);
delete pData;
}
delete m_pDataList;
}
3) Hash table
CHashTable
is the container for our CBucket
buckets. As a matter of fact, a CHashTable
object is an object that stores buckets as a CPtrList
object linked list ascending sequentially. The ascention order is due to the existence of an index that's corresponding to the pivot element. In this class, the number of chained buckets is unlimited as long as the computer has available memory space or until we reach the maximum range of an unsigned long
. Here, CHashTable::m_pBuckets
stores a list of buckets.
class CHashTable
{
public:
CPtrList* GetBuckets(void);
unsigned long GetNumOfBuckets(void);
CBucket* SeekBucket(unsigned long nIndex);
BOOL Exists(CData* pData,POSITION& pos, unsigned long& nIndex);
BOOL DeleteItem(CData* pData);
void InsertSorted(CData* pData);
CHashTable();
virtual ~CHashTable();
private:
unsigned long m_nNumOfBuckets;
CPtrList* m_pBuckets;
};
The function CHashTable::InsertSorted(CData* pData)
inserts a new item into our DOHT. According to the function CHashTable::InsertSorted(CData* pData)
, the inserted CData
object's m_pivot
member variable is checked to get its corresponding number of the 1st letter of m_pivot
as an index, used to find out whether its corresponding CBucket
object exists in the list of buckets or not:
...
unsigned long nIndex = (unsigned long)pData->m_pivot[0];
...
If its corresponding object exists, pData
will be inserted by calling CBucket::InsertSorted(CData* pData)
, otherwise, a new bucket is created and pData
is inserted in it by calling the same member function of CBucket
. This is the final code for inserting a new item in DOHT:
void CHashTable::InsertSorted(CData *pData)
{
if (!pData)
return;
unsigned long nIndex = (unsigned long)pData->m_pivot[0];
POSITION pos = m_pBuckets->GetHeadPosition();
if (pos == NULL)
{
CBucket* pBucket = new CBucket;
ASSERT(pBucket);
pBucket->SetIndex(nIndex);
pBucket->InsertSorted(pData);
m_pBuckets->AddHead(pBucket);
m_nNumOfBuckets++;
return;
}
CBucket* pBucket = (CBucket*)m_pBuckets->GetHead();
unsigned long index = pBucket->GetIndex();
while (pos && nIndex > index)
{
m_pBuckets->GetNext(pos);
if (!pos)
break;
pBucket = (CBucket*)m_pBuckets->GetAt(pos);
index = pBucket->GetIndex();
}
if (nIndex == index)
{
pBucket->InsertSorted(pData);
return;
}
CBucket* pNewBucket = new CBucket;
ASSERT(pNewBucket);
pNewBucket->SetIndex(nIndex);
pNewBucket->InsertSorted(pData);
if (pos == NULL)
m_pBuckets->AddTail(pNewBucket);
else
m_pBuckets->InsertBefore(pos,pNewBucket);
m_nNumOfBuckets++;
}
Note that CHashTable::Exists(CData* pData,POSITION& pos, unsigned long& nIndex)
returns not only a Boolean
value to check whether an item exists in our DOHT but also the index of the corresponding bucket and the position of the desired item's corresponding bucket in CBucket::m_pDataList
.
The explanation of destructor CHashTable::~CHashTable()
is the same as the explanation of the destructor of CBucket
.
CHashTable::~CHashTable()
{
if (!m_pBuckets)
return;
POSITION pos = m_pBuckets->GetHeadPosition();
while (pos)
{
POSITION temp = pos;
CBucket* pBucket = (CBucket*)m_pBuckets->GetNext(pos);
m_pBuckets->RemoveAt(temp);
delete pBucket;
}
delete m_pBuckets;
}
- B.S. degree in Computer Engineering.
- 10+ years experience in Turbo C, Borland C , Visual C++ and Managed C++.
- Obssessed in OOP style design and programming.
- Desigining and developing Network security tools.
- Desigining and developing a client/server appliation for sharing files among users in a way other than FTP protocol.
- Desigining and developing ISP Subscribers account management by binding to Cisco tools (NtTac+), Reporting the results on the web by ISAPI method.
- Designing and writing code to build a search engine (Web crawler) by SQL Server 7.0 and VC++.
- On-board programming of non-boundary scan memory devices like flash memories by boundary scan (IEEE 1149.1) protocol in C# and J#.
- Designing and implementing GSM gateway applications and bulk messaging.
The summary of my skills:
C#, J#, Managed C++ code, VC++, MFC, Turbo Pascal, PL/I, SQL Server, MS Access, Windows NT administration, Web site developing, Macromedia tools, Webmastering, Cisco Routers.