Click here to Skip to main content
Click here to Skip to main content

Persistant Hash Table

By , 27 Oct 2011
 

Introduction 

I got inspired by the NO SQL and wanted to start writing one myself, I saw few very good open source C++ NO SQL like “MongoDB” and “levelDB” and wanted to write a key value store myself. I wanted to start off with a minimalistic set of goals to start and complete in a weekend. So with these parameters, I am posting this one. I hope it will be helpful to someone who wants to develop key store as a beginner.  

Background

There are many ways to create a table and access it, each one has its own advantages and disadvantages. Here, I am using a hash function to access a record in a file. Other methods include maintain index tables, sequential access, b+trees, persistent pointers, etc.

Using the Code

I have used a simple logic to construct the persistent hash table. If we define number of buckets, their size and record size, it is easy to store and retrieve. When we want to write a record or read a record, key is passed to the hash function, with the hash value, the bucket offset is derived. Then we need to jump to that offset and write the record or read the record. Before doing that, a check is made if a record is already present in that location, if present then we jump to the next record and then that record is written.

  • CHash(long nBuck,long bucksize,long recsize): is one of the overloaded ctor where we pass number of buckets, bucket size and each record size (table size). For this table, key should be long.
  • CHash(long nBuck,long bucksize,long recsize,unsigned long (*uh)(void *)): is another ctor where user defined hash function pointer can be passed. key can be user defined.
  • createdb: creates the hdb file with the name given, writes and reads can be made.
  • writedb: opens the file for writing
  • readdb: opens the file for reading
  • closedb: closes the file handle
  • writerec: write record has two overloads, one for key type long and another for user defined key type, writes user defined data, writes data into the bucket if record is present (i.e. collision occurred) then it jumps to the next record until it finds a vacant place to write.
  • readrec: readrec  has two overloads, one for key type long and another for user defined key type, reads user defined data. Reads the record from the bucket offset, if the value searched is different, then readnextrec is used to fetch the next record with the offset taken from previous readrec call.
  • readnextrec: as mentioned above, this needs to be used to read consecutive elements in the same bucket by passing in the offset from readrec.
  • writeoffset,readoffset,GetbucketOffset are the helper functions to move around the file for writing and reading.
int CHash::GetbucketOffset(long buckno)//returns the bucket offset
{
   int offset = 1;
   offset = buckno * m_bucketsize;
   return offset;
}
int CHash::writerec(void* key,void *data)
{
	long buckno = GetbucketOffset(hash(key));
    void *temp = malloc(m_recsize);
	while(0 != readoffset(buckno,temp,m_recsize))
	{
		buckno += m_recsize;//go to the next record
	}
	free(temp);
	writeoffset(buckno,data,m_recsize);
	return 0;
}
int CHash::writerec(void* key,void *data)
{
	long buckno = GetbucketOffset(hash(key));
    void *temp = malloc(m_recsize);
	while(0 != readoffset(buckno,temp,m_recsize))
	{
		buckno += m_recsize;//go to the next record
	}
	free(temp);
	writeoffset(buckno,data,m_recsize);
	return 0;
}
int CHash::readrec(long key,void *data,long &offset)
{
	long buckno = GetbucketOffset(hash(key));
	readoffset(buckno,data,m_recsize);
	offset = buckno;
	return 0;
}
int CHash::readrec(void* key,void *data,long &offset)
{
	long buckno = GetbucketOffset(hash(key));
	readoffset(buckno,data,m_recsize);
	offset = buckno;
	return 0;
}
int CHash::readnextrec(long offset,void *data)
{
	offset += m_recsize;
	readoffset(offset,data,m_recsize);
	return 0;
}

Testhash.cpp is written to give a demo of how to use this class.

Limitations

May be in my next article, I can improve on the pre-definition of the size for buckets and tables. Instead, I will build these tables with some other class or use some other technique. Another limitation I found in this is if sufficient space is not allocated to a bucket, then data corruption can happen. All these are due to fixed sizes.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Dharmateja Challa
Software Developer (Senior) EMC
India India
Member
I have worked in EMS, storage and Embedded domains. Mainly taken part in design, coding and maintaining in c/c++ on windows.
For fun I play COD and also read books.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
Hint: For improved responsiveness ensure Javascript is enabled and choose 'Normal' from the Layout dropdown and hit 'Update'.
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralSee the next improved versionmemberDharmateja Challa7 Nov '11 - 1:04 
GeneralRe: See the next improved versionmembersalamat200026 Nov '12 - 5:11 
Questionpersistent, not persistantmemberCharvak Karpe1 Nov '11 - 4:02 
AnswerRe: persistent, not persistantmemberDharmateja Challa7 Nov '11 - 1:00 
SuggestionRe: persistent, not persistantmemberKapsD18 May '12 - 0:19 
QuestionMore informationmemberMehdi Gholam27 Oct '11 - 4:31 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130523.1 | Last Updated 27 Oct 2011
Article Copyright 2011 by Dharmateja Challa
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid