Posted 18 Jan 2012

RaptorDB - The Key Value Store V2

Mehdi Gholam, 8 Feb 2012, CPOL
Even faster Key/Value store nosql embedded database engine utilizing the new MGIndex data structure with MurMur2 Hashing and WAH Bitmap indexes for duplicates.
This is an old version of the currently published article.


This article is version 2 of my previous RaptorDB article. I had to write a new article because in this version I completely redesigned and re-architected the original, so it would not fit with the previous article. In this version I have done away with the b+tree and hash index in favor of my own MGIndex structure, which outperformed both in my tests; the performance numbers speak for themselves.

What is RaptorDB?

Here is a brief overview of all the terms used to describe RaptorDB:

  • Embedded: You can use RaptorDB inside your application as you would any other DLL, and you don't need to install services or run external programs.
  • NoSQL: A grassroots movement to replace relational databases with storage systems that are more relevant and specialized to the application in question. These systems are usually designed for performance.
  • Persisted: Any changes made are stored on hard disk, so data survives power outages and crashes (in deferred flush mode the most recent unflushed writes can still be lost).
  • Dictionary: A key/value storage system much like the implementation in .NET.
  • MurMurHash: A non-cryptographic hash function created by Austin Appleby in 2008.


RaptorDB has the following features:
  • Very fast performance (typically 2x the insert and 4x the read performance of RaptorDB v1).
  • Extremely small footprint at ~50kb.
  • No dependencies.
  • Multi-threaded support for reads and writes.
  • Data pages are separate from the main tree structure, so they can be freed from memory if needed and loaded on demand.
  • Automatic index file recovery on non-clean shutdowns.
  • String keys are UTF8 encoded and limited to 60 bytes unless specified otherwise (maximum is 255 chars).
  • Support for long string keys with the RaptorDBString class.
  • Duplicate keys are stored as a WAH Bitmap Index for optimal storage and speed of access.
  • Two modes of operation: flush immediate and deferred (the latter being faster at the expense of the risk of data loss on a non-clean shutdown).
  • Enumerating the index is supported.
  • Enumerating the storage file is supported.
  • Removing a key is supported.

Why another data structure?

There is always room for improvement, and the constant need for faster systems compels us to create new methods of doing things. MGIndex is no exception to this rule. Currently MGIndex outperforms the b+tree by a factor of 15x on writes and 21x on reads, while keeping the main feature of a b+tree structure: disk friendliness.

The problem with a b+tree

Theoretically a b+tree lookup is O(log_k N), that is, log base k of N. For typical values of k, which are above 200, the b+tree should outperform any binary tree because it uses fewer operations. However, I have found the following problems which hinder performance:

  • Pages in a b+tree are usually implemented as a list or array of child pointers, so while finding the position for a value within a page is an O(log k) operation, inserting it means moving children around in the array or list, which is time consuming.
  • Splitting a page in a b+tree has to fix parent nodes and children, which effectively locks the tree for the duration. Parallel updates are therefore very difficult and have spawned a lot of research articles.
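
To make the first point concrete, here is a minimal sketch of inserting into a page that is kept as a sorted list. It is not code from RaptorDB v1; the class and method names are hypothetical and int keys are used for brevity. Finding the slot is a binary search, but the insert itself has to shift every entry after that slot.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; not part of RaptorDB.
public static class SortedPageSketch
{
    // Inserts key into an already sorted page and returns how many
    // existing entries had to move one slot to make room.
    public static int InsertSorted(List<int> page, int key)
    {
        int index = page.BinarySearch(key);   // O(log k) to find the slot
        if (index < 0) index = ~index;        // not found: the complement is the insertion point
        int shifted = page.Count - index;     // entries at or after the slot get moved
        page.Insert(index, key);              // O(k) copy inside the backing array
        return shifted;
    }
}
```

Inserting 5 into a page holding 10, 20, 30, 40 shifts all four entries. With uniformly random keys an insert moves about half the page on average.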

Requirements of a good index structure

So what makes a good index structure? Here are what I consider its essential features:

  • Page-able data structure:
    • Easy loading and saving to disk.
    • Free memory on memory constraints.
    • On-demand loading for optimal memory usage.
  • Very fast insert and retrieve.
  • Multi-thread-able and parallel-able usage.
  • Pages should be linked together so you can do range queries by going to the next page easily.

The MGIndex

MGIndex takes the best features of a b+tree and improves upon them while removing the impediments. MGIndex is also extremely simple in design, as the following diagram shows:

As you can see the page list is a sorted dictionary of first keys from each page along with associated page number and page items count. A page is a dictionary of key and record number pairs.
This format ensures a semi sorted key list, in that within a page the data is not sorted but pages are in sort order relative to each other. So a look-up for a key just compares the first keys in the page list to find the page required and gets the key from the page's dictionary.

MGIndex is O(log M)+O(1), M being N / PageItemCount [PageItemCount = 10000 in the Globals class]. This means that you do a binary search in the page list in log M time and get the value in O(1) time within a page.
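
The lookup path described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine's code: the class and member names are hypothetical, keys and record numbers are plain ints, and all pages are assumed to be in memory already.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the MGIndex lookup path; not the RaptorDB source.
public sealed class MGIndexLookupSketch
{
    // Page list: first key of each page, in ascending order.
    private readonly List<int> _firstKeys = new List<int>();
    // Pages: unsorted dictionaries of key -> record number.
    private readonly List<Dictionary<int, int>> _pages = new List<Dictionary<int, int>>();

    // Pages must be added in ascending first-key order.
    public void AddPage(int firstKey, Dictionary<int, int> items)
    {
        _firstKeys.Add(firstKey);
        _pages.Add(items);
    }

    // Binary search for the last page whose first key is <= key: O(log M).
    public int FindPage(int key)
    {
        int lo = 0, hi = _firstKeys.Count - 1, found = 0;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_firstKeys[mid] <= key) { found = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return found;
    }

    // Dictionary lookup inside the chosen page: O(1) on average.
    public bool Get(int key, out int recordNumber)
    {
        recordNumber = 0;
        if (_pages.Count == 0) return false;
        return _pages[FindPage(key)].TryGetValue(key, out recordNumber);
    }
}
```

With PageItemCount = 10000, ten million keys need about 1000 pages, so the binary search touches roughly ten first keys before the single dictionary probe.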

RaptorDB starts off by loading the page list and is good to go from there; pages are loaded on demand, based on usage.

Page Splits

In the event of a page getting full and reaching the PageItemCount, MGIndex will sort the keys in the page's dictionary and split the data into two pages (similar to a b+tree split), then update the page list by adding the new page and changing the first keys as needed. This ensures the sorted page progression.
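
A page split along these lines might look like the sketch below. It is an illustration only: the names are hypothetical, int keys stand in for the generic key type, and the real engine also has to update its page list and on-disk state, which is left out here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a page split; not the RaptorDB source.
public static class PageSplitSketch
{
    // Moves the upper half of a full page into a new page and returns it.
    // newFirstKey is the first key of the new page, to be added to the page list.
    public static Dictionary<int, int> Split(Dictionary<int, int> page, out int newFirstKey)
    {
        if (page.Count < 2) throw new ArgumentException("page needs at least two entries to split");

        var keys = new List<int>(page.Keys);
        keys.Sort();                          // the page is unsorted, so sort only when splitting
        int half = keys.Count / 2;
        newFirstKey = keys[half];

        var right = new Dictionary<int, int>();
        for (int i = half; i < keys.Count; i++)
        {
            right[keys[i]] = page[keys[i]];   // copy the entry to the new page
            page.Remove(keys[i]);             // and drop it from the old one
        }
        return right;
    }
}
```

The sort is the only expensive step and it runs once per PageItemCount inserts into a page, which is why ordinary inserts can stay plain dictionary writes.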

Interestingly, the processor architecture plays an important role here, as you can see in the performance tests: the split time is directly related to the key sorting time, and the Core iX processors seem to be very good in this regard.

Interesting side effects of MGIndex

Here are some interesting side effects of MGIndex:
  • Because the data pages are separate from the Page List structure, implementing locking is easy and isolated within a page and not the whole index, not so for normal trees.
  • Splitting a page when full is simple and does not require a tree traversal for node overflow checking as in a b+tree.
  • Main page list updates are infrequent and hence the locking of the main page list structure does not impact performance.
  • The above make the MGIndex a really good candidate for parallel updates.
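
As a rough illustration of page-level locking, the sketch below gives every page its own lock so that writers only contend when they hit the same page. This is an assumption about how such locking could be arranged, not the engine's actual locking code; all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical page with its own lock; not the RaptorDB source.
public sealed class LockedPageSketch
{
    private readonly object _sync = new object();
    private readonly Dictionary<int, int> _items = new Dictionary<int, int>();

    public void Set(int key, int recordNumber)
    {
        lock (_sync) { _items[key] = recordNumber; }   // only this page is blocked
    }

    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }
}

public static class ParallelUpdateSketch
{
    // Writes pageCount * keysPerPage distinct keys in parallel, spread over the pages,
    // and returns the number of items each page ends up holding.
    public static int[] FillPages(int pageCount, int keysPerPage)
    {
        var pages = new LockedPageSketch[pageCount];
        for (int p = 0; p < pageCount; p++) pages[p] = new LockedPageSketch();

        Parallel.For(0, pageCount * keysPerPage, i =>
        {
            pages[i % pageCount].Set(i, i);
        });

        var counts = new int[pageCount];
        for (int p = 0; p < pageCount; p++) counts[p] = pages[p].Count;
        return counts;
    }
}
```

In a full index the page list itself would also need protection, but as noted above it changes only on splits, so that lock is taken far less often than the per-page ones.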

The road not taken / the road taken and doubled back!

Originally I used an AA tree for the page structures, as it is an extremely good and simple structure to understand. After testing, it turned out to be slower than the internal .NET SortedDictionary (which is a Red-Black tree structure), so it was scrapped (see the performance comparisons).

I decided against using SortedDictionary for the pages as it was slower than a normal Dictionary, and for the purpose of a key value store the sorted-ness was not needed and could be handled in other ways. You can switch to the SortedDictionary in the code at any time if you wish; it makes no difference to the overall code, other than that you can remove the sorting in the page splits.

I also tried an assorted number of sorting routines like double pivot quick sort, timsort, insertion sort and found that they all were slower than the internal .net quicksort routine in my tests.

Performance Tests

In this version I have compiled a list of computers which I have tested on; the results are below.

As you can see you get a very noticeable performance boost with the new Intel Core iX processors.

Comparing B+tree and MGIndex

For a measure of relative performance of a b+tree, Red/Black tree and MGIndex I have compiled the following results.

Times are in seconds.

B+Tree : the index code from RaptorDB v1.
SortedDictionary : the internal .NET implementation, which is said to be a Red/Black tree.

Really big data sets!

To really put the engine under pressure I did the following tests on huge data sets (times are in seconds, memory is in GB):

These tests were done on an HP ML120G6 system with 12GB RAM and 10k RAID disk drives, running Windows 2008 Server R2 64 bit. For a measure of relative performance to RaptorDB v1 I have also included a 20 million test with that engine.

I refrained from running the get test over 100 million records, as it would require a huge array in memory to store the Guid keys for finding later; that is why there is an NT (not tested) in the table.

Interestingly the read performance is relatively linear.

Index parameter tuning

To get the most out of RaptorDB you can tune some parameters specific to your hardware.

  • PageItemCount : controls the size of each page.
Here are some of my results:

I have chosen 10000 as a good value for both reads and writes. You are welcome to tinker with this on your own systems and see what works better for you.
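
To get a feel for what PageItemCount changes, the following sketch computes the number of pages M and the number of halvings a binary search needs to narrow M pages down to one (that is, ceil(log2 M)). The helper names are hypothetical and the numbers are plain arithmetic, not measurements.

```csharp
using System;

// Hypothetical helper for reasoning about PageItemCount; not part of RaptorDB.
public static class PageTuningSketch
{
    // Number of pages M needed for n keys, rounding up.
    public static long PageCount(long n, int pageItemCount)
    {
        return (n + pageItemCount - 1) / pageItemCount;
    }

    // Halvings needed to narrow 'pages' candidates down to one: ceil(log2(pages)).
    public static int PageListSteps(long pages)
    {
        int steps = 0;
        while (pages > 1)
        {
            pages = (pages + 1) / 2;
            steps++;
        }
        return steps;
    }
}
```

For 10,000,000 keys, a PageItemCount of 10000 gives 1000 pages and 10 steps, 100000 gives 100 pages and 7 steps, and 100 gives 100,000 pages and 17 steps. Larger pages shorten the page list search only slightly while every split has to sort more keys, so the best value is something to measure on your own hardware.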

Using the Code

To create or open a database you use the following code :

// to create a db for guid keys without allowing duplicates
var guiddb = RaptorDB.RaptorDB<Guid>.Open("c:\\RaptorDbTest\\multithread", false);

// to create a db for string keys with a length of 100 characters (UTF8) allowing duplicates
var strdb = RaptorDB.RaptorDB<string>.Open("c:\\intdb", 100, true);       

To insert and retrieve data you use the following code :

Guid g = Guid.NewGuid();
guiddb.Set(g, "somevalue");

string outstr = "";
if (guiddb.Get(g, out outstr))
{
    // success: outstr should be "somevalue"
}

The UnitTests project contains working example code for different use cases, so you can refer to it for more samples.

Differences to v1

The following is a list of differences in v2 as opposed to v1 of RaptorDB:

  • Log Files have been removed and are not needed anymore as the MGIndex is fast enough for in-process indexing.
  • Threads have been replaced by timers.
  • The index will be saved to disk in the background without blocking the engine process.
  • Messy generic code has been simplified and the need for a RDBDataType has been removed; you can use the normal int, long, string and Guid data types.
  • RemoveKey has been added.

Other than that existing code should compile as is with the new engine.

Using RaptorDBString and RaptorDBGuid

RaptorDBString is for long string keys (larger than 255 characters) and is really useful for file paths etc. You can use it in the following way:

// long string keys without case sensitivity
var rap = new RaptorDBString(@"c:\raptordbtest\longstringkey", false);


RaptorDBGuid is a special engine which will MurMur2 hash the input Guid for lower memory usage (4 bytes as opposed to 16 bytes). This is useful if you have a huge number of items to store. You can use it in the following way:

// murmur hashed guid keys
var db = new RaptorDBGuid("c:\\RaptorDbTest\\hashedguid");  
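
For readers curious how a 16 byte Guid becomes a 4 byte key, below is a sketch of the 32-bit MurmurHash2 function applied to the Guid's bytes. It follows the publicly known MurmurHash2 algorithm, but the class name, the seed of 0 and the use of Guid.ToByteArray() are assumptions for illustration; the article does not state which seed or byte layout RaptorDBGuid uses.

```csharp
using System;

// Sketch of 32-bit MurmurHash2; the seed and the Guid byte layout are assumptions.
public static class MurmurHash2Sketch
{
    public static uint Hash(byte[] data, uint seed)
    {
        unchecked   // the algorithm relies on 32-bit wrap-around arithmetic
        {
            const uint m = 0x5bd1e995;
            const int r = 24;
            int length = data.Length;
            uint h = seed ^ (uint)length;
            int i = 0;

            // Mix four bytes at a time, read as a little-endian 32-bit value.
            while (length >= 4)
            {
                uint k = (uint)(data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24));
                k *= m; k ^= k >> r; k *= m;
                h *= m; h ^= k;
                i += 4; length -= 4;
            }

            // Mix in the remaining one to three bytes, if any.
            switch (length)
            {
                case 3: h ^= (uint)data[i + 2] << 16; goto case 2;
                case 2: h ^= (uint)data[i + 1] << 8; goto case 1;
                case 1: h ^= data[i]; h *= m; break;
            }

            // Final avalanche.
            h ^= h >> 13; h *= m; h ^= h >> 15;
            return h;
        }
    }

    // Reduces a 16 byte Guid to a 4 byte hash (seed 0 is an assumption).
    public static uint HashGuid(Guid g)
    {
        return Hash(g.ToByteArray(), 0);
    }
}
```

Since a 32-bit hash has far fewer possible values than there are Guids, two different Guids can produce the same hash. The article does not describe how RaptorDBGuid handles such collisions, so it is worth checking the source before relying on it for very large key sets.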

Global parameters

The following parameters are in the Global.cs file; you can change them to control the inner workings of the engine:

  • The switch over point where duplicates are stored as a WAH bitmap as opposed to a list of record numbers.
  • The number of items within a page (PageItemCount).
  • Background save index timer in seconds (e.g. save the index to disk every 60 seconds).
  • Default string key size in bytes (stored as UTF8).
  • Flush to storage file immediately.
  • Compress and free bitmap index memory on saves.

RaptorDB interface

  • Set(T, byte[]) : sets the Key and byte array Value, returns void.
  • Set(T, string) : sets the Key and string Value, returns void.
  • Get(T, out string) : gets the value for the Key and puts it in the string output parameter, returns true if the key was found.
  • Get(T, out byte[]) : gets the value for the Key and puts it in the byte array output parameter, returns true if the key was found.
  • RemoveKey(T) : removes the key from the index.
  • Storage file enumeration : returns all the contents of the main storage file as an IEnumerable<KeyValuePair<T, byte[]>>.
  • Enumerate(T) : enumerates the index from the key given.
  • GetDuplicates(T) : returns a list of main storage file record numbers as an IEnumerable<int> for the duplicate key specified.
  • FetchRecord(int) : returns the Value from the main storage file as byte[], used with GetDuplicates and Enumerate.
  • Count(includeDuplicates) : returns the number of items in the database index, also counting the duplicates if specified.
  • SaveIndex() : allows the immediate save to disk of the index (the engine will automatically save in the background on a timer).
  • Shutdown() : closes all files and stops the engine.

Non-clean shutdowns

In the event of a non clean shutdown RaptorDB will automatically rebuild the index from the last indexed item to the last inserted item in the storage file. This feature also enables you to delete the mgidx file and have RaptorDB rebuild the index from scratch.
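
The recovery idea can be sketched as replaying the storage records that the saved index has not seen yet. This is a simplified model with hypothetical names, not the engine's recovery code: the storage file is reduced to a list of keys where position i is record number i, and duplicates and delete records are ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of index catch-up after a non-clean shutdown; not the RaptorDB source.
public static class IndexRecoverySketch
{
    // Replays storage records after lastIndexed into the index.
    // storageKeys[i] is the key stored in record number i.
    // Returns the new last indexed record number.
    public static int CatchUp(Dictionary<string, int> index, IList<string> storageKeys, int lastIndexed)
    {
        for (int rec = lastIndexed + 1; rec < storageKeys.Count; rec++)
        {
            index[storageKeys[rec]] = rec;   // a later record for the same key replaces the earlier one
        }
        return storageKeys.Count - 1;
    }
}
```

Passing -1 as lastIndexed with an empty index replays the whole storage file, which corresponds to deleting the mgidx file and rebuilding from scratch.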

Removing Keys

In v2 of RaptorDB removing keys has been added with the following caveats :

  • Data is not deleted from the storage file.
  • A special delete record is added to the storage file for tracking deletes, which also helps with index rebuilding when needed.
  • Data is removed from the index.
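
The caveats above describe an append-only storage file with delete markers. The sketch below models that behaviour in memory to show why the delete record matters when the index is rebuilt. It is an illustration with hypothetical names and string values, not the engine's file format.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of an append-only store with delete records; not the RaptorDB source.
public sealed class DeleteRecordSketch
{
    private sealed class Record
    {
        public string Key;
        public string Value;
        public bool IsDelete;
    }

    private readonly List<Record> _storage = new List<Record>();              // append-only "storage file"
    private Dictionary<string, int> _index = new Dictionary<string, int>();   // key -> record number

    public int StorageRecords { get { return _storage.Count; } }

    public void Set(string key, string value)
    {
        _storage.Add(new Record { Key = key, Value = value });
        _index[key] = _storage.Count - 1;
    }

    public bool RemoveKey(string key)
    {
        if (!_index.Remove(key)) return false;                     // data is removed from the index
        _storage.Add(new Record { Key = key, IsDelete = true });   // a delete record is appended
        return true;                                               // the old data record stays in storage
    }

    public bool Get(string key, out string value)
    {
        int rec;
        if (_index.TryGetValue(key, out rec)) { value = _storage[rec].Value; return true; }
        value = null;
        return false;
    }

    // Rebuilds the index by replaying the storage from the start.
    public void RebuildIndex()
    {
        _index = new Dictionary<string, int>();
        for (int rec = 0; rec < _storage.Count; rec++)
        {
            if (_storage[rec].IsDelete) _index.Remove(_storage[rec].Key);
            else _index[_storage[rec].Key] = rec;
        }
    }
}
```

Without the delete record, a rebuild would find the old data record still in the storage file and bring the removed key back.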

Unit Tests


The following unit tests are included in the source code (the output folder for all the tests is C:\RaptorDbTest ):

  • Duplicates_Set_and_Get : This test will generate 100 duplicates of 1000 Guids and fetch each one (This tests the WAH bitmap subsystem).
  • Enumerate : This test will generate 100,001 Guids and enumerate the index from a predetermined Guid and show the result count (the count will differ between runs).
  • Multithread_test : This test will create 2 threads inserting 1,000,000 items and a third thread reading 2,000,000 items with a delay of 5 seconds from the start of insert.
  • One_Million_Set_Get : This test will insert 1,000,000 items and read 1,000,000 items.
  • One_Million_Set_Shutdown_Get : This test will do the above but shutdown and restart before reading.
  • RaptorDBString_test : This test will create 100,000 1kb string keys and read them from the index.
  • Ten_Million_Optimized_GUID : This test will use the RaptorDBGuid class, which will MurMur hash 10,000,000 Guids, writing and reading them.
  • Ten_Million_Set_Get : The same as 1 million test but with 10 million items.
  • Twenty_Million_Optimized_GUID : The same as 10 million test but with 20 million items.
  • Twenty_Million_Set_Get : The same as 1 million test but with 20 million items.
  • StringKeyTest : A test for normal string keys of max 255 length.

File Formats

File Format : *.mgdat

Values are stored in the following structure on disk:

File Format : *.mgbmp

Bitmap indexes are stored in the following format on disk :
The bitmap row is variable in length and will be reused if the new data fits in the record size on disk, if not another record will be created. For this reason a periodic index compaction might be needed to remove unused records left from previous updates.

File Format : *.mgidx

The MGIndex index is saved in the following format as shown below:

File Format : *.mgbmr , *.mgrec

Rec file is a series of long values written to disk with no special formatting. These values map the record number to an offset in the BITMAP index file and DOCS storage file.


History

  • Initial Release v2.0 : 19th January 2012
  • Update v2.1 : 26th January 2012
    • lock on safedictionary iterator set, Thanks to igalk474
    • string default(T) -> "" instead of null, Thanks to Ole Thrane for finding it
    • mgindex string firstkey null fix
    • added test for normal string keys
    • fixed the link to the v1 article
  • Update v2.2 : 8th February 2012
    • bug fix removekey, Thanks to syro_pro
    • removed un-needed initialization in safedictionary, Thanks to Paulo Zemek


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Mehdi Gholam
Architect
United Kingdom
Mehdi first started programming when he was 8, on a BBC+128k machine in 6512 processor language. After various hardware and software changes he eventually came across .NET and C#, which he has been using since v1.0.
He is formally educated as a system analyst Industrial engineer, but his programming passion continues.

* Mehdi is the 5th person to get 6 out of 7 Platinum's on Code-Project (13th Jan'12)
* Mehdi is the 3rd person to get 7 out of 7 Platinum's on Code-Project (26th Aug'16)

Article Copyright 2012 by Mehdi Gholam