Click here to Skip to main content
Click here to Skip to main content

RaptorDB - The Key Value Store V2

By , 19 Dec 2012
 
raptordbkv2/raptorDB.png

Introduction

This article is the version 2 of my previous article found here (http://www.codeproject.com/Articles/190504/RaptorDB), I had to write a new article because in this version I completely redesigned and re-architected the original and so it would not go with the previous article. In this version I have done away with the b+tree and hash index in favor of my own MGIndex structure which for all intents and purposes is superior and the performance numbers speak for themselves.

What is RaptorDB?

Here is a brief overview of all the terms used to describe RaptorDB:

  • Embedded: You can use RaptorDB inside your application as you would any other DLL, and you don't need to install services or run external programs.
  • NoSQL: A grass roots movement to replace relational databases with more relevant and specialized storage systems to the application in question. These systems are usually designed for performance.
  • Persisted: Any changes made are stored on hard disk, so you never lose data on power outages or crashes.
  • Dictionary: A key/value storage system much like the implementation in .NET.
  • MurMurHash: A non cryptographic hash function created by Austin Appleby in 2008 (http://en.wikipedia.org/wiki/MurmurHash).

Features

RaptorDB has the following features :
  • Very fast performance (typically 2x the insert and 4x the read performance of RaptorDB v1)
  • Extremely small foot print at ~50kb.
  • No dependencies.
  • Multi-Threaded support for read and writes.
  • Data pages are separate from the main tree structure, so can be freed from memory if needed, and loaded on demand.
  • Automatic index file recovery on non-clean shutdowns.
  • String Keys are UTF8 encoded and limited to 60 bytes if not specified otherwise (maximum is 255 chars).
  • Support for long string Keys with the RaptorDBString class.
  • Duplicate keys are stored as a WAH Bitmap Index for optimal storage and speed in access.
  • Two mode of operation Flush immediate and Deferred ( the latter being faster at the expense of the risk of non-clean shutdown data loss).
  • Enumerate the index is supported.
  • Enumerate the Storage file is supported.
  • Remove Key is supported.

Why another data structure?

There is always room for improvement, and the ever need for faster systems compels us to create new methods of doing things. MGindex is no exception to this rule. Currently MGindex outperforms b+tree by a factor of 15x on writes and 21x on reads, while keeping the main feature of disk friendliness of a b+tree structure.

The problem with a b+tree

Theoretically a b+tree is O(N log k N) or log base k of N, now for the typical values of k which are above 200 for example the b+tree should outperform any binary tree because it will use less operations. However I have found the following problems which hinder performance :

  • Pages in a b+tree are usually implemented as a list or array of child pointers and so while finding and inserting a value is a O(log k) operation the process actually has to move children around in the array or list, and so is time consuming.
  • Splitting a page in b+tree has to fix parent nodes and children so effectively will lock the tree for the duration, so parallel updates are very very difficult and have spawned a lot of research articles.

Requirements of a good index structure

So what makes a good index structure, here are what I consider essential features of one:

  • Page-able data structure:
    • Easy loading and saving to disk.
    • Free memory on memory constraints.
    • On-demand loading for optimal memory usage.
  • Very fast insert and retrieve.
  • Multi-thread-able and parallel-able usage.
  • Pages should be linked together so you can do range queries by going to the next page easily.

The MGIndex

MGIndex takes the best features of a b+tree and improves upon on them at the same time removing the impediments. MGIndex is also extremely simple in design as the following diagram shows:
raptordbkv2/pagelist.png

As you can see the page list is a sorted dictionary of first keys from each page along with associated page number and page items count. A page is a dictionary of key and record number pairs.
This format ensures a semi sorted key list, in that within a page the data is not sorted but pages are in sort order relative to each other. So a look-up for a key just compares the first keys in the page list to find the page required and gets the key from the page's dictionary.

MGIndex is O(log M)+O(1), M being N / PageItemCount [PageItemCount = 10000 in the Globals class]. This means that you do a binary search in the page list in log M time and get the value in O(1) time within a page.

RaptorDB starts off by loading the page list and it is good to go from there and pages are loaded on demand, based on usage.

Page Splits

In the event of page getting full and reaching the PageItemCount, MGIndex will sort the keys in the page's dictionary and split the data in two pages ( similar to a b+tree split) and update the page list by adding the new page and changing the first keys needed. This will ensure the sorted page progression.

Interestingly the processor architecture plays an important role here as you can see in the performance tests as it is directly related to the sorting key time, the Core iX processors seem to be very good in this regard.

Interesting side effects of MGIndex

Here are some interesting side effects of MGIndex
  • Because the data pages are separate from the Page List structure, implementing locking is easy and isolated within a page and not the whole index, not so for normal trees.
  • Splitting a page when full is simple and does not require a tree traversal for node overflow checking as in a b+tree.
  • Main page list updates are infrequent and hence the locking of the main page list structure does not impact performance.
  • The above make the MGIndex a really good candidate for parallel updates.

The road not taken / the road taken and doubled back!

Originally I used a AATree found here (http://demakov.com/snippets/aatree.html) for the page structures, for being extremely good and simple structure to understand. After testing and comparing to the internal .net SortedDictionary (which is a Red-Black tree structure) it was slower and so scrapped (see the performance comparisons).

I decided against using SortedDictionary for the pages as it was slower than a normal Dictionary and for the purpose of a key value store the sorted-ness was not need and could be handled in other ways. You can switch to the SortedDictionary in the code at any time if you wish and it makes no difference to the overall code other than you can remove the sorting in the page splits.

I also tried an assorted number of sorting routines like double pivot quick sort, timsort, insertion sort and found that they all were slower than the internal .net quicksort routine in my tests.

Performance Tests

In this version I have compiled a list of computers which I have tested on and below is the results.

raptordbkv2/computers.png
As you can see you get a very noticeable performance boost with the new Intel Core iX processors.

Comparing B+tree and MGIndex

For a measure of relative performance of a b+tree, Red/Black tree and MGIndex I have compiled the following results.

raptordbkv2/compare.png
Times are in seconds.

B+Tree : is the index code from RaptorDB v1
SortedDictionary
: is the internal .net implementation which is said to be a Red/Black tree.

Really big data sets!

To really put the engine under pressure I did the following tests on huge data sets (times are in seconds, memory is in Gb) :

raptordbkv2/bigdata.png
These tests were done on a HP ML120G6 system with 12Gb Ram, 10k raid disk drives running Windows 2008 Server R2 64 bit. For a measure of relative performance to RaptorDb v1 I have included a 20 million test with that engine also.

I deferred from testing the get test over 100 million record as it would require a huge array in memory to store the Guid keys for finding later, that is why there is a NT (not tested) in the table.

Interestingly the read performance is relatively linear.

Index parameter tuning

To get the most out of RaptorDB you can tune some parameters specific to your hardware.

  • PageItemCount : controls the size of each page.
Here are some of my results:

raptordbkv2/nodes.png
I have chosen the 10000 number as a good case in both read and writes, you are welcome to tinker with this on your own systems and see what works better for you.

Performance Tests v2.3

In v2.3 a single simple change of converting internal classes to structs rendered huge performance improvements of 2x+ and at least 30% lower memory usage. You are pretty much guaranteed to get 100k+ insert performance on any system.

raptordbkv2/v23perf.png

Some of the test above were run 3 times because the computers were being used at the time (not cold booted for the tests) so the initial results were off. The HP G4 laptop is just astonishing.

I also re-ran the 100 million test on the last server in the above list and here is the results:

raptordbkv2/100m.png

As you can see in the above test, the insert time is 4x faster (although the computer specs to not match the HP system tested earlier) and incredibly the memory usage is half than the previous test.

Using the Code

To create or open a database you use the following code :

// to create a db for guid keys without allowing duplicates
var guiddb = RaptorDB.RaptorDB<Guid>.Open("c:\\RaptorDbTest\\multithread", false);

// to create a db for string keys with a length of 100 characters (UTF8) allowing duplicates
var strdb = RaptorDB.RaptorDB<string>.Open("c:\\intdb", 100, true);       

To insert and retrieve data you use the following code :

Guid g = Guid.NewGuid();
guiddb.Set(g, "somevalue");

string outstr="";
if(guiddb.Get(g, out outstr)) 
{
   // success outstr should be "somevalue"
}

The UnitTests project contains working example codes for different use cases so you can refer to it for more samples.

Differences to v1

The following are a list of differences in v2 opposed to v1 of RaptorDB:

  • Log Files have been removed and are not needed anymore as the MGIndex is fast enough for in-process indexing.
  • Threads have been replaced by timers.
  • The index will be saved to disk in the background without blocking the engine process.
  • Messy generic code has been simplified and the need for a RDBDataType has been removed, you can use normal int, long, string and Guid data types.
  • RemoveKey has been added.

Other than that existing code should compile as is with the new engine.

Using RaptorDBString and RaptorDBGuid

RaptorDBString is for long string keys (larger than 255 characters) and it is really useful for file paths etc. You can use it in the following way :

// long string keys without case sensitivity
var rap = new RaptorDBString(@"c:\raptordbtest\longstringkey", false);

// murmur hashed guid keys
var db = new RaptorDBGuid("c:\\RaptorDbTest\\hashedguid");  

RaptorDBGuid is a special engine which will MurMur2 hash the input Guid for lower memory usage (4 bytes opposed to 16 bytes), this is useful if you have a huge number of items which you need to store. You can use it in the following way :

// murmur hashed guid keys
var db = new RaptorDBGuid("c:\\RaptorDbTest\\hashedguid");  

Global parameters

The following parameters are in the Global.cs file which you can change which control the inner workings of the engine.

Parameter
Default
Description
BitmapOffsetSwitchOverCount
10
Switch over point where duplicates are stored as a WAH bitmap opposed to a list of record numbers
PageItemCount
10,000
The number of items within a page
SaveTimerSeconds
60
Background save index timer seconds ( e.g. save the index to disk every 60 seconds)
DefaultStringKeySize
60
Default string key size in bytes (stored as UTF8)
FlushStorageFileImmetiatley
false
Flush to storage file immediately
FreeBitmapMemoryOnSave
false
Compress and free bitmap index memory on saves

RaptorDB interface

Set(T, byte[]) Set Key and byte array Value, returns void
Set(T, string) Set Key and string Value, returns void
Get(T, out string) Get the Key and put it in the string output parameter, returns true if key was found
Get(T, out byte[]) Get the Key and put it in the byte array output parameter, returns true if key was found
RemoveKey(T)
This will remove the key from the index
EnumerateStorageFile() returns all the contents of the main storage file as an IEnumerable< KeyValuePair<T, byte[]> >
Enumerate(fromkey)
Enumerate the Index from the key given.
GetDuplicates(T) returns a list of main storage file record numbers as an IEnumerable<int> of the duplicate key specified
FetchRecord(int) returns the Value from the main storage file as byte[], used with GetDuplicates and Enumerate
Count(includeDuplicates) returns the number of items in the database index , counting the duplicates also if specified
SaveIndex() Allows the immediate save to disk of the index (the engine will automatically save in the background on a timer)
Shutdown() This will close all files and stop the engine.

Non-clean shutdowns

In the event of a non clean shutdown RaptorDB will automatically rebuild the index from the last indexed item to the last inserted item in the storage file. This feature also enables you to delete the mgidx file and have RaptorDB rebuild the index from scratch.

Removing Keys

In v2 of RaptorDB removing keys has been added with the following caveats :

  • Data is not deleted from the storage file.
  • A special delete record is added to the storage file for tracking deletes and which also help with index rebuilding when needed.
  • Data is removed from the index.

Unit Tests

raptordbkv2/unittests.png

The following unit tests are included in the source code (the output folder for all the tests is C:\RaptorDbTest ):

  • Duplicates_Set_and_Get : This test will generate 100 duplicates of 1000 Guids and fetch each one (This tests the WAH bitmap subsystem).
  • Enumerate : This test will generate 100,001 Guids and enumerate the index from a predetermined Guid and show the result count (the count will differ between runs).
  • Multithread_test : This test will create 2 threads inserting 1,000,000 items and a third thread reading 2,000,000 items with a delay of 5 seconds from the start of insert.
  • One_Million_Set_Get : This test will insert 1,000,000 items and read 1,000,000 items.
  • One_Million_Set_Shutdown_Get : This test will do the above but shutdown and restart before reading.
  • RaptorDBString_test : This test will create 100,000 1kb string keys and read them from the index.
  • Ten_Million_Optimized_GUID : This test will use the RaptorDBGuid class which will MurMur hash 10,000,000 Guids writting and reading them.
  • Ten_Million_Set_Get : The same as 1 million test but with 10 million items.
  • Twenty_Million_Optimized_GUID : The same as 10 million test but with 20 million items.
  • Twenty_Million_Set_Get : The same as 1 million test but with 20 million items.
  • StringKeyTest : A test for normal string keys of max 255 length.
  • RemoveKeyTest :  A test for removing keys works properly between shutdowns.

File Formats

File Format : *.mgdat

Values are stored in the following structure on disk:
raptordbkv2/docs_file.png

File Format : *.mgbmp

Bitmap indexes are stored in the following format on disk :
raptordbkv2/mgbmp.png
The bitmap row is variable in length and will be reused if the new data fits in the record size on disk, if not another record will be created. For this reason a periodic index compaction might be needed to remove unused records left from previous updates.

File Format : *.mgidx

The MGIndex index is saved in the following format as shown below:
raptordbkv2/mgidx.png

File Format : *.mgbmr , *.mgrec

Rec file is a series of long values written to disk with no special formatting. These values map the record number to an offset in the BITMAP index file and DOCS storage file.

History

  • Initial Release v2.0 : 19th January 2012
  • Update v2.1 : 26th January 2012
    • lock on safedictionary iterator set, Thanks to igalk474
    • string default(T) -> "" instead of null, Thanks to Ole Thrane for finding it
    • mgindex string firstkey null fix
    • added test for normal string keys
    • fixed the link to the v1 article
  • Update v2.2 : 8th February 2012
    • bug fix removekey, Thanks to syro_pro
    • removed un-needed initialization in safedictionary, Thanks to Paulo Zemek
  • Update v2.3 : 1st March 2012
    • changed internal classes to structs (2x+ speed, 30% less memory)
    • added keystore class and code refactoring 
    • added a v2.3 performance section to the article
  • Update v2.4 : 7th March 2012
    • bug fix remove key set page isDirty -> Thanks to Martin van der Geer
    • Page<T> is a class again to fix keeping it's state
    • added RemoveKeyTest unit test
    • removed MemoryStream from StorageFile.CreateRowHeader for speed
    • current record number is also set in the bitmap index for duplicates
  • Update v2.5 : 28th May 2012
    • added SafeSortedList for access concurrency of the page list
    • insert performance back to v2.3 speed (removed extra writing to duplicates)
  • Update v2.6 : 20th December 2012
    • post back code from RaptorDB the doc store
    • added more data types (uint,short,double,float,datetime...)
    • added locks to the indexfile
    • updated logger
    • updated safe dictionary with locks
    • changed to Path.DirectorySeparatorChar for Mono/MonoDroid support
    • bug fix edge case in WAHbitarray
    • updated storage file

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Mehdi Gholam
Architect
United Kingdom United Kingdom
Member
Mehdi first started programming when he was 8 on BBC+128k machine in 6512 processor language, after various hardware and software changes he eventually came across .net and c# which he has been using since v1.0.
He is formally educated as a system analyst Industrial engineer, but his programming passion continues.
 
* Mehdi is the 5th person to get 6 out of 7 Platinums on CodeProject (13th Jan'12)

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
AnswerRe: "RaptorDB-The Key Value Store V2" and " RaptorDB - the Document Store"mvpMehdi Gholam11 Jan '13 - 0:37 
One is a key value store like a dictionary of things, the other is a document store much like a "normal" database engine, the articles on both should be explanatory.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

QuestionWinRT and/or Windows PhonememberJamie Nordmeyer31 Dec '12 - 5:44 
I'm interested in trying this DB out; it looks like it's well implemented, and well supported. However, I'm ideally trying to find a solution that will work across a variety of .NET platforms, specifically Windows, Windows RT, and Windows Phone. Does RaptorDB support usage on Windows RT and/or Windows Phone? I'm not seeing any documentation that says it does, but figured it couldn't hurt to ask. Smile | :) If not, do you plan to add this support? Regardless, nice job!
Jamie Nordmeyer
Portland, Oregon, USA

AnswerRe: WinRT and/or Windows PhonemvpMehdi Gholam31 Dec '12 - 20:10 
Thanks Jamie!
 
I haven't really tested on WinRT/Phone yet, but probably it should be fine and work.
 
I don't have the devices to test on so it would not be on my todo list in the near future.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

Questionrebuilding index every timememberdannystommen22 Dec '12 - 5:02 
Hi Mehdi,
 
Really great piece of code! I have a situation where I'm currently using a Dictionary with 10 milion items in it. The problem with this is the population of the Dictionary (can take a while).
 
RaptorDB could be a great to solve this start up time. However, everytime I open the DB, it's going to rebuild all indexes (even if I'm only using it for reading). With this 10 milion items, it takes about an minute.
 
2012-12-22 04:47:18|DEBUG|10|RaptorDB.KeyStore`1|| Current Count = 9,999,872
2012-12-22 04:47:18|DEBUG|10|RaptorDB.KeyStore`1|| Checking Index state...
2012-12-22 04:47:18|DEBUG|10|RaptorDB.KeyStore`1|| Rebuilding index...
2012-12-22 04:47:18|DEBUG|10|RaptorDB.KeyStore`1|| last index count = 0
2012-12-22 04:47:18|DEBUG|10|RaptorDB.KeyStore`1|| data items count = 9999872
2012-12-22 04:47:18|DEBUG|10|RaptorDB.KeyStore`1|| 100,000 items re-indexed
2012-12-22 04:47:18|DEBUG|10|RaptorDB.KeyStore`1|| 100,000 items re-indexed
 
Is there anything that I can do to prevent this?
 
Thanks,
Danny
AnswerRe: rebuilding index every timemvpMehdi Gholam22 Dec '12 - 5:07 
Thanks Danny!
 
Make sure the engine shutdowns cleanly, and the rebuilding should not kick in to correct things.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: rebuilding index every timememberdannystommen22 Dec '12 - 6:55 
Ah, I missed the Shutdown method. That works, thanks!
Questionforeach enumerationmemberupmnemam13 Dec '12 - 9:27 
How we can make foreach enumeration on all keys?
 
foreach(var key in storage){
 
var itemContent = raptor.Get(key)....
//do something with value
//delete key
 
}
 
or
 
foreach(var item in storage){
 
var itemContent = item.Value;
//do something with value
//delete key
}
 
I want to store logs in raptorDB and later with Quartz to check if there is any keys and if any, copy to original DB.
Still amater

AnswerRe: foreach enumerationmemberupmnemam13 Dec '12 - 11:34 
Can you please also tell us how to construct our schema? Currently im testing as static variable
 
public static RaptorDB.RaptorDB<int> raptor = RaptorDB.RaptorDB<int>.Open(raptorPath, 200, false);
 
and in every function i use this static variable
 
 
public bool Something(){
 
    string some = string.empty;
    raptor.get(1,out some); 
}
 

But i noticed that i cant remove key and also everytime i check for keys count, it getting duplicated like 2,4,8,16...
 

I have two functions, one with just Set() and another one with Get() and RemoveKey()
Still amater

GeneralRe: foreach enumerationmemberupmnemam13 Dec '12 - 15:22 
here are some screenshots
 
EnumerateStorageFile= http://prntscr.com/m4l39.
totalCount = http://prntscr.com/m4l8g
 
removeKey - dont work
Count - looks like give me *2 of total
Still amater

GeneralRe: foreach enumerationmvpMehdi Gholam13 Dec '12 - 19:17 
You should use the Enumerate() method instead of EnumerateStorageFile().
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: foreach enumerationmvpMehdi Gholam13 Dec '12 - 19:15 
Count() currently returns the storage file contents and RaptorDB does not delete data when you call RemoveKey(), so a "deleted" record is added to the storage file hence you get the sequence you referred to.
 
I'm changing the Count() method to use the index file instead so you will get a more accurate result.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

AnswerRe: foreach enumerationmvpMehdi Gholam13 Dec '12 - 19:10 
To enumerate you can use the Enumerate(T fromkey) method and supply a starting point.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

QuestionSafeDictionary Helper.CompareMemCmpmemberMohammad Akbar10 Dec '12 - 3:31 
Hi,
 
Thank you for this amazing library, I'm currently porting it to Windows CE (certainly will let you know how it helps us)
 
While reading the code of Helper.CompareMemCmp in SafeDictionary I've noticed that the return value of this function may not be the result of a general comparison function. As an example the following compare will return true, which is obviously not right. Can you confirm whether it is a bug or a particular optimization in your code based on usage pattern ?
 
byte[] left = new byte[] {1, 2, 3, 4, 5};  
byte[] right = new byte[] {1, 2, 3};
int res = Helper.CompareMemCmp(left, right);
 
res will be 0 while I expected it to be +1 !
 
Regards,


AnswerRe: SafeDictionary Helper.CompareMemCmpmvpMehdi Gholam10 Dec '12 - 5:04 
Interesting... the function used is p-invoking the windows api.
 
I will run some tests on it and try to get a valid response from it.
 
Thanks!
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

QuestionMax keymemberMember 78300038 Dec '12 - 6:11 
Excellent project. But I have one question. What is the easiest way to get maximum key from RaptorDB?
AnswerRe: Max keymvpMehdi Gholam8 Dec '12 - 7:16 
Interesting question...
 
There is no method to do this efficiently at the moment I will try to put it in.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: Max keymemberfapfip8 Dec '12 - 7:44 
Btw. Is raptorDB manage database file space effectively? I ask because I want use raptorDB to implement simple embedded persistent queue. Do you have in plan implement shrink-like or compact-like methods to prevent grow up database file infinitely - is this methods will be necessary?
GeneralRe: Max keymvpMehdi Gholam8 Dec '12 - 19:31 
I will probably add a "compacting" feature in the future.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

QuestionRaptorDB with MONO on LinuxmemberTuxx200026 Nov '12 - 5:13 
I want to use RaptorDB with MONO on Linux. It works prefect with only minimal changes.
 
The only thing is to replace the hard-coded path-seperator "\\" with the .NET path field
"Path.DirectorySeparatorChar" in the following files.
 
BitmapIndex.cs
MGIndex.cs
KeyStore.cs
mylogger.cs
 
So in a few minutes the RaptorDB is cross-platform ready with MONO. Maybe you can change this in the next version of RaptorDB.
 
br,
Alex
AnswerRe: RaptorDB with MONO on LinuxmvpMehdi Gholam26 Nov '12 - 5:18 
Yes, that is about it although in the Doc version I had to change codedom code to Reflection.Emit also (I was really chuffed when it worked on my Asus TF700).
 
I will update the KV version soon as there a couple of enhancements and fixes for it.
 
Thanks Alex!
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

SuggestionTest for existence of Key in indexmemberahazelwood7 Nov '12 - 11:00 
I have a scenario where I just need to test for the existence of a key in the index. Here is an addition to the KeyStore.cs file to just return if a key exists.
 
public bool Exists( T key ) {
    lock ( _lock ) {
        // search index
        int off;
        return _index.Get( key, out off );
    }
}
 
Thoughts?
GeneralRe: Test for existence of Key in indexmvpMehdi Gholam7 Nov '12 - 20:47 
Absolutely spot on!
 
I will add this in the next release.
 
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.
QuestionMultiple processes concurrencymembertompaolo24 Oct '12 - 3:08 
Can more processes (on the same host) open and read/write the same database?
Thanks.
AnswerRe: Multiple processes concurrencymvpMehdi Gholam24 Oct '12 - 4:29 
No, the database files can only be opened by one process. You can however have a service like host which you can connect to via TCP and have multiple processes work with (this is implemented in the document version but not here).
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

QuestionPerformancememberupmnemam18 Oct '12 - 21:29 
Why max cpu usage is 25% even on read and write as exe.
 
I tested on SSD, Raid5 and RamDisk, all benchmarks have similar time.
 
10 millions : ~25 sec set, ~45 sec get
 
Isn't it supposed to use max cpu and why there is no difference on ssd, raid5 and ramdisk?
 
Cpu is I5 2500, Quad core 3.3 Ghz
Raptor 2.5 version
Still amater

AnswerRe: PerformancemvpMehdi Gholam18 Oct '12 - 21:53 
RaptorDB has been designed to be sensitive to your CPU and mostly your RAM speed not the storage speed.
 
You will get better performance mostly if you upgrade your memory say from DDR2 to DDR3 etc.
 
The storage speed only comes into play when the engine is storing to disk and this is mainly done in the background so you wouldn't notice (it will become markedly apparent when you shutdown).
 
This would also be dependent on the amount of data you are storing (in the main storage file) so in the test program we are storing small ~30 bytes per record so the OS and harddisk caches also smooth this out.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: Performancememberupmnemam20 Oct '12 - 19:02 
I made simple test vs MongoDB, the performance difference is very huge. On 1 million Guid test, time is like 10x faster on write and 12x faster on read.
 
I made the test not with normal collection, but with capped collection as in memory fastest MongoDB collection. From my experience,
 
What are your future plans about Raptor KV DB?
Still amater

GeneralRe: PerformancemvpMehdi Gholam20 Oct '12 - 19:23 
Which was faster?
 
The KV's big brother is "RaptorDB the Document Store" version, so most of my efforts are going there.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: Performancememberupmnemam21 Oct '12 - 7:17 
RaptorDb was faster.
 
I`m interested for RaptorDB as output caching provider. The only big problem i see here is that RaptorDB dont have LRU algorithm. With LRU algorithm we go step forward of thinking about consistent hashing client and distributed output caching provider.
 
My question about scalability was more like this one.
Still amater

GeneralRe: PerformancemvpMehdi Gholam21 Oct '12 - 8:18 
Wow! I never tested MongoDb like this!
 
LRU is on my todo list mostly for memory usage limiting, currently I have no design (minimalist way to do it) for this so I can't say when it will be implemented.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

QuestionIncorrect shutdownmemberwalenru2 Oct '12 - 21:29 
Hi, It's great fast database. We want to use your db on medium loaded service (5000 requests per minute)
 
We've configured to save index database every 30 seconds. After simulated incorrect shutdown
we noticed that indexes are being rebuilded from the begining and it tooks a lot of time.
 
1. May be there is opportunity to rebuild for only new data?
2. Sometimes system hangs on rebuilding indexes on start for long time without any text in log. With last record
 
2012-10-03 03:22:26|DEBUG|32|RaptorDB.KeyStore`1|| Checking Index state...
2012-10-03 03:22:26|DEBUG|32|RaptorDB.KeyStore`1|| Rebuilding index...
2012-10-03 03:22:26|DEBUG|32|RaptorDB.KeyStore`1||    last index count = 0
2012-10-03 03:22:26|DEBUG|32|RaptorDB.KeyStore`1||    data items count = 2510698
2012-10-03 03:22:26|DEBUG|32|RaptorDB.KeyStore`1|| 100,000 items re-indexed
 
Can we do something with that problems?
AnswerRe: Incorrect shutdownmvpMehdi Gholam2 Oct '12 - 22:17 
Currently there is no way of doing this as you can't tell where to start from and the index does not keep a previous state and the pages could be corrupted in any order.
 
Hopefully non-clean shutdowns should not happen that often.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

BugCrashes...memberSh'Man27 Jul '12 - 8:34 
Am I doing something wrong? It crashes on EnumerateStorageFile
 
RaptorDB<int> db1 = RaptorDB<int>.Open("c:\\raptordbtest\\t1", false);
RaptorDB<int> db2 = RaptorDB<int>.Open("c:\\raptordbtest\\t2", true);
 
db1.Set(1, "s1");
db1.Set(2, "s2");
db1.Set(3, "s3");
 
db2.Set(2, "log2lab2: str1");
db2.Set(2, "log2lab2: str2");
 
db2.Set(3, "log3lab3: str1");
db2.Set(3, "log3lab3: str2");
 
db2.Set(1, "log1lab1: str1");
db2.Set(1, "log1lab1: str2");
 
foreach (var pair in db2.EnumerateStorageFile())
{
    listBox1.Items.Add(pair.Key.ToString());
}
 
db1.Shutdown();
db2.Shutdown();

GeneralRe: Crashes...mvpMehdi Gholam27 Jul '12 - 10:20 
What is the error message?
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: Crashes...memberSh'Man27 Jul '12 - 18:25 
RaptorDB 2.5
file: SafeDictionary.cs
Line: 160
fixed (byte* numRef = &(value[startIndex]))
Message: Index was outside the bounds of the array.
 
By the way...
If I call method Shutdown() and then reopen database, everything works fine
Questionnot thread safemembermapageweb23 Jul '12 - 5:41 
Hi, sorry for my bad english.
 
First, thank you for this work.
 
The function FetchRecordString and RemoveKey are not thread safe. If I add a lock, the problem is resolve.
 

An another scenario seem to be problematic:
I add 100 item, iterate with a for, call FetchRecordString, remove the key. After the iteration, I save the index and I ask for the Count() method and I receive a count of 200 ! How it's possible ? I remove all so the count suppose to be 0.
AnswerRe: not thread safemvpMehdi Gholam23 Jul '12 - 6:06 
Thanks!
 
I will add the locks that are missing.
 
Count() is using the storage file (100 recs + 100 delete recs), it should be using the index file for counting items.
 
I will change this in the next release.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

QuestionRaptorDB: MGIndex internals?memberzabrane23 Jun '12 - 12:04 
Hi Mehdi,
 
First of all, thousand thanks for RaptorDB.
I just finished reading the 3 posts about it.
 
I would like to learn some MGIndex internals especially, how inserts works.
Even if the design is simple, I still can't see how the inserts are performed.
 
1. Lets say I have a Key "Hello" associated with a value "World".
Could you please describe the steps to insert it in an MGIndex?
 
2. Suppose now the <"Hello", "Word"> record is stored. How the "search" is performed (i.e the steps)?
 
2. How the sorted list of pages is created at the beginning? Is it empty an empty list at t=0?
How are you performing the sort of this list?
 
3. The benchmarks are very impressive by the way. Can RaptorDB goes beyond the 100m of records (say 1 billion or more)?
 
4. What MGIndex stands for Smile | :) ?
 
I really appreciate your help make me understand the MGIndex design architecture.
 
Keep up the good job Thumbs Up | :thumbsup:
 
Regards
Zab
AnswerRe: RaptorDB: MGIndex internals?mvpMehdi Gholam23 Jun '12 - 18:24 
Thanks Zab!
 
1) - find the page in the page list to insert to by comparing the key and the pages first key
- add the key and value to the dictionary of the found page
 
2) - find the page in the page list for the search key
- get the value from the dictionary in the page found
 
3) The only limitation at the moment is the amount of memory you have (I had 12gb at the time of testing), it may be possible to go higher on limited ram with a more intelligent caching scheme that flushes out least recently used pages in ram, but it is probably cheaper to get more ram instead.
 
4) Smile | :)
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: RaptorDB: MGIndex internals?memberzabrane23 Jun '12 - 22:20 
Hi Mehdi,
 
Much clear now.
 
But when you said compare the "key" (in my case a string "Hello"), should I compare it as it it or should
I use its hash instead (i.e MurmurHash2("Hello")) ?
 
You didn't answer yet my question about the "sorted list of pages" at t=0 Big Grin | :-D ?
 
Thx
Zab
GeneralRe: RaptorDB: MGIndex internals?mvpMehdi Gholam23 Jun '12 - 23:03 
Hashing will be done transparently if you are using the RaptorDBString or RaptorDBGuid so you don't need to.
 
The page list is automatically sorted (SortedList) and updated when a page is full (=10000 items) which will sort the keys and split the page into 2 pages.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

Questioncouple of questionsmemberjez3217 Jun '12 - 22:37 
First, thanks very much for work!
Second, I have very little knowledge of document oriented databases so sorry if I'm asking some obvious questions but -
 
1) How do we use the full text search feature? I get putting the [FullText] attribute on the appropriate field, is there anything else that needs to be done? And when searching is it done through the same db.Query() function?
 
2) Also, how would you achieve something like paging over a table of data? Is it possible to for e.g specify just to retrieve the top 25 results, or results 26-50 etc?
 
thanks again,
 
Jeremy
AnswerRe: couple of questionsmvpMehdi Gholam7 Jun '12 - 23:24 
Thanks Jeremy!
 
This is the Key Value store forum.
 
A1) Yes that's is all you need to do, the query will do full text searches e.g. Address = "street" will find and address with the word street in it.
 
A2) Paging is not supported at the moment, it will come later.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: couple of questionsmemberjez32110 Jun '12 - 21:40 
Ahh sorry meant to post to the document store forum.
Anyway thanks alot for the reply, got the full text search going and will implement paging when it's released
GeneralRe: couple of questionsmvpMehdi Gholam10 Jun '12 - 21:43 
Paging will probably be in v1.6, hopefully Smile | :)
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

SuggestionSuggestionsmemberchprogmer29 May '12 - 4:28 
Hello
 
Thank you for your updates. That's nice to see this project is alive.
 
Can I add some suggestions ?
 
. EnumerateStorageFile should not collect the duplicates, or maybe only as an option.
(because it does even when I open the database with AllowDuplicateKeys=false).
 
. A new function EnumerateKeys to obtain the keys only.
Values are sometimes useless for a filter. It is for performance.
 
. Publish enumerate functions on RaptorDBString, not only on KeyStore.
 
. Add some informations to KeyValuePair: IsADuplicate, IsDeleted, etc..
 
. Add the standard documentation to the functions and classes (with ///).
It is nice to have some details directly inside Visual Studio (with F12).
 
. KeyStore.Set should replace ( = delete) the previous value.
As of v2.5, I obtain duplicates when I use EnumerateStorageFile.
 
. A new function: SetIfDifferent to Set only when the bytes are different to a previous value.
Because AFAIK Raptor always add a new key&value, which can causes a lot of duplicates. BTW, I noted it is true even when the size of the new value is equal to the size of the old value. Why not use the old location in the file ?
I know this function would have poor performance, but it can be a choice, depending on what is in the database.
 
Thank you, and have a nice day. Smile | :)
GeneralRe: SuggestionsmvpMehdi Gholam29 May '12 - 5:20 
EnumerateStorageFile is not meant to be used in that way (it uses the storage file and not the index so that is why you will get the duplicates), EnumerateKeys is probably a better solution.
 
I will try to fit these suggestion in.
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

GeneralRe: Suggestionsmemberchprogmer29 May '12 - 5:55 
Thank you.
 
I believe an EnumerateKeys for RaptorDBString will be useful for many applications.
QuestionVery NicememberMike Hankey28 May '12 - 7:05 
Good job Mehdi
VS2010/Atmel Studio 6.0 ToDo Manager Extension
Version 3.0 now available.
There is no place like 127.0.0.1

AnswerRe: Very NicemvpMehdi Gholam28 May '12 - 7:23 
Thanks Mike!
Its the man, not the machine - Chuck Yeager
If at first you don't succeed... get a better publicist
If the final destination is death, then we should enjoy every second of the journey.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130516.1 | Last Updated 20 Dec 2012
Article Copyright 2012 by Mehdi Gholam
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid