Posted 1 Jun 2009

Building an embedded database engine in C#

10 Jun 2009, CPOL
DbfDotNet is a very fast and compact fully managed standalone database/entity framework, for the .Net Framework.

Introduction

This article presents a standalone, fully managed database/entity engine which implements fixed-width record tables and BTree indexes.

The latest source is available on CodePlex: http://dbfdotnet.codeplex.com/

I welcome anyone who wants to contribute to this project.

Why an embedded database 

Although most of us will use a SQL Server to store and retrieve data sets, there are several situations where an embedded database makes sense.

  • When you don't have a SQL Server available
  • When you want your footprint as small as possible and can't afford SQL Express
  • When you want to manipulate or cache SQL data
  • When you need to write highly procedural data manipulation routines
  • When you want maximum speed

Features

Despite its small size, DbfDotNet provides a number of features that you might find useful:

  • Type safe

In DbfDotNet you manipulate classes with native field types. All data conversion plumbing is done automatically.

  • Very simple entity framework

Creating a record and accessing its properties is all you need to do.

  • Very small memory footprint

Last time I checked, the DbfDotNet dll was 50 KB. Other databases are 1 MB to 10 MB.

I would appreciate it if someone could do a memory usage comparison (I will insert it here).

  • Fast

DbfDotNet was conceived for speed.

DbfDotNet does not use PInvoke or threading locks, and does not implement any transaction system.
Those three technologies have a performance cost that it does not have to pay.

Instead, it uses type-safe records (without boxing/unboxing) and type-safe emitted code. The code is emitted only once per table.

It therefore has, I believe, the potential to be the fastest embedded .Net database there is.

I would appreciate it if someone could do a speed comparison (I will insert it here).

  • Very small runtime memory usage

When you use an in-memory DataTable or SQL requests that return DataSets, the entire result set is held in memory.

DbfDotNet works in conjunction with the garbage collector. Once you have finished modifying an entity and no longer hold a reference to it, the garbage collector finalizes it; at that point the record buffer is saved to disk and released from memory.

Why Dbf

By default the files are compatible with dBase and can therefore be opened in Excel and many other packages.

I have been asked: why DBF? DBF is an old format.

The answer is a bit long but simple.

As I said earlier DbfDotNet is designed to be as fast as possible.

In order to get the database started and get some interest I need two things:

  • A good product
  • A good user base

I know from experience that the DBF format will appeal to some of you for several reasons:

  • You can easily backup DBF files (and leave index files)
  • You can check DBF content using Excel and many other tools
  • DBF is well known and simple to implement
  • It can be extended to modern types (and has been by Clipper and FoxPro)

Most importantly for me, implementing the .DBF format rather than my own custom format has no impact on runtime speed.
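To give an idea of how simple the format is, here is a minimal sketch that reads the fixed part of a dBase header. It is not taken from DbfDotNet: the names DbfHeader and DbfHeaderReader are made up for this illustration, and it assumes the classic dBase III/IV layout (version byte, last-update date, record count, header length, record length, all little-endian).

```csharp
using System;
using System.IO;

// Illustration only: these types are hypothetical and are not part of DbfDotNet.
public sealed class DbfHeader
{
    public byte Version;          // byte 0: file type / version flag
    public DateTime LastUpdate;   // bytes 1-3: year (offset from 1900), month, day
    public uint RecordCount;      // bytes 4-7: number of records in the table
    public ushort HeaderLength;   // bytes 8-9: header size, including field descriptors
    public ushort RecordLength;   // bytes 10-11: size of one record, including the deletion flag
}

public static class DbfHeaderReader
{
    // Reads the 32-byte fixed header from the current stream position.
    public static DbfHeader Read(Stream stream)
    {
        var reader = new BinaryReader(stream);   // BinaryReader reads little-endian values
        var header = new DbfHeader();
        header.Version = reader.ReadByte();
        int year = 1900 + reader.ReadByte();
        int month = reader.ReadByte();
        int day = reader.ReadByte();
        header.LastUpdate = new DateTime(year, month, day);  // throws if the date bytes are invalid
        header.RecordCount = reader.ReadUInt32();
        header.HeaderLength = reader.ReadUInt16();
        header.RecordLength = reader.ReadUInt16();
        reader.ReadBytes(20);                    // skip the remaining bytes of the fixed header
        return header;
    }
}
```

The field descriptors (32 bytes each) follow this fixed part, and the records start at HeaderLength, which is what makes a fixed-width table cheap to address.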

How does it compare to ADO.Net, SQL, SQLite, SharpSQL ...

I ran some speed tests against another database (which I won't name).

The results are quite encouraging. 

Operation                        Dbf.Net    ADO.Net
Opening the database             185 ms     459 ms
Insert 1000 individuals          39 ms      80601 ms
Read individuals sequentially    5 ms       1655 ms
Read individual randomly         3 ms       1666 ms
Modifying individuals            21 ms      75574 ms
Create DateOfBirth index         77 ms      80 ms
Enumerate Individuals by age     36 ms      29 ms
Closing the database             44 ms      0 ms

Both runs enumerate the same records by age:

     Michael Simmons 22/07/1909
     Mark Adams 21/09/1909
     Charles Edwards 28/09/1909
     ... total 1000 records

In this test Dbf.Net runs nearly 400 times faster overall. This comparison is quite unfair, however: Dbf.Net does not have transactions and is not ACID.

Let's not focus too much on speed, but rather on the code differences:

Creating a Table 

Creating the table is quite different. Dbf.Net requires a type safe record upfront to create a table.  In ADO.Net you provide a string. 

 

Dbf.Net:

DbfTable<Individual> mIndividuals;

void CreateIndividualTable()
{
  mIndividuals = 
    new DbfTable<Individual>(
      @"individuals.dbf", 
      Encoding.ASCII, 
      DbfDotNet.DbfVersion.dBaseIV);
}

class Individual
 : DbfDotNet.DbfRecord, IIndividual
 {
  [DbfDotNet.Column(Width = 20)]
  public string FIRSTNAME;
  [DbfDotNet.Column(Width = 20)]
  public string MIDDLENAME;
  [DbfDotNet.Column(Width = 20)]
  public string LASTNAME;
  public DateTime DOB;
  [DbfDotNet.Column(Width = 20)]
  public string STATE;
 }

ADO.Net:

// Connection stands for the provider's connection class
// (the other database is deliberately left unnamed).
Connection _cnn = null;

void ITestDatabase.CreateIndividualTable()
{
  _cnn = new System.Data.Connection(
    "Data Source=adoNetTest.db");
  _cnn.Open();
  using (DbCommand cmd = _cnn.CreateCommand())
  {
    // Verbatim string, so the SQL can span several lines
    cmd.CommandText = @"CREATE TABLE 
      INDIVIDUAL (ID int primary key, 
      FIRSTNAME VARCHAR(20), 
      MIDDLENAME VARCHAR(20), 
      LASTNAME VARCHAR(20), 
      DOB DATE, 
      STATE VARCHAR(20))";

    cmd.ExecuteNonQuery();
  }
}

Inserting new entries in a table: 

Inserting entries differs again. In ADO.Net you have to build a command string. In Dbf.Net you simply call the NewRecord() method and set the fields. Dbf.Net automatically uses the class you provided when creating the table. Calling SaveChanges() is not mandatory, but it is useful if you want your controls to refresh instantly.

Dbf.Net:

void InsertNewIndividual(
   int id, 
   string firstname,
   string middlename,
   string lastname,
   DateTime dob,
   string state)
{
  var indiv = mIndividuals.NewRecord();
  indiv.FIRSTNAME = firstname;
  indiv.MIDDLENAME = middlename;
  indiv.LASTNAME = lastname;
  indiv.DOB = dob;
  indiv.STATE = state;
  indiv.SaveChanges();
}

ADO.Net:

void InsertNewIndividual(
  int id, 
  string firstname, 
  string middlename, 
  string lastname,
  DateTime dob, 
  string state)
{
 using (DbCommand cmd =
   _cnn.CreateCommand())
 {
  // Verbatim string, so the SQL can span several lines
  cmd.CommandText = string.Format(
   @"INSERT INTO INDIVIDUAL (ID,
    FIRSTNAME, MIDDLENAME, LASTNAME, 
    DOB, STATE) VALUES({0},
    '{1}', '{2}', '{3}', 
    '{4}', '{5}');",
   id, firstname, middlename,
   lastname,
   dob.ToString("yyyy-MM-dd HH:mm:ss"),
   state);
  cmd.ExecuteNonQuery();
 }
}

Getting an individual by record ID  

Getting an Individual record differs again. In ADO.Net you have to build a command string. In Dbf.Net you call a method. Again, Dbf.Net automatically uses the class you provided when creating the table. Are you seeing a pattern emerging here?

Dbf.Net:

IIndividual GetIndividualById(int id)
{
  Individual result =
    mIndividuals.GetRecord(id);
  return result;
}

ADO.Net:

IIndividual GetIndividualById(int id)
{
 using (DbCommand cmd =
   _cnn.CreateCommand())
 {
  cmd.CommandText =
    "SELECT * FROM INDIVIDUAL WHERE ID=" + id;
  var reader = cmd.ExecuteReader();
  try
  {
   if (reader.Read())
    return GetNewIndividual(reader);
   else return null;
  }
  finally
  {
   reader.Close();
  }
 }
}

// Copies the columns of the current row into a new Individual
Individual GetNewIndividual(
  DbDataReader reader)
{
 var res = new Individual();
 res.ID = reader.GetInt32(0);
 res.FirstName = reader.GetString(1); 
 res.MiddleName = reader.GetString(2);
 res.LastName = reader.GetString(3);
 res.Dob = reader.GetDateTime(4);
 res.State = reader.GetString(5);
 return res;
}

 class Individual : IIndividual
 {
  public int ID { get; set; }
  public string FirstName { get; set; }
  public string MiddleName { get; set; }
  public string LastName { get; set; }
  public DateTime Dob { get; set; }
  public string State { get; set; }
 }

Saving a modified individual back to the database.

In Dbf.Net you don't have to write any code. If you don't want to wait for the garbage collector to collect your individual, you can call SaveChanges.

Dbf.Net:

void SaveIndividual(
  Individual individual)
{
  individual.SaveChanges();
}

ADO.Net:

void SaveIndividual(
  IIndividual individual)
{
 using (DbCommand cmd =
   _cnn.CreateCommand())
 {
  cmd.CommandText = string.Format(
    "UPDATE INDIVIDUAL SET DOB='{1}' WHERE ID={0};",
    individual.ID,
    individual.Dob.ToString(
      "yyyy-MM-dd HH:mm:ss"));
  cmd.ExecuteNonQuery();
 }
}

Creating an Index 

In ADO.Net you have to build a command string. In Dbf.Net you call a method.
Although AddField("DOB") does not look type safe, internally it emits code that is perfectly type safe.

Dbf.Net:

void CreateDobIndex()
{
  var sortOrder = 
    new DbfDotNet.SortOrder<Individual>(
    /*unique*/false);
  sortOrder.AddField("DOB");
  mDobIndex = mIndividuals.GetIndex(
    "DOB.NDX", sortOrder);
}

I wish I could write sortOrder.AddField(DOB) but it wouldn't work. Anyone got an idea about this?

ADO.Net:

void CreateDobIndex()
{
 using (DbCommand cmd =
   _cnn.CreateCommand())
 {
  cmd.CommandText =
   "CREATE INDEX DOB_IDX ON INDIVIDUAL (DOB)";
  cmd.ExecuteNonQuery();
 }
}
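On the wish for a compiler-checked field name: one possible approach, sketched here under assumptions, is to accept a lambda such as x => x.DOB as an expression tree and read the member name from it. FieldName and SampleRecord below are hypothetical names for this illustration; nothing like this exists in DbfDotNet as described in this article.

```csharp
using System;
using System.Linq.Expressions;

// Illustration only: FieldName is a hypothetical helper, not a DbfDotNet type.
public static class FieldName
{
    // Returns the name of the field or property selected by a lambda such as x => x.DOB.
    public static string Of<TRecord, TField>(Expression<Func<TRecord, TField>> selector)
    {
        var member = selector.Body as MemberExpression;
        if (member == null)
            throw new ArgumentException(
                "Expected a simple member access such as x => x.DOB.", "selector");
        return member.Member.Name;
    }
}

// Hypothetical record used only to demonstrate the helper.
public class SampleRecord
{
    public string LASTNAME;
    public DateTime DOB;
}
```

With this helper, FieldName.Of((SampleRecord x) => x.DOB) yields the string "DOB", and a misspelled field becomes a compile error. A SortOrder overload taking such a lambda could then forward the resulting name to the existing AddField(string).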

Getting individuals sorted by Age 

Using the index is simple: there is no need to write a 'SELECT' command, just use foreach on the index.

Dbf.Net:

IEnumerable<Individual>
  IndividualsByAge()
{
  foreach (Individual indiv
    in mDobIndex)
  {
    yield return indiv;
  }
}

ADO.Net:

IEnumerable<Individual> 
  IndividualsByAge()
{
 using (DbCommand cmd =
   _cnn.CreateCommand())
 {
  cmd.CommandText =
    "SELECT * FROM INDIVIDUAL ORDER BY DOB";
  var reader = cmd.ExecuteReader();
  try
  {
   while (reader.Read())
   {
    yield return 
      GetNewIndividual(reader);
   }
  }
  finally
  {
   reader.Close();
  }
 }
}

As you can see, the code is generally much shorter with DbfDotNet.

I tried to steer away from having to provide commands in a string.

Instead, I tried to make it use type-safe members and be more object oriented overall.

High Level Interface 

I have been asked how I compare to other SQL databases.

Again DbfDotNet is not a SQL engine.

It is rather an object persistence framework, like the Microsoft Entity Framework or NHibernate.

The difference is that it doesn't translate object manipulations into SQL requests because it speaks directly to the database layer.

I would love to write a proper Dbf to Linq interface. If you want to help me with this, please volunteer.

Using the code 

Warning: This project is in its infancy; it has not been tested thoroughly.

You can try it, but please don't use it in a live environment.

If you want speed, however, and are ready to either report or fix any issues that might arise:

  1. Create a C# project
  2. Reference DbfDotNet.dll in your project
  3. Create a record class
  4. Write some code to manipulate the records

Points 3 and 4 are expanded below.

The DbfRecord class

The DbfRecord class represents one row in your table.

You can use the Column attribute to change DBF-specific parameters.

class Individual : DbfDotNet.DbfRecord
{
    [Column(Width = 20)]        public string FIRSTNAME;
    [Column(Width = 20)]        public string MIDDLENAME;
    [Column(Width = 20)]        public string LASTNAME;
    public DateTime DOB;
    [Column(Width = 20)]        public string STATE;
}

The system automatically chooses the DbfField most appropriate for your datatype.
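The article does not show how that choice is made. As a rough sketch only, a mapping from CLR types to the standard dBase field type codes could look like the following. DbfTypeMap is a hypothetical name, and the exact mapping used by DbfDotNet may differ.

```csharp
using System;

// Illustration only: DbfTypeMap is hypothetical; DbfDotNet's real mapping is not shown in the article.
public static class DbfTypeMap
{
    // Maps a CLR type to a standard dBase field type code.
    public static char ToDbfFieldType(Type clrType)
    {
        if (clrType == typeof(string)) return 'C';     // Character: fixed-width text
        if (clrType == typeof(DateTime)) return 'D';   // Date: stored as YYYYMMDD text, no time part
        if (clrType == typeof(bool)) return 'L';       // Logical
        if (clrType == typeof(int) || clrType == typeof(long) ||
            clrType == typeof(decimal) || clrType == typeof(double))
            return 'N';                                // Numeric: stored as text digits
        throw new NotSupportedException("No DBF field type chosen for " + clrType.Name);
    }
}
```

Note that a plain dBase 'D' column only keeps the date, so a DateTime with a time component would need an extended type to round-trip fully.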

The DbfTable class

In order to store your records somewhere you need to create a Table:

individuals = new DbfTable<Individual>(
     @"individuals.dbf",
     Encoding.ASCII,
     DbfVersion.dBaseIV);

Note that this uses a type-safe generic class: every record in the table is an Individual.

Record Manipulation

You can add new rows to the table by calling NewRecord:

var newIndiv = individuals.NewRecord();

Then you simply set the fields of your record:

newIndiv.LASTNAME = "GANAYE";

Optionally, you can call SaveChanges to save your changes immediately.
If you don't, the data will be saved when your individual is garbage collected.

newIndiv.SaveChanges();

Index support

This is still very basic. First you define your sort order:

var sortOrder = new SortOrder<Individual>(/* unique */ false);
sortOrder.AddField("LASTNAME");

Then you can get your index:

mIndex = individuals.GetIndex("lastname.ndx", sortOrder);

You can then retrieve any individual from your index in a type-safe way:

individual = mIndex.GetRecord(rowNo);

In order to maximize speed, the index emits its own type-safe code for:

  • reading the index fields from the DBF record
  • reading and writing index entries
  • comparing index entries
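To illustrate the idea of generating a typed comparison once and reusing it, here is a small sketch. It deliberately swaps in LINQ expression trees instead of the raw IL emission the library uses, and FieldComparer and Person are hypothetical names for this example only.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Illustration only: FieldComparer is hypothetical and uses expression trees,
// not the IL emission that DbfDotNet itself relies on.
public static class FieldComparer
{
    // Builds (a, b) => a.field.CompareTo(b.field) and compiles it to a delegate.
    // The lookup by name happens once; each later comparison is a direct typed call.
    public static Comparison<T> Create<T>(string fieldName)
    {
        FieldInfo field = typeof(T).GetField(fieldName);
        if (field == null)
            throw new ArgumentException("Unknown public field: " + fieldName);

        MethodInfo compareTo = field.FieldType.GetMethod("CompareTo", new[] { field.FieldType });
        if (compareTo == null)
            throw new ArgumentException("Field type has no typed CompareTo: " + field.FieldType.Name);

        ParameterExpression a = Expression.Parameter(typeof(T), "a");
        ParameterExpression b = Expression.Parameter(typeof(T), "b");
        Expression body = Expression.Call(
            Expression.Field(a, field), compareTo, Expression.Field(b, field));
        return Expression.Lambda<Comparison<T>>(body, a, b).Compile();
    }
}

// Hypothetical record used only to demonstrate the comparer.
public class Person
{
    public string LASTNAME;
    public DateTime DOB;
}
```

A delegate created with FieldComparer.Create<Person>("DOB") compares two records without boxing the DateTime values. This sketch does not handle null reference-type fields, which a real index would have to.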

Inner architecture

DbfDotNet's main class is the ClusteredFile.

The ClusteredFile is a wrapper around a stream that provides paging and caching support.

The ClusteredFile is the base class for DbfFile and NdxFile. It will also be the base class for memo files when I write them.
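The following is a simplified sketch of what paging and caching over fixed-width records can look like. It is not the actual ClusteredFile code: FixedRecordReader is a hypothetical class, and it assumes record N starts at headerLength + N * recordLength.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustration only: FixedRecordReader is hypothetical, not the real ClusteredFile.
public sealed class FixedRecordReader
{
    private readonly Stream mStream;
    private readonly int mHeaderLength;
    private readonly int mRecordLength;
    private readonly Dictionary<uint, byte[]> mCache = new Dictionary<uint, byte[]>();

    public FixedRecordReader(Stream stream, int headerLength, int recordLength)
    {
        mStream = stream;
        mHeaderLength = headerLength;
        mRecordLength = recordLength;
    }

    // Number of times a record actually had to be read from the stream.
    public int DiskReads { get; private set; }

    public byte[] GetRecordBytes(uint recordNo)
    {
        byte[] buffer;
        if (mCache.TryGetValue(recordNo, out buffer)) return buffer;   // cache hit

        buffer = new byte[mRecordLength];
        // Fixed-width records: the position is computed, never searched for.
        mStream.Seek(mHeaderLength + (long)recordNo * mRecordLength, SeekOrigin.Begin);
        int read = 0;
        while (read < buffer.Length)
        {
            int n = mStream.Read(buffer, read, buffer.Length - read);
            if (n == 0)
                throw new EndOfStreamException("Record " + recordNo + " is past the end of the file.");
            read += n;
        }
        DiskReads++;
        mCache[recordNo] = buffer;
        return buffer;
    }
}
```

This sketch never evicts anything from its cache; the library instead ties the lifetime of each buffer to the garbage collector, as described under Points of Interest.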

The ClusteredFile uses a class called QuickSerializer to serialize the record content to a byte array.

QuickSerializer parses the record fields and generates a bit of IL code for every field to allow reading, saving and comparison.
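As an illustration of the per-field work that such generated code has to perform, here is a hand-written sketch for a fixed-width Character column. FixedWidthField is a hypothetical helper, not part of QuickSerializer, and it assumes a single-byte encoding such as the Encoding.ASCII used in this article, with space padding.

```csharp
using System;
using System.Text;

// Illustration only: FixedWidthField is hypothetical, not part of QuickSerializer.
public static class FixedWidthField
{
    // Writes value into buffer[offset .. offset + width), truncated or space-padded to width.
    public static void Write(byte[] buffer, int offset, int width, string value, Encoding encoding)
    {
        byte[] raw = encoding.GetBytes(value ?? string.Empty);
        int count = Math.Min(raw.Length, width);        // truncate values that are too long
        Array.Copy(raw, 0, buffer, offset, count);
        for (int i = offset + count; i < offset + width; i++)
            buffer[i] = (byte)' ';                       // pad the rest of the slot with spaces
    }

    // Reads the slot back and drops the trailing padding.
    public static string Read(byte[] buffer, int offset, int width, Encoding encoding)
    {
        return encoding.GetString(buffer, offset, width).TrimEnd(' ');
    }
}
```

Because every column has a fixed offset and width, code of this kind can be generated once per table with the offsets baked in as constants.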

NdxFile implements a B+Tree index.

Roadmap

My plan is to keep this library extremely small. It is not my intention to implement any transaction or multi-threading support.

I will implement :

  • support for every DBF field type
  • memo fields (VARCHAR type)
  • multiple indexes files (*.mdx)
  • Proper documentation
  • LINQ (in a separate dll)

If you want to help me on this project please contact me.

Points of Interest

In order to maximize speed I forced myself not to use any thread synchronization locking.

Each set of DBF file plus indexes must be called from a single thread.
In other words, each DBF file and its indexes can be used by only one thread.

I encountered a problem, though: when the garbage collector finalizes a record, this is done on the garbage collector's finalizer thread. I did not want to lock a resource, and ended up writing this code:

class Record
{
   private RecordHolder mHolder;

   ~Record()
   {
      try
      {
         ...
      }
      finally   
      {
         mHolder.RecordFinalized.Set();
      }
   }
}

Each record has a RecordHolder that stores a ReadBuffer and potentially a WriteBuffer.

When the record is finalized, it signals the RecordHolder that finalization has completed. This call does not block; it sets an event that other threads can wait on.

class ClusteredFile
{
   protected internal virtual Record InternalGetRecord(UInt32 recordNo)
   {
      RecordHolder holder = null;
      if (!mRecordsByRecordNo.TryGetValue(recordNo, out holder)) {...}
      
      // WeakReference.Target is typed as object, so cast it back to Record
      Record record = (Record)holder.mRecordWeakRef.Target;
      if (record == null)
      {
         // the object is not accessible: it was finalized a while ago or is being finalized
         if (holder.RecordFinalized.WaitOne())
         {
            // now that it has been finalized, create a new record
            holder.RecordFinalized.Reset();
            holder.Record = OnCreateNewRecord(/*isnew*/false, recordNo);
         }
      }
      return holder.Record;
   }
}

Then, when the table thread tries to get a record while it is being finalized, it calls holder.RecordFinalized.WaitOne() to make sure the finalization has completed first. Most of the time this call won't block your DBF thread, as the record was finalized some time ago.
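The handshake can be reproduced in isolation with a small sketch. The names Holder, TrackedRecord and FinalizerDemo are hypothetical stand-ins for RecordHolder and Record, and the forced GC.Collect() is there only to make the demonstration observable; it is not something the library needs to do.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Illustration only: simplified stand-ins for RecordHolder and Record.
public sealed class Holder
{
    public readonly ManualResetEvent Finalized = new ManualResetEvent(false);
    public WeakReference RecordRef;   // does not keep the record alive
}

public sealed class TrackedRecord
{
    private readonly Holder mHolder;

    public TrackedRecord(Holder holder) { mHolder = holder; }

    ~TrackedRecord()
    {
        // Runs on the finalizer thread. A real record would flush its buffer here.
        // Set() does not block; it only signals whoever may be waiting.
        mHolder.Finalized.Set();
    }
}

public static class FinalizerDemo
{
    // Created in a separate, non-inlined method so that no local variable
    // keeps the record reachable after this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Holder CreateAndDrop()
    {
        var holder = new Holder();
        holder.RecordRef = new WeakReference(new TrackedRecord(holder));
        return holder;
    }

    // Forces a collection, then waits (with a timeout) for the finalizer's signal.
    public static bool WaitForFinalization(Holder holder, int timeoutMs)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return holder.Finalized.WaitOne(timeoutMs);
    }
}
```

The holder stays reachable while the record is finalized, so the event it owns is still valid when the finalizer calls Set(). That mirrors the article's design, where the holders live in the table's dictionary.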

History

2009 June 4th : Added samples and ADO.Net comparison
2009 June 1st : First DbfDotNet (C#) release. 

2000 May 21st : I wrote my first database engine, it is called tDbf and works on Delphi. 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Pascal Ganaye
Software Developer (Senior)
France
I am a French programmer.
These days I spend most of my time with the .NET framework, JavaScript and html.

Comments and Discussions

Re: Eh??? (Pascal Ganaye, 29-Jul-16 5:37)
Your message is quite upsetting.
I put quite a bit of effort into writing this article.
The library described is frankly quite small and not full-fledged, but it has its benefits.


> You are generating database tables with SQL strings in code and even populating querying created tables with sql strings.

I do.

> For a production database you would just not do this, You would plan and create the database first using the database management application

Not necessarily, no.
It really depends on the context.

This library lets you dynamically create DBF files. This is quite convenient for exporting your data into something that Excel and many other tools will understand well.
It does mean that you know the database structure upfront. You could, for example, let the users of an application choose which columns they want to export and then create a DBF file with them. You can see that in this scenario you would find it hard to use a database management application.

More broadly, there is also a growing movement in database development called Code-First. In this movement, the database is created completely from code. This is supported by most major databases these days.

You should also look at document databases and NoSQL databases. In these scenarios you definitely do not create the database upfront using a database management program.


> and add in stored procedures or queries to return record sets.

This is when your database has stored procedures and queries.
The minimalistic database engine described in this article does not.
Please look up the word "embedded" from the title in a dictionary.
You'll have to explain to me how you install a database that supports stored procs on a 64 KB microcontroller.


> Have you ever worked on a BIG production database ?

As it happens, I have. I do know what a SQL database is, because I have written several pieces of software that managed multi-terabyte databases.
A long time ago I passed a master's degree with distinction, where databases were one of the main subjects.
This is really one of my favorite subjects.


> Clearly a novice!!!
Let me see your profile and mine.

I am doing my best trying to write useful articles.
I am not a native English speaker; this exercise is not easy for me.
This article, written 7 years ago, collected a significant number of up votes and was awarded Best C# article of June 2009.
The Code Project web site sent me quite a few books as a reward for this article.

Please stop being negative, it will get you nowhere.
Start doing things and you won't find the need to insult random people.