Introduction
One of the irritations in working with data-intensive applications is that you almost have to use relational databases to store your data. There is little alternative to them because they are so efficient in storing and retrieving data. The downside is that the developer needs to deal with a lot of programming overhead to cross the impedance mismatch between objects and data. OODBMS have never been a very successful alternative and they are just as expensive and just as heavy.
Wouldn't it be nice if there were a fast, lightweight data store? One that keeps all the objects you need in memory (after all, memory is cheap), stores them away and provides you with the same ACID properties as an RDBMS does?
Well, there is. It's called Prevalence, and the .NET implementation is called Bamboo. This article demonstrates how to move your RDBMS over to a Bamboo-style implementation using the XSD Migration Toolkit.
Background - What is Prevalence
The concept of prevalence was invented by Klaus Wuestefeld. His concept is simple:
- All objects are stored in memory, because memory is cheap.
- At regular intervals (or when shutting down) the objects are serialized to disk. This is called a snapshot.
- All manipulations of the objects are recorded in a commandlog, which is written to disk immediately.
These three concepts are very simple and yet powerful. Memory is cheap, and 1 Gigabyte of RAM is neither uncommon nor expensive these days. Many databases (not all, but probably 80% or more) do not hold this much data.
Writing the snapshot to disk is quick and hardly causes overhead amongst all the disk I/O the O/S already performs.
Writing the commandlog to disk immediately gives us the ability to recover. If the application falls over it is simply restarted, then the snapshot and the outstanding commandlog are read in and applied.
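The recovery cycle described above can be sketched in a few lines. This is a toy illustration of the idea only, not Bamboo's API: the class TinyPrevalentStore, the file names and the "ADD" text format are all assumptions made for the example, and a real engine would serialize command objects and force the log to disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical toy store: shows snapshot + commandlog recovery, not Bamboo's API.
public class TinyPrevalentStore
{
    private readonly string _snapshotPath;
    private readonly string _logPath;
    private readonly List<string> _items = new List<string>();

    public TinyPrevalentStore(string directory)
    {
        Directory.CreateDirectory(directory);
        _snapshotPath = Path.Combine(directory, "snapshot.txt");
        _logPath = Path.Combine(directory, "command.log");

        // Recovery: read the last snapshot, then replay the outstanding commands.
        if (File.Exists(_snapshotPath))
            _items.AddRange(File.ReadAllLines(_snapshotPath));
        if (File.Exists(_logPath))
            foreach (string command in File.ReadAllLines(_logPath))
                Apply(command);
    }

    public IReadOnlyList<string> Items { get { return _items; } }

    public void Add(string item)
    {
        // Write the command to the log first, then change the in-memory state.
        string command = "ADD " + item;
        File.AppendAllText(_logPath, command + Environment.NewLine);
        Apply(command);
    }

    public void TakeSnapshot()
    {
        // Persist the whole state, after which the old log is no longer needed.
        // A real engine would make this pair of steps crash-safe.
        File.WriteAllLines(_snapshotPath, _items);
        File.Delete(_logPath);
    }

    private void Apply(string command)
    {
        if (command.StartsWith("ADD ", StringComparison.Ordinal))
            _items.Add(command.Substring(4));
    }
}
```

Creating a second TinyPrevalentStore on the same directory stands in for a restart: it sees every item added through the first instance, whether it was captured by the snapshot or only by the log.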
The result is impressive. The Prevalence Wiki at http://www.prevayler.org/wiki.jsp has run comparisons with MySQL and Oracle and reports performance 3000 and 9000 times faster, respectively. But in most cases that is not what counts. What is really important is that one can work with proper entity objects instead of tables/records and flyweight patterns. This makes the implementation lightweight from the development perspective as well.
The C# implementation of prevalence is called Bamboo and can be found on Sourceforge at http://sourceforge.net/projects/bbooprevalence/.
The XSD Migration Kit (aka Bamboo Builder)
One of the issues I had with Bamboo was that it took a greenfields view: simply write a few objects and away you go. In reality, however, I had all my applications on databases and I was really interested in moving them over. So I developed the XSD Migration kit. Its purpose is quite simple:
- Read an XSD file of a DataSet generated in VS.NET.
- For each table write a business class.
- Write a RootObject to manage access to all the entities.
You might ask, why bother, because XSD.EXE already has an option to generate classes from XSD files. Well, that's how I started. But then I found there was still so much to do. The generated classes are simply collections of public instance variables. Not at all what I would regard as good practice. I wanted to have features such as:
- All table fields are encapsulated as properties.
- For distributed applications I needed a concept to detect update collisions.
- I needed a way to manage foreign keys.
- Primary keys should be protected against inadvertent change.
Then the instances need to be collected somewhere so you can find them. This is where the root object comes in. In short it has the following responsibilities:
- It contains all instances of business entities (aka records) that are managed under prevalence.
- It can be called to insert, retrieve, update or delete entities.
- It has functions to retrieve entities with a particular foreign key.
- It supports entities with multi-field primary keys.
- It acts as a single entry-point for calls to change the data so that Prevalence can quietly wrap around it and do its magic.
Those were the minimum features. In addition, I wanted more (who doesn't). This is what I came up with in the end:
- Strongly typed collections and dictionaries for each entity (I figured while I was generating entities I should go the whole hog).
- Support for IEditableObject, as I also wanted the entities to be easier to use with forms.
- Remoting (Well actually, the particular implementation of Bamboo already does this - so I can't claim it here).
- Object-object relations; get away from foreign keys
The last item looked like a natural candidate for the first list. And so it was, until I implemented a remoted application. Forget about the issue of retrieving by value an object that has relations with other instances. That I could live with. But if you update this object, change a relationship to another object and then call the Update method on the server, something horrible happens. The server receives the updated object (which is physically different from the one it has in the store) plus all the related objects, which are again physically different from what is in the store. The server now has to update the stored object by inspecting all the stored relations and comparing them with the ones received from the update call. That is a nightmare.
So I decided to stay with foreign keys. I called it loosely coupled relationships. It makes life a lot easier.
So I didn't implement the last item. However, I am considering generating auto-retrieving properties that will simply call back to the server.
Before I forget - there was one more feature: generating an automated loader class that can take the original DataSet, filled from the database, create the objects and store them in the Bamboo store!
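To give a feel for what such a loader does, here is a minimal sketch. It is not the generated IssueTrackLoader: the class names LoadedPerson and PersonLoaderSketch and the column names PersonId and Name are assumptions chosen for illustration, and the result is a plain list rather than the Bamboo store.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical entity; the real generated Person class has more fields.
public class LoadedPerson
{
    public readonly int PersonId;
    public readonly string Name;

    public LoadedPerson(int personId, string name)
    {
        PersonId = personId;
        Name = name;
    }
}

public static class PersonLoaderSketch
{
    // Walks the "Person" table of a filled DataSet and builds one object per row.
    // Assumes PersonId is an int column and Name is a non-null string column.
    public static List<LoadedPerson> Load(DataSet source)
    {
        List<LoadedPerson> people = new List<LoadedPerson>();
        foreach (DataRow row in source.Tables["Person"].Rows)
        {
            people.Add(new LoadedPerson((int)row["PersonId"], (string)row["Name"]));
        }
        return people;
    }
}
```

In a real loader each object would be handed to the root object's Insert method instead of a list, so that the prevalence engine records it.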
Using the Sample
This article has several components:
- BambooBuilder.exe: the application to generate the class files.
- A simple XSD file called SimpleIssueTrack.xsd.
- The generated class files.
- A simple application skeleton to load and access the objects.
A Brief Tour of the Generated Classes
The XSD defines several tables:
- Issue
- IssueArea
- Person
- ActionItem
- CodeSet
- CodeSetValue
BambooBuilder reads this file and generates a class file for each table. It also generates a root class called IssueTrackRoot and a loader class called IssueTrackLoader.
Let's take a look at the generated Issue.cs:
namespace IssueTrackerLib
{
    using System;
    using System.Collections;
    using Bamboo.Prevalence;

    [Serializable()]
    public class Issue : System.ComponentModel.IEditableObject {
The point of interest is that all classes must be marked with the SerializableAttribute. Implementing the IEditableObject interface is purely optional and its generation can be turned off in Bamboo Builder.
At the beginning are all the fields listed as private variables:
internal int _updateCount = 0;
internal int _IssueID;
private int _AssignedToPersonId;
private System.Decimal _Cost;
private System.DateTime _DateCreated;
private string _Description;
The first variable actually does not represent a field. Its purpose is to help with optimistic concurrency control - more about that later. The next ones are the field names with an underbar in front. How private variables are marked is a matter of taste.
As a point of interest, the first field variable, _IssueID, is marked internal. It is a primary key, hence it is not supposed to be changed by the application. However, it needs to be set (by the root class) to an appropriate primary key value. As it is an int, the Bamboo Builder has generated code to make it an auto-numbering key.
This is what the Insert method of the root class looks like:
public virtual Issue Insert(Issue newIssue)
{
    if ((newIssue == null))
    {
        throw new System.ArgumentNullException("newIssue",
            "Can't Insert a null object");
    }
    newIssue._IssueID = _nextIssueID;
    _nextIssueID = (_nextIssueID + 1);
    _Issues.Add(newIssue.KeyObject, newIssue);
    return newIssue;
}
The Insert() method simply assigns the next free ID and keeps track of it. The root object has one Insert() overload for each entity. This makes it easy, as there is no need for separate InsertIssue, InsertPerson, or InsertActionItem methods.
Primary keys are another issue as soon as multiple field keys appear. The CodeSetValue has two keys:
- CodeSetId, which is also a foreign key to CodeSet.
- And Value, which is a string.
To handle this in a common way, all entities have a property called KeyObject which returns an Object. In the case of the Issue class it is its IssueID. In the case of CodeSetValue, the KeyObject is generated thus:
public virtual object KeyObject
{
    get { return new _KeyObject(CodeSetId, Value); }
}
The class _KeyObject is nested inside CodeSetValue. It overrides the comparison method Equals() so that two instances can be tested for equality. Here is the implementation of the class CodeSetValue._KeyObject.
[Serializable]
public class _KeyObject {
    private int _CodeSetId;
    private string _Value;

    public _KeyObject(int CodeSetId, string Value) {
        _CodeSetId = CodeSetId;
        _Value = Value;
    }

    // Two keys are equal when both key fields match.
    public override bool Equals(Object obj) {
        // Guard against null before calling GetType() on the argument.
        if (obj == null || this.GetType() != obj.GetType()) return false;
        _KeyObject kObj = (_KeyObject) obj;
        return ((_CodeSetId == kObj._CodeSetId)
            && (_Value == kObj._Value));
    }

    // Equal keys must produce equal hash codes; a null Value hashes as 0.
    public override int GetHashCode() {
        return _CodeSetId.GetHashCode()
            ^ (_Value == null ? 0 : _Value.GetHashCode());
    }
}
Because _KeyObject overrides Equals() it must also override GetHashCode(), so that two equal keys always land in the same dictionary bucket.
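The effect of the two overrides is easiest to see with a dictionary lookup. The following is a minimal sketch using a stand-in class; CompositeKey and CompositeKeyDemo are hypothetical names, not generated code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a generated two-field key object.
public sealed class CompositeKey
{
    private readonly int _codeSetId;
    private readonly string _value;

    public CompositeKey(int codeSetId, string value)
    {
        _codeSetId = codeSetId;
        _value = value;
    }

    public override bool Equals(object obj)
    {
        CompositeKey other = obj as CompositeKey;
        return other != null
            && _codeSetId == other._codeSetId
            && _value == other._value;
    }

    public override int GetHashCode()
    {
        int valueHash = _value == null ? 0 : _value.GetHashCode();
        return _codeSetId.GetHashCode() ^ valueHash;
    }
}

public static class CompositeKeyDemo
{
    // Stores one entry, then finds it again through a second, separately built key.
    public static string Lookup()
    {
        Dictionary<object, string> store = new Dictionary<object, string>();
        store.Add(new CompositeKey(7, "Open"), "status: open");
        return store[new CompositeKey(7, "Open")];
    }
}
```

Without the overrides the second key would be a different object with a different default hash code, and the lookup would throw a KeyNotFoundException.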
How to use the classes
To write an application you need to create some code, but a lot less than before.
Startup - create the prevalence engine
When initialising, you have to create an instance of the prevalence engine and an instance of the Root Object:
string prevalenceBase = Path.Combine(Environment.CurrentDirectory, "data");
PrevalenceEngine _engine = PrevalenceActivator.CreateTransparentEngine(
    typeof(IssueTrackerLib.IssueTrackRoot), prevalenceBase);
IssueTrackerLib.IssueTrackRoot Root = _engine.PrevalentSystem
    as IssueTrackerLib.IssueTrackRoot;
The first statement simply builds the path of the directory for the data files. The second statement activates the prevalence engine. The third provides an instance of the root class.
Creating a new Issue is now straightforward:
Issue newIssue = new Issue(IssueAreaId, .....);
Root.Insert(newIssue);
Now the Insert method, as you saw above, simply sets the primary key and stores the object in its dictionary of Issues. What about the commandlog? Earlier on I stated that the Prevalence system would record every transaction as a command, which is logged to file so that the state of the datastore can be restored after a crash. This is where .NET reflection comes in. If you look at the class definition of IssueTrackRoot you will see the following:
[Serializable]
public class IssueTrackRoot : System.MarshalByRefObject {
The class IssueTrackRoot inherits from MarshalByRefObject. The prevalence engine uses that to insert proxy calls. So when you call Insert(newIssue) the call is intercepted by the Prevalence engine and serialised to the log file before it is executed.
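The intercept-then-execute idea can be sketched as follows. Bamboo's own interception code is not shown in this article, so the sketch swaps in a different technique: System.Reflection.DispatchProxy (available in .NET Core and later), which proxies an interface rather than a MarshalByRefObject. The names IIssueStore, IssueStore, LoggingProxy and ProxyDemo are hypothetical, and the "log" is an in-memory list instead of a file.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical interface standing in for the root object's public surface.
public interface IIssueStore
{
    int Insert(string title);
}

public class IssueStore : IIssueStore
{
    private readonly List<string> _titles = new List<string>();

    // Returns the number of stored titles, used here as a simple ID.
    public int Insert(string title)
    {
        _titles.Add(title);
        return _titles.Count;
    }
}

// Every call made through the proxy is recorded before it reaches the target.
public class LoggingProxy : DispatchProxy
{
    public IIssueStore Target;
    public readonly List<string> Log = new List<string>();

    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        // Step 1: record the command (a real engine would serialize it to disk).
        Log.Add(targetMethod.Name + "(" + string.Join(", ", args) + ")");
        // Step 2: only then execute it against the real object.
        return targetMethod.Invoke(Target, args);
    }
}

public static class ProxyDemo
{
    public static IIssueStore Wrap(IIssueStore target, out LoggingProxy proxy)
    {
        IIssueStore wrapped = DispatchProxy.Create<IIssueStore, LoggingProxy>();
        proxy = (LoggingProxy)(object)wrapped;
        proxy.Target = target;
        return wrapped;
    }
}
```

The caller only ever sees an IIssueStore, so logging stays invisible to application code, which is the same effect the transparent engine achieves for the root class.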
Getting and updating an object is similarly simple:
Issue anIssue = Root.GetIssue(id);
anIssue.Status="Closed";
Root.Update(anIssue);
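The generated Update method is not listed in this article, so the following is only a guess at how an update counter such as _updateCount can detect collisions between two clients. Ticket and TicketRoot are hypothetical classes, and public fields are used purely to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity carrying an update counter.
[Serializable]
public class Ticket
{
    public int Id;
    public int UpdateCount;
    public string Status;

    public Ticket Clone() { return (Ticket)MemberwiseClone(); }
}

public class TicketRoot
{
    private readonly Dictionary<int, Ticket> _tickets = new Dictionary<int, Ticket>();

    public void Insert(Ticket ticket) { _tickets.Add(ticket.Id, ticket.Clone()); }

    // Hands out a copy, as a remoted by-value call would.
    public Ticket Get(int id) { return _tickets[id].Clone(); }

    public void Update(Ticket changed)
    {
        Ticket stored = _tickets[changed.Id];
        // The caller's copy is stale if someone else updated the ticket in between.
        if (stored.UpdateCount != changed.UpdateCount)
            throw new InvalidOperationException(
                "Ticket " + changed.Id + " was changed by someone else.");

        Ticket next = changed.Clone();
        next.UpdateCount = stored.UpdateCount + 1;
        _tickets[changed.Id] = next;
    }
}
```

If two clients fetch the same ticket and both call Update, the first call succeeds and bumps the counter, while the second is rejected and has to fetch the ticket again.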
The class Issue has a foreign key to IssueArea. If you want to get all issues for an existing IssueArea, do this:
IssueArea anIssueArea = Root.GetIssueArea(id);
IssueCollection relatedIssues = Root.GetIssuesAllFor(anIssueArea);
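Because every object lives in memory, a foreign-key lookup like this can be as simple as a scan over the stored entities. The sketch below shows that approach; whether the generated GetIssuesAllFor scans or keeps an index is not stated here, and AreaIssue and AreaIssueRoot are hypothetical stand-ins for the generated classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity holding a foreign key to an issue area.
public class AreaIssue
{
    public readonly int IssueId;
    public readonly int IssueAreaId;

    public AreaIssue(int issueId, int issueAreaId)
    {
        IssueId = issueId;
        IssueAreaId = issueAreaId;
    }
}

public class AreaIssueRoot
{
    private readonly Dictionary<int, AreaIssue> _issues = new Dictionary<int, AreaIssue>();

    public void Insert(AreaIssue issue) { _issues.Add(issue.IssueId, issue); }

    // Returns every issue whose foreign key matches the given area ID.
    public List<AreaIssue> GetIssuesAllFor(int issueAreaId)
    {
        List<AreaIssue> result = new List<AreaIssue>();
        foreach (AreaIssue issue in _issues.Values)
        {
            if (issue.IssueAreaId == issueAreaId)
                result.Add(issue);
        }
        return result;
    }
}
```

The relationship stays loosely coupled: an issue stores only the ID of its area, and the root object resolves it on request.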
There you are. I haven't touched on IEditableObject, but take a look at that topic in the .NET help.
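For orientation, the IEditableObject contract consists of three methods: BeginEdit, CancelEdit and EndEdit. A minimal hand-written implementation might look like the sketch below; EditableNote is a hypothetical class, and the code Bamboo Builder generates may well differ.

```csharp
using System;
using System.ComponentModel;

// Hypothetical entity with a single editable field.
public class EditableNote : IEditableObject
{
    private string _text = "";
    private string _backup;
    private bool _editing;

    public string Text
    {
        get { return _text; }
        set { _text = value; }
    }

    // Remember the current value so the edit can be rolled back.
    public void BeginEdit()
    {
        if (_editing) return;
        _backup = _text;
        _editing = true;
    }

    // Throw away the changes made since BeginEdit.
    public void CancelEdit()
    {
        if (!_editing) return;
        _text = _backup;
        _editing = false;
    }

    // Keep the changes and forget the backup.
    public void EndEdit()
    {
        _editing = false;
        _backup = null;
    }
}
```

Data-bound controls call these methods themselves, which is what lets a form offer a working Cancel without extra code.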
Points of Interest
The ZIP file also includes an application that reads an Access file and generates a data file. This has been taken out of a larger build environment and hence you need to adapt database paths and references.
If you want more detail and keep an eye on future versions: look at http://users.bigpond.net.au/Meyn/Bamboo.
Many thanks to Peter Bromberg from Egghead Cafe (http://www.eggheadcafe.com) for motivating me to move Bamboo Builder from a buggy V 0.2 Alpha to a fairly stable V0.4 Beta.
History
2003-10-31