Introduction
This article attempts to highlight the latest developments in both the MongoDB open-source document database and the official open-source C# driver, and to supplement the previous reviews on CodeProject in the light of these improvements.
Overview of Document Databases.
Document databases store information relating to a record in a contiguous blob of data known as a document. A document's structure usually follows the JSON format and consists of a series of key-value pairs. Unlike the schema of a relational database, a document's structure does not reference empty fields. This flexible arrangement allows fields to be added and removed with ease. What's more, there is no need to rummage about in various tables when trying to assemble the data; it's all there in one solid block.
The downside of all this is that document databases tend to be meaty. But, now that disk drives are in the bargain basement, the trade-off between speed of access and storage costs has shifted in favour of speed, and that has given rise to the increased use of document databases. The Large Hadron Collider at CERN uses a document database,
but that's not why it keeps breaking down.

Hosted Web Server for MongoDb.
There is free web hosting for MongoDB at
MongoHQ. The sandbox database plan provides 512 MB of storage and is a good way to test drive the database.
There is no need to download the MongoDB binaries, and the web site's user interface allows administrative tasks to be carried out. Just sign up, download the driver and you're cooking with gas. I've used this service; it seems to be genuinely free and there is no badgering to upgrade.
Desktop Installation of MongoDb and the C# Driver
All you need to get started is on the MongoDB website. Installation instructions are well documented, although you might have to wade through some detritus to get the correct set for your system. You need to download the C# .NET driver as well as the MongoDB binaries. The C# driver consists of two libraries: the BSON library, MongoDB.Bson.dll, and the driver itself, MongoDB.Driver.dll. There is a basic web user interface situated 1000 ports above the port the database listens on.
For the default installation, this is at http://localhost:28017/. You need to have the line rest=true in the mongod.cfg file to enable this interface. There are also more sophisticated open-source applications available for carrying out administration tasks on the database.
The Database Structure
The basic structure for storing data fields is the BsonElement. It's a simple key-value pair. The Key contains a field name and the Value its value. The Value can itself be a BsonElement, so they can be nested, Russian-doll style. Records are stored as documents. A document is a collection of BsonElements.
Here is an example document.
{
_id : 50e5c04c0ea09d153c919473,
Age : 43,
Cars : {0:Humber,1: Riley},
Forename : Rhys,
Lastname : Richards
}
Not every record needs to contain every field. The only required field is _id, and fields can be added at a future date without having to change the existing records. In this example, the Cars field is an array. Its Value contains a nested document whose elements are key-value pairs: the key is the array index number and the value is the name of the car.
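As an aside, a document like the one above can be built up directly in C# using the BSON library's BsonDocument and BsonArray types. Here is a minimal sketch; the _id field is left out so that the driver or database can assign it:
var exampleDocument = new BsonDocument
{
    { "Age", 43 },
    { "Cars", new BsonArray { "Humber", "Riley" } },
    { "Forename", "Rhys" },
    { "Lastname", "Richards" }
};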
The C# driver.

The driver is used to interface your code to a MongoDB database. The driver can serialize data classes to the database without the need for special attributes. All that's required is a unique Id. This is usually of type ObjectId (in the MongoDB.Bson namespace), a 12-byte time-stamped value which is automatically assigned by MongoDB. You can use a GUID instead, but it needs to be mapped to a string.
The reason for this is that a GUID is usually stored as binary data and the driver's Aggregation Framework has problems digesting binary data. I get the same sort of trouble with cucumber sandwiches.
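As a rough illustration, and assuming the driver's BsonRepresentation attribute from MongoDB.Bson.Serialization.Attributes, a GUID key can be declared along these lines (ClubMemberWithGuidKey is just a hypothetical name):
using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class ClubMemberWithGuidKey
{
    // Store the Guid as a string so that the Aggregation Framework
    // is not handed binary data it cannot digest.
    [BsonId]
    [BsonRepresentation(BsonType.String)]
    public Guid Id { get; set; }

    public string Lastname { get; set; }
}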
Connecting to the database.
The first requirement is to have a connection string. If you plan to use the hosted version, you need to sign up to MongoHQ with a username and password, create a database and register yourself as a new user for the database. Make a note of the login string provided; it will look something like this:
const string hostedWebConnectionString =
"mongodb://myUserName:myPassword@linus.mongohq.com:myPortNumber/";
The default connection string for the desktop server is simply mongodb://localhost. Here is the code for accessing a database named test.
const string connectionString = "mongodb://localhost";
var client = new MongoClient(connectionString);
MongoServer server = client.GetServer();
MongoDatabase database = server.GetDatabase("test");
These calls will fail on the hosted site if the test database does not exist or you are not a registered user of the database. This is because new databases must be created in admin mode on the hosted web site. These constraints do not apply to the desktop server; it will go ahead and create a new database called 'test' if it does not already exist.
Accessing collections.
Documents with a similar structure are arranged as named collections of data in the database. The driver has a Collection object that acts as a proxy for a database’s collection. The following code shows how to access and enumerate a collection,
named 'entities', of type ClubMember.
MongoCollection<ClubMember> collection = database.GetCollection<ClubMember>("entities");
Console.WriteLine("List of ClubMembers in collection ...");
MongoCursor<ClubMember> members = collection.FindAll();
foreach (ClubMember clubMember in members)
{
clubMember.PrintDetailsToScreen();
}
It's recommended that the foreach loop is used wherever possible as it cleans up after itself. Boring housekeeping duties such as calling AttachDatabase() and DropDatabase() are a thing of the past.
You should avoid calling DropDatabase() as it closes down the database's connection pool.
Indexes.
MongoDB indexes use a B-tree data structure. A query uses only one index, and a query optimiser chooses the most appropriate index for the task. The following code builds an index that sorts data by the Lastname property, then by Forename, both ascending (A-Z), and finally by the Age property, descending (oldest to youngest).
IndexKeysBuilder keys = IndexKeys.Ascending("Lastname", "Forename").Descending("Age");
IndexOptionsBuilder options = IndexOptions.SetName("myIndex");
collection.EnsureIndex(keys, options);
This index is great for searching on Lastname, or Lastname and Forename, or Lastname, Forename and Age. It is not useful for sorting on Forename or Age or any combination of the two. The default behaviour is for indexes to be updated when the data is saved, as this helps to prevent concurrency problems. But there is still a potential problem if newly written data is immediately read back. The way round this is to ensure that the write and read operations are performed on the same connection by enclosing them within the following:
using (server.RequestStart(database))
{
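    // Perform the write and then the read inside this block;
    // both operations use the same connection, so the read sees the newly written data.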
}
Querying Data Using Linq.
This is done by referencing the collection's AsQueryable method before writing the LINQ statements. All the usual methods are available. Here are a few examples:
var names = collection.AsQueryable()
    .Where(p => p.Lastname.StartsWith("R") && p.Forename.EndsWith("an"))
    .OrderBy(p => p.Lastname)
    .ThenBy(p => p.Forename)
    .Select(p => new { p.Forename, p.Lastname });
Console.WriteLine("Members where the Lastname starts with 'R' and the Forename ends with 'an'");
foreach (var name in names)
{
Console.WriteLine(name.Lastname + " " + name.Forename);
}
var regex = new Regex("ar");
Console.WriteLine("List of Lastnames containing the substring 'ar'");
IQueryable<string> regexquery = collection.AsQueryable()
    .Where(p => regex.IsMatch(p.Lastname))
    .Select(p => p.Lastname)
    .Distinct();
foreach (string name in regexquery)
{
Console.WriteLine(name);
}
Querying Data Using The QueryBuilder Class.
Using the query builder classes is not as exciting as writing LINQ (you don't get the opportunity to put lots of arrows in your code), but there are still some methods that are worth highlighting.
DateTime membershipDate = DateTime.Now.AddYears(-5);
DateTime membershipDateUTC = membershipDate.ToUniversalTime();
MongoCursor<ClubMember> recentMembers =
collection.Find(Query.GT("MembershipDate", membershipDateUTC));
Console.WriteLine("Members who have joined in the last 5 years ...");
foreach (ClubMember clubMember in recentMembers)
{
clubMember.PrintDetailsToScreen();
}
There are methods to carry out most of the common sorts of comparison. The Query.And method does a logical AND on successive Query objects. The next bit of code illustrates this by finding all members called David Jones and then updating the Forename to Dai. The Update.Set() method sets the Forename field to its new value on all documents selected. Finally, Collection.Update performs the update on the server side.
IMongoQuery davidJonesQuery =
    Query.And(Query.EQ("Lastname", "Jones"), Query.EQ("Forename", "David"));
UpdateBuilder update = Update.Set("Forename", "Dai");
collection.Update(davidJonesQuery, update, UpdateFlags.Multi);
Querying Data Using Map Reduce.
MapReduce is a heavy-duty method used for batch processing large amounts of data. There are two main parts to it: a map function that associates a field with a value, and a reduce function that reduces the input values to a single output. There is an example using MapReduce in the sample code as it may come in handy, but for most users the Aggregation Framework is a better way of collating data.
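For a flavour of what that looks like, here is a rough sketch, not the sample's exact code and assuming the 1.x driver's MapReduce overload and MapReduceOptions builder, that counts how many members own each make of car:
var map = new BsonJavaScript(
    "function() { if (this.Cars) { this.Cars.forEach(function(car) { emit(car, 1); }); } }");
var reduce = new BsonJavaScript(
    "function(key, values) { return Array.sum(values); }");
MapReduceResult mapReduceResult =
    collection.MapReduce(map, reduce, MapReduceOptions.SetOutput(MapReduceOutput.Inline));
foreach (BsonDocument doc in mapReduceResult.InlineResults)
{
    Console.WriteLine(doc.ToJson());
}
The map function emits a 1 for every car a member owns and the reduce function sums those values for each car.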
Querying Data Using The Aggregation Framework.
The Aggregation Framework is used to collect and collate data from various documents in the database. It’s new in version 2.2 and is an attempt to bring the functionality of SQL to a document database.
The aggregation is achieved by passing a collection along a pipeline where various pipeline operations are performed consecutively to produce a result. It's an oven-ready-chicken sort of production line: there is less product at the end, but it is more fit for purpose.
Aggregation is performed by calling the Collection’s Aggregate method with an array of documents that detail various pipeline operations.
Aggregation Example.
In this example, there is a document database collection consisting of the members of a vintage car club. Each document is a serialized version of the following ClubMember class:
public class ClubMember
{
#region Public Properties
public int Age { get; set; }
public List<string> Cars { get; set; }
public string Forename { get; set; }
public ObjectId Id { get; set; }
public string Lastname { get; set; }
public DateTime MembershipDate { get; set; }
#endregion
#region Public Methods and Operators
public void PrintDetailsToScreen()
{
Console.WriteLine(String.Format("{0,-12}{1,-10}{2,4}{3,14}",
this.Lastname, this.Forename,
this.Age, this.MembershipDate.ToShortDateString()));
}
#endregion
}
The ClubMember class has an array named Cars that holds the names of the vintage cars owned by the member. The aim of the aggregation is to produce, for each type of car in the collection, a list of the owners who have joined in the last five years.
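Before any aggregation can happen, the collection needs some data; documents can be added with the collection's Insert method, along these lines (the values are made up):
var member = new ClubMember
{
    Forename = "Rhys",
    Lastname = "Richards",
    Age = 43,
    Cars = new List<string> { "Humber", "Riley" },
    MembershipDate = DateTime.UtcNow
};
collection.Insert(member); // the driver assigns member.Id automatically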
Step 1 Match Operation.
The match operation selects only the members that have joined in the last five years.
Here's the code.
var utcTime5yearsago = DateTime.Now.AddYears(-5).ToUniversalTime();
var matchMembershipDateOperation = new BsonDocument
{
    {
        "$match",
        new BsonDocument { { "MembershipDate", new BsonDocument { { "$gte", utcTime5yearsago } } } }
    }
};
As you can see, the code ends up with more braces than an orthodontist, but at least IntelliSense assists when you are writing it. The keyword $gte indicates a greater-than-or-equal comparison.
Step 2 Unwind Operation.
Unwind operations modify documents that contain a specified array. For each element within the array, a document identical to the original is created, and the value of the array field in that copy is set to the single element. So a document with the following structure
_id: 700, Lastname: "Evans", Cars: ["MG", "Austin", "Humber"]
becomes three documents:
_id: 700, Lastname: "Evans", Cars: "MG"
_id: 700, Lastname: "Evans", Cars: "Austin"
_id: 700, Lastname: "Evans", Cars: "Humber"
If there are two or more identical elements, say Evans has two MGs, then there will be duplicate documents produced.
Unwinding an array makes its members accessible to other aggregation operations.
var unwindCarsOperation = new BsonDocument { { "$unwind", "$Cars" } };
Step 3 Group Operation.
Define an operation to group the documents by car type. Each consecutive operation does not act on the original documents but on the documents produced by the previous operation. The only fields available are those present as a result of the previous pipeline operation; you cannot go back and pinch a field from the original documents.
The $ sign is used in two ways: firstly, to indicate a keyword and, secondly, to differentiate field names from field values. For example, Age is a field name and $Age is the value of the Age field.
var groupByCarTypeOperation = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            { "_id", new BsonDocument { { "Car", "$Cars" } } },
            {
                "Owners",
                new BsonDocument
                {
                    {
                        "$addToSet",
                        new BsonDocument
                        {
                            { "_id", "$_id" },
                            { "Lastname", "$Lastname" },
                            { "Forename", "$Forename" },
                            { "Age", "$Age" },
                            { "MembershipDate", "$MembershipDate" }
                        }
                    }
                }
            }
        }
    }
};
Step 4 Project Operation.
The _id field resulting from the previous operation is a BsonElement consisting of both the field name and its Value. It would be better to drop the field name and just use the Value. The following Project operation does that.
var projectMakeOfCarOperation = new BsonDocument
{
    {
        "$project",
        new BsonDocument
        {
            { "_id", 0 },
            { "MakeOfCar", "$_id.Car" },
            { "Owners", 1 }
        }
    }
};
Step 5 Sort Operation.
Define an operation to Sort the documents by car type.
var sortCarsOperation = new BsonDocument { { "$sort", new BsonDocument { { "MakeOfCar", 1 } } } };
The number 1 means perform an ascending sort; -1 indicates a descending sort.
Step 6 Run the Aggregation and output the result.
AggregateResult result = collection.Aggregate(
matchMembershipDateOperation,
unwindCarsOperation,
groupByCarTypeOperation,
projectMakeOfCarOperation,
sortCarsOperation);
The AggregateResult class returned has a bool field named Ok. It is set to true if there were no errors. The resulting documents are returned in the AggregateResult.ResultDocuments collection. The easiest way to deserialize the collection is to call its Select method passing in the Deserialize method of the BsonSerializer as follows.
public class CarStat
{
#region Public Properties
public string MakeOfCar { get; set; }
public BsonDocument[] Owners { get; set; }
#endregion
}
IEnumerable<CarStat> carStats =
result.ResultDocuments.Select(BsonSerializer.Deserialize<CarStat>);
foreach (CarStat stat in carStats)
{
Console.WriteLine("\n\rCar Marque : {0}\n\r", stat.MakeOfCar);
IEnumerable<ClubMember> clubMembers = stat.Owners
    .Select(BsonSerializer.Deserialize<ClubMember>)
    .OrderBy(p => p.Lastname)
    .ThenBy(p => p.Forename)
    .ThenBy(p => p.Age);
foreach (ClubMember member in clubMembers)
{
member.PrintDetailsToScreen();
}
}
The sample application has an aggregation example that performs various calculations on the data set
such as Count, Min, Max and Total.
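As a flavour of what that involves, and as a hedged sketch rather than the sample's exact code, a single group operation can produce the count, minimum, maximum and total of the Age field across the whole collection:
var ageStatsOperation = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            { "_id", BsonNull.Value },  // a null _id groups every document together
            { "Count", new BsonDocument { { "$sum", 1 } } },
            { "MinAge", new BsonDocument { { "$min", "$Age" } } },
            { "MaxAge", new BsonDocument { { "$max", "$Age" } } },
            { "TotalAge", new BsonDocument { { "$sum", "$Age" } } }
        }
    }
};
AggregateResult ageStats = collection.Aggregate(ageStatsOperation);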
GridFS.
GridFS is a means of storing and retrieving files that
exceed the BsonDocument size limit of 16MB. Instead of storing a file in a
single document, GridFS divides a file into chunks and stores each of the chunks as a separate document. GridFS uses
two collections to store files. One collection stores the file chunks and the
other stores the file's metadata. The chunk size is about 256 KB. The idea here is that smaller chunks of data can be stored more efficiently, and consume less memory when being processed, than large files. It's generally not a good idea to store binary data in the main document as it takes up space that is best used by more meaningful data.
Uploading data into GridFS is straightforward. Here are a couple of examples:
const string fullyQualifiedUpLoadName = @"C:\temp\mars.png";
MongoGridFSFileInfo gridFsInfo = database.GridFS.Upload(fullyQualifiedUpLoadName);
using (var fs = new FileStream(fullyQualifiedUpLoadName, FileMode.Open))
{
gridFsInfo = database.GridFS.Upload(fs, "mars.png");
}
The GridFS.Upload method returns an object of type MongoGridFSFileInfo. This contains the file’s metadata. Only basic details such as
the file’s name and length are included by default but the metadata can be customised to facilitate searching. Here's how.
BsonDocument photoMetadata = new BsonDocument
{
    { "Category", "Astronomy" },
    { "SubGroup", "Planet" },
    { "ImageWidth", 640 },
    { "ImageHeight", 480 }
};
database.GridFS.SetMetadata(gridFsInfo, photoMetadata);
var coll = database.GetCollection("fs.files");
IndexKeysBuilder keys = IndexKeys.Ascending("metadata.Category", "metadata.SubGroup");
coll.EnsureIndex(keys);
var astronomyPics = database.GridFS.Find(Query.EQ("metadata.Category", "Astronomy"));
const string fullyQualifiedDownLoadName = @"C:\temp\mars2.png";
database.GridFS.Download(fullyQualifiedDownLoadName, gridFsInfo);
database.GridFS.Delete(Query.EQ("metadata.Category", "Astronomy"));
MongoDB Replica Sets.
A replica set is a cluster of mongoDB instances that replicate amongst one
another so that they all store the same data. One server is the primary and
receives all the writes from clients. The others are secondary members and
replicate from the primary asynchronously. The clever bit is that, when a
primary goes down, one of the secondary members takes over and becomes the new
primary. This takes place totally transparently to the users and ensures
continuity of service. Replica sets have other advantages: it is easy to back up the data, and databases with a lot of read requests can reduce the load on the primary by reading from a secondary. You cannot rely on any one instance being the primary, as the primary is determined by the members of the replica set at run time.
Installing A Replica set as a Windows Service.
This example installs a replica set consisting of one primary and two secondary instances. The instances will be named MongoDB0, MongoDB1 and MongoDB2.
They will use the IP address localhost and listen on ports 27017, 27018 and 27019 respectively. The replica set name is myReplSet.
Step 1 Housekeeping tasks.
In the mongodb folder, add three new folders named rsDataDb0, rsDataDb1 and rsDataDb2. These are the data folders.
Remove any instance of mongod that may already be running. In this example, the service name to be removed is MongoDB.
Open a command prompt in administrator mode, navigate to where mongod.exe is installed and enter:
mongod.exe --serviceName MongoDB --remove
Step 2 Install three new service instances.
The best way to do this is to have three configuration files, one for each instance. The format of these files is very similar. Here is the config file for MongoDB0. The hash sign comments out a line.
#Use this to direct output to a log file instead of the console
#*******************************
logpath=C:\mongodb\log\rsDb0.log
#********************************
logappend = true
journal = true
quiet = true
#Enable this if you wish to use the web interface situated 1000 ports above the server port
rest=true
#
# The port number the mongod server will listen on
# change port for each server instance
#**************************************
port=27017
#****************************************
# Listen on a specific ip address
# This is needed if running multiple servers. Comment out to access mongod remotely.
bind_ip=127.0.0.1
# This sets the database path, change the database path for each server instance
#**********************************************
dbpath=C:/mongodb/rsDataDb0
#*****************************************
# Keep same replica set for all servers in the set
replSet=myReplSet
The config files are included in the sample code bundle but, basically, you change the port, dbpath and logpath for each instance. Store the config files in the bin directory and enter the following commands:
C:\mongodb\bin\mongod.exe --config C:\mongodb\bin\repSetDb0.cfg --serviceName MongoDB0 --serviceDisplayName MongoDB0 --install
C:\mongodb\bin\mongod.exe --config C:\mongodb\bin\repSetDb1.cfg --serviceName MongoDB1 --serviceDisplayName MongoDB1 --install
C:\mongodb\bin\mongod.exe --config C:\mongodb\bin\repSetDb2.cfg --serviceName MongoDB2 --serviceDisplayName MongoDB2 --install
Check the log files to confirm all is well and enter the following commands to start the services.
net start MongoDB0
net start MongoDB1
net start MongoDB2
Step 3 Configure the Replica Set.
To configure the Replica set you need to use the Mongo shell.
Make sure you are in the \mongodb\bin directory and enter the command
mongo MongoDB0
The shell will connect to the MongoDB0 instance. Now initialise a variable called config by entering the following:
config = { _id : "myReplSet", members : [
    { _id : 0, host : "localhost:27017" },
    { _id : 1, host : "localhost:27018" },
    { _id : 2, host : "localhost:27019" } ] }
Pass this variable to the rs.initiate() method by entering the command
rs.initiate(config)
You now have time to put the kettle on while Mongo takes your hard drive for a spin. When the method returns you are ready to go.
You can find out the status of your replica set by entering rs.status() in the mongo shell. To connect to the Replica set with the C# driver use this connection string.
const string connectionString =
"mongodb://localhost/?replicaSet=myReplSet&readPreference=primary";
Conclusion
There is much more to MongoDB than is detailed in this article but the hope is that there is enough information here for you to be able to begin exploring
the capabilities of this open source software. Finally, I’d like to express my gratitude to the many developers who have worked tirelessly on the
MongoDB project with little prospect of reward other than the satisfaction of having helped others. Thanks very much; I take my hat off to you.