Introduction

This article is the first in a series of articles that focus on using Oracle's Berkeley XML Database, BDB XML for short. BDB XML is available for download under an Open Source license for internal and personal development. It is a small-footprint option for client applications that need a reliable persistence layer. This first part of the series includes instructions for setting up the environment and covers some basic concepts, using the command shell. The second part will include a demo application written in C# for .NET Framework 2.0 that uses only the BDB XML libraries. The articles are intended to familiarize you with Oracle's Berkeley XML DB and to provide a basic demo of how to incorporate it into a simple application. They are in no way representative of the full breadth of features and options available with the Oracle Berkeley XML DB, but they should get you up and running. They do not go into detail about XPath, XQuery, FLWOR, etc., or provide examples of best design practices or patterns for creating an application; resources are provided at the bottom of the page for further study.

Oracle Berkeley DB XML is an Open Source embedded database for storing and retrieving XML documents, with indexing capabilities. The product is built on top of the Open Source Berkeley Database. The database runs in-process with your application. The download provides Java and C++ APIs, and it can be extended with a C# wrapper (provided by Parthenon Computing) that can be used from all CLR-compliant languages. The XML documents themselves can be stored either as whole documents or in parts, by breaking the XML structure down to the node level. BDB XML provides retrieval capability with XPath and XQuery, including FLWOR expressions. This article provides XPath examples. "XPath is a query language (with some programming language features) that is designed to query collections of XML data. It is semantically similar to SQL." (Wikipedia, http://en.wikipedia.org/wiki/XPath).

Why not just use XML Documents, or other types of text storage?

Because BDB XML has indexing capabilities, a big plus.
Because BDB XML provides transaction handling capabilities.
Because BDB XML is built using standards that make it more extensible than developing a custom framework.
Because BDB XML can be used as an embedded database and wrapped with an application.
Because BDB XML supports scripting with FLWOR expressions (not discussed in this article), much like typical relational databases.

How might BDB XML be used?

BDB XML might be used for a client application that requires reliable embedded storage.
BDB XML might be used as a persistence layer for client-server applications (an interface would need to be built to handle requests).
BDB XML might be used as an intermediate storage layer before transactions are sent to a shared relational database, thus reducing the load on the database server.
BDB XML might be used on a web server to retrieve data stored in XML format for AJAX web pages (the documents can be retrieved extremely fast).
Because BDB XML stores XML documents and .NET languages allow for objects to be serialized in XML format, BDB XML can be used to persist objects to be retrieved and de-serialized later. (You can see where this is going!).
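
To make the object-persistence idea concrete, here is a minimal sketch of the round trip. It is written in Python with only the standard library, not with .NET serialization or the BDB XML API, and the Customer class and the to_xml/from_xml helpers are hypothetical names of my own. The element names follow the Customer document used later in this article.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    first_name: str
    last_name: str


def to_xml(customer: Customer) -> str:
    """Serialize a Customer into a <Customers><Customer>...</Customer></Customers> document."""
    root = ET.Element("Customers")
    node = ET.SubElement(root, "Customer")
    ET.SubElement(node, "CustomerId").text = str(customer.customer_id)
    ET.SubElement(node, "CustomerFName").text = customer.first_name
    ET.SubElement(node, "CustomerLName").text = customer.last_name
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Customer:
    """Rebuild a Customer from the XML text produced by to_xml."""
    node = ET.fromstring(text).find("Customer")
    return Customer(
        customer_id=int(node.findtext("CustomerId")),
        first_name=node.findtext("CustomerFName"),
        last_name=node.findtext("CustomerLName"),
    )
```

The string returned by to_xml is the kind of document you would hand to the database, and from_xml is what you would run on a document you get back.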

Using the code

What you will need to get started for part 1. As mentioned, this is part one of a two-part series. For this part you will need the Berkeley XML Database. This article uses version 2.1.8, and I suggest using that version despite newer releases, because the article assumes it. The link below points to an MSI installer that can be downloaded for free. The download is quite large, but if you are writing an application with an embedded persistence layer, you will only need to reference a handful of libraries. http://download.oracle.com/berkeley-db/dbxml-2.1.8.msi

Getting started steps

For our example we create a directory called C:\BDBXML to store our database.
Once we have created our directory, we want to use the Berkeley XML DB shell. The installer has already set the environment variables, so once you open a command prompt and change to the C:\BDBXML directory you just created, you can type "dbxml" at the prompt. You can always get help from the shell by typing either "help" or "help" followed by the command you want help on. You will see a screen that looks like the following:
BDB XML uses what it calls containers as the highest level of storage; once a container is created you can add documents to it. You can think of a container as a database table (although there are differences). We will create a Customers container by typing the following at the prompt: (createContainer Customers.dbxml n validate). You should see the following:
We have created a container with node-level storage, hence the "n" flag, and with schema validation enabled, hence the "validate" flag. Once this is created you will see a file named Customers.dbxml in your BDBXML directory.
BDB XML allows indexes just like a relational database. Indexes can be used to enhance performance, a topic that is outside the scope of this article. We will create an index to serve as a unique primary key index on the CustomerId field (this will be the unique identifier for Customers in the XML documents that we will add to this container). You want to create the indexes before you start adding documents. At the prompt type the following: (addIndex "" CustomerId unique-node-element-equality-decimal). You will see the following:
We have specified a new index in the default namespace ("" = default). It is applied to the CustomerId node, with the constraint at node level, using an XML element, comparing values as decimals for equality, and requiring each entry to be unique; hence the "unique-node-element-equality-decimal" flag.
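
The index flag is easier to read once it is split into its parts. The sketch below is plain Python and not part of BDB XML. It assumes the flag is an optional "unique" followed by four hyphen-separated parts, which is how the description above reads it. The IndexStrategy type, its field names, and the parse_index_strategy function are hypothetical names of my own.

```python
from typing import NamedTuple


class IndexStrategy(NamedTuple):
    unique: bool     # True when the flag starts with "unique"
    path_type: str   # "node" in our example
    node_type: str   # "element" in our example
    key_type: str    # "equality" in our example
    syntax: str      # "decimal" in our example


def parse_index_strategy(spec: str) -> IndexStrategy:
    """Split a flag such as 'unique-node-element-equality-decimal' into named parts."""
    parts = spec.split("-")
    unique = parts[0] == "unique"
    if unique:
        parts = parts[1:]
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts after the optional 'unique', got {parts!r}")
    return IndexStrategy(unique, *parts)
```

Calling parse_index_strategy("unique-node-element-equality-decimal") gives you each part by name, which is handy when you want to check what an index actually constrains.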

Now it's time to add some data. Just as with database tables, we would like to constrain our data to fit a specification, or schema. For the example we want a Customer to have a CustomerId that is an integer and not null, a CustomerFName that is a string and not null, and a CustomerLName that is a string and not null. We can do this using XML Schemas (see the end of the article for links to resources on this topic). Below I have defined a schema for the Customer, against which all of our Customer entries will be validated. The text of the file is provided below.

```
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="Customers" elementFormDefault="qualified"
        xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customers">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Customer" type="Customer" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="Customer">
    <xs:sequence>
      <xs:element name="CustomerId" type="xs:int"
            maxOccurs="1" minOccurs="1" nillable="false" />
      <xs:element name="CustomerFName" type="xs:string"
            maxOccurs="1" minOccurs="1" nillable="false" />
      <xs:element name="CustomerLName" type="xs:string"
            maxOccurs="1" minOccurs="1" nillable="false" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

I created this schema with the Visual Studio 2005 visual designer for XSD documents; it is a great time saver.
OK, now it's time to add our first customer. BDB XML stores data in logical units called documents; think of a document as a database record or a set of records. To keep our example simple, we put each Customer in its own XML document. The document below represents our first customer. Notice that the root element points to the Customers.xsd file through xsi:noNamespaceSchemaLocation="Customers.xsd"; because our elements are not in any namespace, this attribute tells the validator to use Customers.xsd. Customers.xsd should be in your root directory, in this case C:\BDBXML\Customers.xsd. The text of the file is provided below.
```
<?xml version="1.0" encoding="utf-8"?>
<Customers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="Customers.xsd">
    <Customer>
        <CustomerId>1</CustomerId>
        <CustomerFName>Jimmy</CustomerFName>
        <CustomerLName>Page</CustomerLName>
    </Customer>
</Customers>
```
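
If you want to spot-check a Customer document before handing it to the shell, the rules from the schema above can be approximated by hand. This is only a rough sketch in Python's standard library, which has no XSD validator. It is not how BDB XML validates, it skips the 32-bit range limit of xs:int, and check_customer_document is a hypothetical name of my own.

```python
import xml.etree.ElementTree as ET
from typing import List

REQUIRED_FIELDS = ("CustomerId", "CustomerFName", "CustomerLName")


def check_customer_document(text: str) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    root = ET.fromstring(text)
    if root.tag != "Customers":
        return [f"root element is <{root.tag}>, expected <Customers>"]
    problems = []
    customers = root.findall("Customer")
    if len(customers) != 1:
        problems.append(f"expected exactly one <Customer>, found {len(customers)}")
    for customer in customers:
        # minOccurs="1" maxOccurs="1": each field must appear exactly once.
        for name in REQUIRED_FIELDS:
            count = len(customer.findall(name))
            if count != 1:
                problems.append(f"<{name}> must occur exactly once, found {count}")
        # type="xs:int": the text must at least parse as an integer.
        raw_id = customer.findtext("CustomerId")
        if raw_id is not None:
            try:
                int(raw_id.strip())
            except ValueError:
                problems.append(f"CustomerId {raw_id!r} is not an integer")
    return problems
```

An empty list from check_customer_document means the document has the shape the schema asks for; the real validation is still done by the database when the document is added.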
Because we specified the "validate" flag when we created the container (createContainer Customers.dbxml n validate), schema validation is enabled, and documents are validated when they are added, as long as you include the correct schema location in each of your Customer XML documents. We type (putDocument customer001 Customers.xml f) at the prompt. Note: the customer001 after the putDocument command uniquely identifies the document to BDB XML; each time you add another document (record or records) you must specify a unique document name, such as customer002, customer003, and so on. To save yourself headaches, adopt a naming convention that uniquely identifies documents, such as container name + unique key of the record (customer001). Pay attention to the "f" after the document location in the putDocument command: this flag tells the database that we are adding a file (if it is omitted you will receive an exception). Again, the Customers.xml file should reside in your root directory, C:\BDBXML\Customers.xml. After you type the command you should see the following:
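
The naming convention is simple enough to automate. Here is a small sketch in Python; the document_name helper and the three-digit padding are my own assumptions, chosen to match names like customer001.

```python
def document_name(prefix: str, key: int, width: int = 3) -> str:
    """Build a document name such as 'customer001' from a prefix and a numeric key."""
    if key < 0:
        raise ValueError("key must not be negative")
    # Zero-pad the key so that names sort in key order up to 10**width - 1.
    return f"{prefix}{key:0{width}d}"
```

With this, document_name("customer", 2) gives the name to pass to putDocument for the customer whose CustomerId is 2.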
We have added our first record! Let's see the unique constraint in action. I will attempt to add the document again; this time I will specify customer002 as the document identifier, but I will not change the CustomerId in our Customers.xml file.
BDB XML keeps us from entering two customers with the same CustomerId. We will now change the customer information and add some new customers. I simply opened the Customers.xml file, changed the data, saved it, and issued the putDocument command again. See below:
Notice I had to give each document a new identifier: customer002, customer003, customer004. If I did not do this, I would receive an error specifying that I violated a unique constraint.
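
There are really two uniqueness rules at work here: the document name and the indexed CustomerId. The toy model below is plain Python, not the BDB XML API, and it only imitates the behavior described above. ToyContainer and its members are hypothetical names.

```python
import xml.etree.ElementTree as ET


class ToyContainer:
    """In-memory stand-in for a container; it only imitates the two uniqueness rules."""

    def __init__(self):
        self.documents = {}     # document name -> XML text
        self.customer_ids = {}  # CustomerId -> document name (plays the unique index)

    def put_document(self, name: str, text: str) -> None:
        # Rule 1: every document name may be used only once.
        if name in self.documents:
            raise KeyError(f"document name already used: {name}")
        # Rule 2: the indexed CustomerId value must be unique across documents.
        customer_id = int(ET.fromstring(text).findtext("Customer/CustomerId"))
        if customer_id in self.customer_ids:
            owner = self.customer_ids[customer_id]
            raise ValueError(f"CustomerId {customer_id} is already stored in {owner}")
        self.documents[name] = text
        self.customer_ids[customer_id] = name
```

Adding the same XML twice under a new name trips the second rule, and reusing a name with new data trips the first, which mirrors the two mistakes that are easy to make at the shell.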
Now let's query our entries. One of the ways that you can query data in BDB XML is by using XPath expressions. I have included a link to a resource at the end of the article so that you may familiarize yourself with XPath if you have not done so. We would like to see all the customers we added. To do this we issue the following command "query" with the XPath expression in single quotes (query 'collection("Customers.dbxml")/Customers/*'). Mind the double quotes around the Container name and single quotes around the query expression. You should see the following:
"collection(db container name)" specifies that we may potentially return several records. You will notice 4 objects returned. To see the data for the objects, issue the (print) command. See below.
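
If XPath is new to you, it may help to see what the expression /Customers/* selects when it is applied to every document in turn. The sketch below uses Python's standard library on an in-memory dictionary; it is not the dbxml shell. The second customer's name is made-up sample data, and query_children_of_root is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Document name -> XML text. The second entry is invented for illustration.
documents = {
    "customer001": "<Customers><Customer><CustomerId>1</CustomerId>"
                   "<CustomerFName>Jimmy</CustomerFName>"
                   "<CustomerLName>Page</CustomerLName></Customer></Customers>",
    "customer002": "<Customers><Customer><CustomerId>2</CustomerId>"
                   "<CustomerFName>Sample</CustomerFName>"
                   "<CustomerLName>Person</CustomerLName></Customer></Customers>",
}


def query_children_of_root(docs):
    """Imitate /Customers/* over a collection: every child of each <Customers> root."""
    results = []
    for text in docs.values():
        root = ET.fromstring(text)
        if root.tag == "Customers":
            results.extend(root.findall("*"))
    return results


def print_results(results):
    """Imitate the shell's print command by writing each result as XML text."""
    for element in results:
        print(ET.tostring(element, encoding="unicode"))
```

Each document contributes its Customer element to the result list, which is why the shell reports one object per customer when every customer lives in its own document.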
We can now see the content of the documents we added. Now let's modify some data. We will update the customer with a CustomerId of 1 and change the CustomerFName to Eric. All of your typical data modifications can also be achieved with XPath expressions. BDB XML requires that you load the nodes you will be working with before you can modify them; by loading, I mean selecting the node or nodes with a query. I find it best to use exactly the same XPath expression in my "query" command, which selects (loads) the node, as in my "updateNodes" command, which makes the modification. So our commands will be (query 'collection("Customers.dbxml")/Customers/Customer/CustomerFName[/Customers/Customer/CustomerId=1]') to load and (updateNodes 'collection("Customers.dbxml")/Customers/Customer/CustomerFName[/Customers/Customer/CustomerId=1]' 'Eric') for the update. Notice the XPath expressions are exactly the same in both commands. See below:
We have updated the Customer with CustomerId=1, and changed the CustomerFName to Eric.
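
The select-then-modify pattern looks like this outside the shell. This is a sketch with Python's standard library, not the BDB XML API, and update_first_name is a hypothetical helper. Note one deliberate difference: the predicate in the shell commands uses an absolute path, which works because each of our documents holds a single customer, while the sketch uses the relative form Customer[CustomerId='1'], which would also pick the right customer in a document that held several.

```python
import xml.etree.ElementTree as ET


def update_first_name(text: str, customer_id: int, new_first_name: str) -> str:
    """Return the document with CustomerFName replaced for the matching customer."""
    root = ET.fromstring(text)
    # Select first: find the CustomerFName of the Customer whose CustomerId matches.
    target = root.find(f"Customer[CustomerId='{customer_id}']/CustomerFName")
    if target is None:
        return text  # no such customer in this document, leave it untouched
    # Then modify the selected node.
    target.text = new_first_name
    return ET.tostring(root, encoding="unicode")
```

Running update_first_name over the first customer's document with the arguments 1 and "Eric" produces the same change as the updateNodes command.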
Now we are going to delete a Customer. Deleting is a bit different from updating: we remove the whole document associated with the customer, using the removeDocument command. This command requires the document name as a parameter, which is why it is prudent to name documents in a format that can be recalled easily, such as (container name + key) customer001. You can always find the document name by issuing a query with an XPath expression and then issuing a printNames command, which lists the names of the documents the query returned results for. To delete the customer with CustomerId=2 we do the following. Our first command is (query 'collection("Customers.dbxml")/Customers/Customer[/Customers/Customer/CustomerId=2]'). Then we issue (printNames), which returns the document name, "customer002". We then issue the command (removeDocument customer002). See below:
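
The same three steps (query, look up the document name, remove the document) can be sketched against an in-memory dictionary. This is plain Python, not the BDB XML API; names_for_customer and remove_customer are hypothetical helpers, and the dictionary maps document names to XML text.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def names_for_customer(docs: Dict[str, str], customer_id: int) -> List[str]:
    """Imitate query + printNames: names of documents holding the given CustomerId."""
    wanted = str(customer_id)
    names = []
    for name, text in docs.items():
        customers = ET.fromstring(text).findall("Customer")
        if any(c.findtext("CustomerId") == wanted for c in customers):
            names.append(name)
    return names


def remove_customer(docs: Dict[str, str], customer_id: int) -> List[str]:
    """Imitate removeDocument for each matching document; return the removed names."""
    removed = names_for_customer(docs, customer_id)
    for name in removed:
        del docs[name]
    return removed
```

If your document names follow the container name + key convention, the lookup step is only a safety check, because you already know which name to remove.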

Let's query all the customers to see the results of our modifications. We issue the command (query 'collection("Customers.dbxml")/*') to retrieve the root element of every document. You will notice only 3 objects are returned instead of 4, because we deleted customer 2. We then issue a (print) command to display the results. Looking at the documents, we see that the customer with CustomerId=1 now has the first name Eric, and that the customer with CustomerId=2 no longer appears. See below:
This concludes our introduction to BDB XML and its command line utility. Keep in mind that an XML DB does not work in the same way as a traditional relational DB. BDB XML provides support for XQuery, FLWOR expressions, and numerous other features for writing DB scripts and performing complex document transformations. The C++ API to BDB XML, and the C# API built on top of it, offer a much more feature-rich environment to work with than the command line shell, so don't let the obscurity of the shell scare you away just yet. Most if not all of the time you will not use the shell for applications, but it is good to know what is going on at this level.

Resources

In the dbxml shell you can always type "help", or "help" followed by a command name (for example, help updateNodes), to get instructions. Below are some very important resources I have found. If you are serious about using BDB XML you will want to review the links.

The Oracle BDB XML windows installer
http://download.oracle.com/berkeley-db/dbxml-2.1.8.msi
The Oracle BDB XML Documentation
http://www.oracle.com/technology/documentation/berkeley-db/xml/index.html

Note: Introducing Berkeley DB is a good place to start for command line syntax, as well as basic information about indexes, and containers.

The C++ and Java documentation is pretty easy to figure out even if you are not familiar with the languages. You can pick up more detail about BDB XML using these.
The W3Schools XML Schema tutorial
http://www.w3schools.com/schema/default.asp
The W3Schools XPath tutorial: Start your XPath exploration here.
http://www.w3schools.com/xpath/
The W3Schools XQuery tutorial: This really empowers your BDB XML
http://www.w3schools.com/xquery/
FLWOR overview
http://www.stylusstudio.com/xquery_flwor.html