In spite of some notable opposition, NoSQL has been all the rage lately. In particular, JSON-oriented document stores like Mongo and Couch have really become the darlings of the web application crowd.
Of course, JSON (or BSON) isn’t the only game in town. When it comes to document store formats, the other white meat, if you will, is XML.
As far as I could find, XML databases have actually been around longer than JSON-based ones (the first XML database, eXist, was introduced in 2000, whereas the first JSON-based one, CouchDB, came on the scene about 5 years later). Yet in spite of this head start, XML databases appear to be the red-headed stepchild. I was curious as to why this is the case, and here’s what I found.
XML vs. JSON
Let’s first consider the two formats in question. While it’s debatable which format is more commonly used for storage, it’s a lot less debatable which is considered to be the hipper of the two. Spoiler alert: it’s JSON.
XML is Worse
So, why is XML bad? Well, a big knock against XML is that it’s too “heavyweight” and “enterprisey”.
First, it’s obviously more verbose:
<complaint>it's too verbose</complaint>
<complaint>it's too complex</complaint>
'it\'s too verbose',
'it\'s too complex',
Not counting whitespace, the XML takes 171 characters whereas the JSON takes 86. This amounts to almost 50% fewer characters for JSON, which makes it a much better format for transporting data over a distributed network (at least in uncompressed form).
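If you want to repeat this kind of comparison on your own data, here is a minimal Python sketch. It is not the measurement behind the numbers above: the `complaints` wrapper element and key, and the helper names, are assumptions made for illustration, so the counts it prints will differ from 171 and 86.

```python
import json
import xml.etree.ElementTree as ET

# Sample data taken from the complaints shown above.
complaints = ["it's too verbose", "it's too complex"]


def to_xml(items):
    """Serialize the list as XML (the <complaints> wrapper is an assumption)."""
    root = ET.Element("complaints")
    for text in items:
        ET.SubElement(root, "complaint").text = text
    return ET.tostring(root, encoding="unicode")


def to_json(items):
    """Serialize the same list as compact JSON (the key name is an assumption)."""
    return json.dumps({"complaints": items}, separators=(",", ":"))


def non_whitespace_length(text):
    """Count characters, ignoring all whitespace."""
    return sum(1 for ch in text if not ch.isspace())


xml_doc = to_xml(complaints)
json_doc = to_json(complaints)
xml_size = non_whitespace_length(xml_doc)
json_size = non_whitespace_length(json_doc)

print(f"XML:  {xml_size} characters")
print(f"JSON: {json_size} characters")
print(f"JSON is {1 - json_size / xml_size:.0%} smaller")
```

The gap comes mostly from XML's closing tags, which repeat every element name, so the ratio shifts with how long your tag names are relative to your values.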
XML is also more complex because it allows both attributes and elements, whereas JSON has just one way to attach a value to a name: object members. Some say that JSON parsers are available in more languages than XML parsers. And of course JSON can be natively processed in the browser, which makes for a much better “X” in Ajax (Doug Crockford’s quote, not mine).
XML is Better
On the other hand, XML has a bunch of useful supporting technologies around it:
- validation against a predefined schema using XML Schema, Schematron, or DTDs
- traversal using XPath
- transformations with XSLT
- searching with XQuery
- referencing other XML with XLink or XInclude
Now, there is no doubt that working with some of them can be painful (I’m looking at you, XSLT). The tooling isn’t great, debugging is awkward, testability is questionable, etc.
Moreover, similar versions of some of these also exist for JSON. For instance, there are JSONPath and JSON Schema. That said, I’m not sure how widely utilized they are.
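To make the XPath item in the list above a bit more concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The menu document, its `category` attribute, and the prices are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu document; it mixes attributes (category) and elements (name, price).
MENU_XML = """
<menu>
  <item category="entree"><name>Beef Stroganoff</name><price>12.50</price></item>
  <item category="drink"><name>Raspberry Iced Tea</name><price>2.75</price></item>
</menu>
"""

root = ET.fromstring(MENU_XML)

# Select the names of all items whose category attribute equals "drink".
drink_names = [el.text for el in root.findall("./item[@category='drink']/name")]

# Select every price element anywhere below the root and convert to numbers.
prices = [float(el.text) for el in root.findall(".//price")]

print(drink_names)
print(prices)
```

A full XPath or XQuery engine goes well beyond this subset (functions, axes, joins across documents), and that fuller feature set is what XML databases typically expose.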
Ok, let’s finally get back to the main point of this post: XML databases. Here’s a small sample of the capabilities you typically get with them:
- XML CRUDS (create, retrieve, update, delete, and search via XQuery)
- Document validation (using XML Schema)
- Document references (via XInclude or XLink)
- Library services (versioning, diffing, branching)
- Storing non-XML but meta-tagged content (like images)
Of these, only CRUDS operations are well represented in JSON-based stores. MongoDB, for example, has pretty advanced querying capabilities supported by database indexes.
Other capabilities are much more common in (if not unique to) XML databases. Consider, for example, document references. In XML, you can reference one document from another using XInclude:
<xi:include href="menuItems/BeefStroganof.xml" />
<xi:include href="menuItems/RasperryIceTea.xml" />
XML databases that support XInclude (like MarkLogic) will automatically resolve the references and return a complete document to you, using basically a single line of code.
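To get a feel for what that resolution step does, here is a rough sketch using Python's standard `xml.etree.ElementInclude` module. This is not MarkLogic's API. The in-memory `DOCUMENTS` store, the `load_from_store` loader, the `fetch_resolved` helper, and the contents of the menu item documents are all hypothetical stand-ins for a real database.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "database", keyed by document path.
DOCUMENTS = {
    "menu.xml": (
        '<menu xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="menuItems/BeefStroganof.xml" />'
        '<xi:include href="menuItems/RasperryIceTea.xml" />'
        "</menu>"
    ),
    "menuItems/BeefStroganof.xml": "<item><name>Beef Stroganoff</name></item>",
    "menuItems/RasperryIceTea.xml": "<item><name>Raspberry Iced Tea</name></item>",
}


def load_from_store(href, parse, encoding=None):
    """Loader callback: fetch a referenced document from the store instead of disk."""
    text = DOCUMENTS[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text


def fetch_resolved(path):
    """Return the document at `path` with all xi:include references expanded."""
    root = ET.fromstring(DOCUMENTS[path])
    ElementInclude.include(root, loader=load_from_store)
    return root


menu = fetch_resolved("menu.xml")
names = [el.text for el in menu.iter("name")]
print(names)
```

In a database with built-in XInclude support, the store lookup and the expansion happen on the server, so the caller only sees the equivalent of the `fetch_resolved` call.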
In short, XML databases do have unique and useful capabilities which can save you a lot of effort, if you need them. Hence, don’t dismiss XML databases offhand just because XML isn’t cool.