Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Generate SQL Database Schema from XML - Part 1 (File Format and XSD)

4.95/5 (15 votes)
28 Apr 2010CPOL14 min read 2   2.8K  
Part 1 of 3 in a series of articles about generating SQL from an XML file format. This part describes the background to the project and talks about the development of the file format and an XSD schema.

Introduction

This is the first in a series of three articles describing a project that I undertook to solve a specific problem in my working environment. The project has two core outcomes: a new XML file format for describing the schema of a database, and an executable for transforming the new format into SQL. Along the way, I'll share thoughts from the design process and introduce some supporting code libraries which may have applications outside this project. I have assumed a basic understanding of several technologies, but will also provide links and definitions, so hopefully nobody feels excluded.

In this first article, I will describe the background to the project and talk about the development of the file format and an XSD schema. In the second article, I will focus on creating a code model that maps onto the file format. The third and final article will cover a simple approach to code generation and tie up all the loose ends.

Background

I use Microsoft SQL Server (usually just known as SQL Server) regularly in a team environment. When I talk about SQL in these articles, I am really referring to Microsoft's version of the language called T-SQL, though other versions of the language could easily be addressed in the future. SQL has a long and distinguished history, and is used by millions of people in all sorts of projects including mine, but this project exists because it has shortcomings that I find frustrating.

SQL is commonly used in two distinct ways: as a DDL and for CRUD queries. When a new database is created, it is not only empty of data, but also empty of structure. Before data can be added (e.g., the details of a person), a table must be added to contain the data. This is done in SQL with statements such as these:

SQL
CREATE TABLE [dbo].[EXAMPLE_TABLE](
  [PK] [int] IDENTITY(1,1) NOT NULL,
  [EXAMPLE_COLUMN1] [varchar](50) NOT NULL,
)
Definition of a table called 'EXAMPLE_TABLE' with two columns: 'PK' and 'EXAMPLE_COLUMN1'

Similar statements are used for describing default values for columns, indexes, relationships, constraints, and much more besides. All of these definitions collectively are called the database schema. Once a schema is defined, the database can be populated with data, the data can be retrieved or modified, and so forth. For querying purposes, I don't have a problem with SQL, although it is increasingly common to avoid using raw SQL for queries, and instead to generate it 'behind the scenes', for instance using LINQ to SQL.

No, it is in the definition of the schema that I dislike SQL. To begin with, I generally feel that data is best described in a declarative way. The schema of a database is unarguably data; indeed it is stored in 'system tables' in the database. But SQL is relatively hard to parse. To treat it like data, you generally have to load it into an actual SQL Server instance, then use a programming interface (e.g., DbConnection.GetSchema) to query it. This seems wrong to me, and I decided to try and fix it.

Those who regularly use databases will be protesting at this point that there is more to DDL than just creating the schema. Once a database is full of structure and data, it is generally not possible to throw it all away and start from scratch! Instead it becomes necessary to manage change. SQL is a full procedural language, so it can be used to make changes based on certain conditions as well as to massage data, temporarily relax constraints, and so forth. However, the context of this project is that where I work, we have a toolchain for updating live databases from the snapshot of an empty DB, created using simple DDL. I do believe that changes in the database schema can be better represented in a declarative way, and it is a longer term goal for this project. For the time being though, the scope of this project only goes as far as creating a new empty database from scratch.

"So, why would you want to treat a database schema as data?" I hear you ask. One answer is for code generation. There are several existing technologies for mapping the schema of a database into auto-generated classes, a technique known as Object Relational Mapping (ORM). In a nutshell, tables and columns are mapped onto classes and properties, and the data is serialized automatically to/from the database into class instances. Two of the leading players in this field (Microsoft Entity Framework and LLBLGen Pro) take the approach of updating their generated code from a database instance.

ORMFlow.PNG

A popular process of generating ORM code

It is a perfectly reasonable approach, but it does assume that the database already has the correct schema, and this is not always the case in a team environment. When working in a team, it is desirable that everybody can make changes to the database schema simultaneously, then merge their changes later. It is also desirable to have a history of changes and who made them. In other words, I'm talking about source control. Now it is perfectly possible to have a database schema defined in SQL under source control. However, whenever one developer gets another's changes, they must remember to update their own local database instance before they update the generated code using the ORM tool.

ORMFlow2.PNG

Actually, it's a four step process when working with source control

This manual update of the database instance is only required because the ORM tool depends on it. Otherwise, it might be possible to update the database schema automatically from within the application code. As it happens, there is a project called NHibernate that is not tied to a database instance in this way. In NHibernate, the code classes are the starting point, and the database schema is generated via an XML file that describes how the classes map onto it. Furthermore, it is possible to generate code from the XML mapping file. This is better: all developers can edit one XML file, and they get a database schema and ORM code generated auto-magically. The only problem with NHibernate for me is that the mapping file is code-centric. That is, it describes how to map NHibernate compatible code classes onto a database rather than describing the database itself. While the database schema is inherent in the NHibernate mapping file, it is not explicit. This is fine for many applications, but I'm not using NHibernate as my code ORM solution, so it doesn't work for me.

The second big problem I have in defining my database schema with SQL is that I don't like SQL as a language. I find it hard to remember, with a strange and awkward syntax. There, I've said it. Changing the database schema is something I do infrequently enough that when I come back to it, I often end up reaching for the reference. On the other hand, dementia notwithstanding, I am never going to forget how to edit XML. Sure, XML can be a little bit ugly, but at least, it is regular, and parsing it is a breeze. So, fully aware that it was my own time I was wasting, I decided to develop an XML-based file format that described a database schema, and develop some solution for translating the new format into SQL.

Thinking further ahead, I realised I'd eventually wanted to be able to edit the file format in a visual way (along the lines of SQL Server Management Studio). But a visual editor would be a substantial project in itself, so I decided to call that 'Phase 2' and simply bear it in mind during 'Phase 1'. In the meantime, my new file format would have to be easily hand-editable.

In the remainder of this article, I will describe the new file format and some of the design decisions that went into it. At times, this will get quite detailed, so it might be worth getting a brief introduction into XSD if you aren't at all familiar with it.

Requirements

Here is the short list of requirements I came up with for the file format and the process of translating it:

  • Describe the schema of a database
  • Version controllable with useful history
  • Easy to edit by hand
  • Memorable syntax

Schema: To keep the project manageable, I decided not to be a completist about the database schema. Instead, my goal was to fully describe a single, real-world database from a client. This meant:

  • Tables
  • Columns
  • Types
  • Defaults
  • Identity Specifications
  • Primary Keys
  • Unique Constraints
  • Indexes
  • Relationships
  • Stored Procedures

Version control: Obviously, any file can be version controlled. I wanted a file format that could be edited by multiple people and merged. I also wanted to be able to see changes clearly in the version history of the file.

Editing: As far as possible, I wanted a file format that was structured the same way I think about the database schema, designed with hand editing in mind.

Syntax: If I don't use SQL for a while, I tend to forget the details and end up reaching for the reference. So I wanted a file format that would guide me by being consistent and self documenting.

Aside from the obvious requirement of being able to translate the file format to SQL, my main concern was convenience. The process should be simple to run from a shortcut, a batch file, or a build step. It should also be convenient to deploy to a new machine.

File Format Implementation

As the title of this article suggests, I chose XML for my file format. It satisfies the requirements, and it has more general advantages that are beyond the scope of this article. In particular, with a valid XSD attached, Visual Studio will provide intellisense when editing the file. For the uninitiated, XML files have a schema just like databases, although as XML files are hierarchical in nature, the schema takes on a very different form. Although one is not required to make a valid XML file, a schema is useful because it constrains what is allowed to be present in the file and identifies the format to any person or code that looks at it. XSD, which is itself defined in XML, is the most 'standard' way of defining a schema for an XML file. Visual Studio (and other editors) will parse the XSD and use it to provide intellisense when editing the XML file.

I have included the entire XSD file here for reference, but it is rather long, so you may want to collapse it until you are ready to look at it:

XML
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="DatabaseSchemaModel"
    targetNamespace="http://olduwan.com/DatabaseSchemaModel.xsd"
    elementFormDefault="qualified"
    xmlns="http://olduwan.com/DatabaseSchemaModel.xsd"
    xmlns:dbsm="http://olduwan.com/DatabaseSchemaModel.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
  <!--Columns-->
  
  <xs:complexType name="parameterlessType"/>

  <xs:complexType name="identity">
    <xs:attribute name="seed" 
             use="optional" type="xs:integer"/>
    <xs:attribute name="increment" 
             use="optional" type="xs:integer"/>
    <xs:attribute name="notForReplication" 
             use="optional" type="xs:boolean"/>
  </xs:complexType>

  <xs:group name="withIdentity">
    <xs:sequence>
      <xs:element name="identity" minOccurs="0" 
                maxOccurs="1" type="identity"/>
    </xs:sequence>
  </xs:group>

  <xs:attributeGroup name="withDefaultExpression">
    <xs:attribute name="defaultExpression" 
               use="optional" type="xs:string"/>
  </xs:attributeGroup>
  
  <xs:complexType name="int">
    <xs:group ref="withIdentity"/>
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
          use="optional" type="xs:int"/>
  </xs:complexType>

  <xs:complexType name="bigint">
    <xs:group ref="withIdentity"/>
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
           use="optional" type="xs:long"/>
  </xs:complexType>

  <xs:complexType name="smallint">
    <xs:group ref="withIdentity"/>
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
           use="optional" type="xs:short"/>
  </xs:complexType>

  <xs:complexType name="tinyint">
    <xs:group ref="withIdentity"/>
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
           use="optional" type="xs:byte"/>
  </xs:complexType>

  <xs:complexType name="decimal">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="precision" 
           use="required" type="xs:integer"/>
    <xs:attribute name="scale" 
           use="required" type="xs:integer"/>
    <xs:attribute name="default" 
           use="optional" type="xs:decimal"/>
  </xs:complexType>

  <xs:complexType name="decimalScale0">
    <xs:group ref="withIdentity"/>
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="precision" 
           use="required" type="xs:integer"/>
    <xs:attribute name="default" 
           use="optional" type="xs:decimal"/>
  </xs:complexType>

  <xs:complexType name="bit">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
           use="optional" type="xs:boolean"/>
  </xs:complexType>

  <xs:simpleType name="moneydefault">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="10"/>
      <xs:fractionDigits value="4"/>
      <xs:minInclusive value="-922337203685477.5808"/>
      <xs:maxInclusive value="922337203685477.5807"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="money">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
           use="optional" type="moneydefault"/>
  </xs:complexType>
  
  <xs:simpleType name="smallmoneyDefault">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="10"/>
      <xs:fractionDigits value="4"/>
      <xs:minInclusive value="-214748.3648"/>
      <xs:maxInclusive value="214748.3647"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:complexType name="smallmoney">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
           use="optional" type="smallmoneyDefault"/>
  </xs:complexType>

  <xs:simpleType name="mantissaBits">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="53"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="float">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="mantissaBits" 
            use="optional" type="mantissaBits"/>
    <xs:attribute name="default" 
            use="optional" type="xs:decimal"/>
  </xs:complexType>

  <xs:complexType name="real">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
           use="optional" type="xs:decimal"/>
  </xs:complexType>

  <xs:attributeGroup name="withStringDefault">
    <xs:attribute name="default" 
           use="optional" type="xs:string"/>
  </xs:attributeGroup>

  <xs:simpleType name="fractionalSecondsPrecision">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="7"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="variablePrecisionTime">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attributeGroup ref="withStringDefault"/>
    <xs:attribute name="fractionalSecondsPrecision" 
         use="optional" type="fractionalSecondsPrecision" />
  </xs:complexType>

  <xs:simpleType name="charInteger">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="8000"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="charLength">
    <xs:union memberTypes="charInteger withMax"/>
  </xs:simpleType>

  <xs:complexType name="char">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attributeGroup ref="withStringDefault"/>
    <xs:attribute name="length" 
          use="optional" type="charLength"/>
  </xs:complexType>

  <xs:complexType name="binary">
    <xs:attribute name="length" 
          use="optional" type="charLength"/>
  </xs:complexType>

  <xs:simpleType name="withMax">
    <xs:restriction base="xs:string">
      <xs:enumeration value="max"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="ncharInteger">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="4000"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="ncharLength">
    <xs:union memberTypes="ncharInteger withMax"/>
  </xs:simpleType>

  <xs:complexType name="nchar">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attributeGroup ref="withStringDefault"/>
    <xs:attribute name="length" 
          use="optional" type="ncharLength"/>
  </xs:complexType>

  <xs:simpleType name="GUID">
    <xs:annotation>
      <xs:documentation xml:lang="en">
        The representation of a GUID.
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <xs:pattern value="\{[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]
                            {4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}\}"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:complexType name="uniqueidentifier">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attribute name="default" 
            use="optional" type="GUID"/>
  </xs:complexType>

  <xs:complexType name="parameterlessStringType">
    <xs:attributeGroup ref="withDefaultExpression"/>
    <xs:attributeGroup ref="withStringDefault"/>
  </xs:complexType>
  
  <xs:complexType name="column">
    <xs:choice minOccurs="1" maxOccurs="1">
      <!--Exact numerics-->
      <xs:element name="bigint" type="bigint"/>
      <xs:element name="numeric" type="decimal"/>
      <xs:element name="numericScale0" type="decimalScale0"/>
      <xs:element name="bit" type="bit"/>
      <xs:element name="smallint" type="smallint"/>
      <xs:element name="decimal" type="decimal"/>
      <xs:element name="decimalScale0" type="decimalScale0"/>
      <xs:element name="smallmoney" type="smallmoney"/>
      <xs:element name="int" type="int"/>
      <xs:element name="tinyint" type="tinyint"/>
      <xs:element name="money" type="money"/>

      <!--Approximate numerics-->
      <xs:element name="float" type="float"/>
      <xs:element name="real" type="real"/>

      <!--Date and time-->
      <xs:element name="date" type="parameterlessStringType"/>
      <xs:element name="datetimeoffset" type="variablePrecisionTime"/>
      <xs:element name="datetime2" type="variablePrecisionTime"/>
      <xs:element name="smalldatetime" type="parameterlessStringType"/>
      <xs:element name="datetime" type="parameterlessStringType"/>
      <xs:element name="time" type="variablePrecisionTime"/>

      <!--Character strings-->
      <xs:element name="char" type="char"/>
      <xs:element name="varchar" type="char"/>
      <xs:element name="text" type="parameterlessStringType"/>

      <!--Unicode character strings-->
      <xs:element name="nchar" type="nchar"/>
      <xs:element name="nvarchar" type="nchar"/>
      <xs:element name="ntext" type="parameterlessStringType"/>

      <!--Binary strings-->
      <xs:element name="binary" type="binary"/>
      <xs:element name="varbinary" type="binary"/>
      <xs:element name="image" type="parameterlessType"/>

      <!--Other data types-->
      <xs:element name="rowversion" type="parameterlessType"/>
      <xs:element name="hierarchyid" type="parameterlessStringType"/>
      <xs:element name="uniqueidentifier" type="uniqueidentifier"/>
      <xs:element name="sql_variant" type="parameterlessStringType"/>
      <xs:element name="xml" type="parameterlessStringType"/>

    </xs:choice>
    <xs:attribute name="name" 
          use="required" type="xs:string"/>
    <xs:attribute name="allowNulls" 
           use="optional" type="xs:boolean"/>
  </xs:complexType>

  <xs:complexType name="columnList">
    <xs:sequence>
      <xs:element name="column" maxOccurs="unbounded" 
                     minOccurs="1" type="column"/>
    </xs:sequence>
  </xs:complexType>

  <!--Constraints-->
  
  <xs:simpleType name="sortOrder">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Ascending"/>
      <xs:enumeration value="Descending"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="constraintColumn">
    <xs:attribute name="name" use="required" type="xs:string"/>
    <xs:attribute name="sortOrder" 
            use="optional" type="sortOrder"/>
  </xs:complexType>

  <xs:complexType name="constraint">
    <xs:sequence>
      <xs:element name="column" maxOccurs="unbounded" 
            minOccurs="1" type="constraintColumn"/>
    </xs:sequence>
    <xs:attribute name="name" use="required" type="xs:string"/>
    <xs:attribute name="clustered" use="required" type="xs:boolean"/>
    <xs:attribute name="fillFactor" use="optional" type="xs:integer"/>
    <xs:attribute name="padIndex" use="optional" type="xs:boolean"/>
  </xs:complexType>

  <xs:complexType name="primaryKey">
    <xs:sequence>
      <xs:element name="key" maxOccurs="1" 
             minOccurs="0" type="constraint"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="uniqueConstraints">
    <xs:sequence>
      <xs:element name="constraint" 
            maxOccurs="unbounded" minOccurs="0" 
            type="constraint"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="index">
    <xs:complexContent>
      <xs:extension base="constraint">
        <xs:attribute name="unique" 
              use="optional" type="xs:boolean"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="indexes">
    <xs:sequence>
      <xs:element name="index" maxOccurs="unbounded" 
              minOccurs="0" type="index"/>
    </xs:sequence>
  </xs:complexType>

  <!--Relationships-->

  <xs:complexType name="relationshipColumn">
    <xs:attribute name="name" 
              use="required" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="foreignKeyColumns">
    <xs:sequence>
      <xs:element name="column" maxOccurs="unbounded" 
             minOccurs="1" type="relationshipColumn"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="primaryKeyTable">
    <xs:sequence>
      <xs:element name="column" maxOccurs="unbounded" 
            minOccurs="1" type="relationshipColumn"/>
    </xs:sequence>
    <xs:attribute name="name" 
            use="required" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="relationship">
    <xs:sequence>
      <xs:element name="foreignKeyColumns" 
          maxOccurs="1" minOccurs="1" 
          type="foreignKeyColumns"/>
      <xs:element name="primaryKeyTable" maxOccurs="1" 
             minOccurs="1" type="primaryKeyTable"/>
    </xs:sequence>
    <xs:attribute name="name" 
                use="required" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="relationships">
    <xs:sequence>
      <xs:element name="relationship" maxOccurs="unbounded" 
                 minOccurs="0" type="relationship" />
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="table">
    <xs:sequence>
      <xs:element name="columns" maxOccurs="1" 
           minOccurs="1" type="columnList"/>
      <xs:element name="primaryKey" maxOccurs="1" 
           minOccurs="1" type="primaryKey"/>
      <xs:element name="uniqueConstraints" maxOccurs="1" 
           minOccurs="1" type="uniqueConstraints"/>
      <xs:element name="indexes" maxOccurs="1" 
           minOccurs="1" type="indexes"/>
      <xs:element name="relationships" maxOccurs="1" 
           minOccurs="1" type="relationships"/>
    </xs:sequence>
    <xs:attribute name="name" 
          type="xs:string" use="required"/>
  </xs:complexType>

  <xs:complexType name="tables">
    <xs:sequence>
      <xs:element name="table" maxOccurs="unbounded" 
            minOccurs="0" type="table"/>
    </xs:sequence>
  </xs:complexType>

  <!--Stored procedures-->
  <xs:simpleType name="platform">
    <xs:restriction base="xs:string">
      <xs:enumeration value="SQLServer"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="procedure">
    <xs:attribute name="name" 
           type="xs:string" use="required"/>
    <xs:attribute name="path" 
           type="xs:string" use="required"/>
    <xs:attribute name="platform" 
           type="platform" use="optional"/>
  </xs:complexType>

  <xs:complexType name="procedures">
    <xs:sequence>
      <xs:element name="procedure" maxOccurs="unbounded" 
             minOccurs="0" type="procedure"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="database">
    <xs:sequence>
      <xs:element name="tables" maxOccurs="1" 
            minOccurs="1" type="tables"/>
      <xs:element name="procedures" maxOccurs="1" 
            minOccurs="1" type="procedures"/>
    </xs:sequence>
    <xs:attribute name="name" 
           type="xs:string" use="required"/>
    <xs:attribute name="ExampleFolderPath" 
           type ="xs:string" use="optional"/>
    <xs:attribute name="SQLServerOutputPath" 
           type ="xs:string" use="optional"/>
  </xs:complexType>

  <xs:element name="database" type="database"/>
</xs:schema>

I won't detail every design decision in the XSD, but in general, I used SQL Server Management Studio as a guide for my hierarchy, basing it around the table with indexes and relationships hanging off. Here is a very simple database schema defined with the new format:

XML
<?xml version="1.0" encoding="utf-8" ?>

<database xmlns="http://olduwan.com/DatabaseSchemaModel.xsd" name="Simple">
  <tables>
    <table name="Customer">
      <columns>
        <column name="CustomerId">
          <int>
            <identity/>
          </int>
        </column>
        <column name="Surname">
          <nchar length="50"/>
        </column>
      </columns>
      <primaryKey>
        <key name="PK_Customer" clustered="true">
          <column name="CustomerId"/>
        </key>
      </primaryKey>
      <uniqueConstraints>        
      </uniqueConstraints>
      <indexes>
        <index name="IX_Customer_Surname" clustered="false">
          <column name="Surname" sortOrder="Ascending"/>
        </index>
      </indexes>
      <relationships>
      </relationships>
    </table>
    <table name="Order">
      <columns>
        <column name="OrderId">
          <int>
            <identity/>
          </int>
        </column>
        <column name="Details">
          <nvarchar length="max"/>
        </column>
        <column name="Complete">
          <bit default="false"/>
        </column>
        <column name="CustomerId">
          <int/>
        </column>
      </columns>
      <primaryKey>
        <key clustered="true" name="PK_Order">
          <column name="OrderId"/>
        </key>
      </primaryKey>
      <uniqueConstraints>
      </uniqueConstraints>
      <indexes>
      </indexes>
      <relationships>
        <relationship name="FK_Order_Customer">
          <foreignKeyColumns>
            <column name="CustomerId"/>
          </foreignKeyColumns>
          <primaryKeyTable name="Customer">
            <column name="CustomerId"/>
          </primaryKeyTable>
        </relationship>
      </relationships>
    </table>
  </tables>
  <procedures>
  </procedures>
</database> 

And here is a simplified diagram of the database it describes:

SimpleDatabase.PNG

Hopefully, the format is fairly self-explanatory, if you are familiar with the basic concepts of SQL Server. I chose to use XSD because I was already familiar with it, and I know it is well supported in Visual Studio. As long as Visual Studio knows where to find the XSD (e.g., the XSD file is present in a project), then intellisense will guide you when editing the file. With a little thought, you can probably define your own database using this format without needing any kind of reference. XSD does have two limitations that might be addressed in other XML schema formats (e.g., RelaxNG), but to have working intellisense in Visual Studio was a high priority, so I chose to accept them.

Firstly, it is not possible to define the valid attributes/child elements of a type based on the value of an attribute. Notice that in the definition of a column, the type is a nested element:

XML
<column name="CustomerId">
  <int>
    <identity/>
  </int>
</column>

Arguably, it would have been more intuitive to have the type as an attribute of the column:

XML
<column name="CustomerId" type="int">
  <identity/>
</column>

But to do so, the column type would have to support every parameter that is required to describe any type. For example, int can have an identity specification, but decimal can't. Making the column type so generic would seriously impact its usefulness with intellisense. To make this approach work, I'd need to be able to express this statement in my schema:

"The column type can have an identity child if the attribute type is one of the set [int, bigint, smallint, tinyint, decimalScale0]".

There may be some XML schema format that allows this kind of flexibility, but I don't know about it, and it probably isn't well supported in Visual Studio.

I hit a second limitation of XSD when I wanted to validate the table and column names in a relationship. In the example above, you can see a relationship between the order table and the customer table: FK_Order_Customer. In a column element, the name attribute must be the name of a column in the 'foreign key' table (i.e., the table that contains this relationships element). There is no way to specify this in XSD. The same thing applies to the name attribute of the primaryKeyTable, which must be the name of one of the tables. While this can be validated at the time of translation, it would be much nicer if Visual Studio would provide an auto-completed list of table/column names when editing the file, and put a red squiggle under an invalid entry.

Column Types

As previously mentioned, the type of a column is represented with a nested element, which is represented in the XSD like this:

XML
<xs:complexType name="column">
  <xs:choice minOccurs="1" maxOccurs="1">
    <!--Exact numerics-->
    <xs:element name="bigint" type="bigint"/>
    <xs:element name="numeric" type="decimal"/>
    <xs:element name="numericScale0" type="decimalScale0"/>
    <xs:element name="bit" type="bit"/>
    <xs:element name="smallint" type="smallint"/>
    <xs:element name="decimal" type="decimal"/>
    <xs:element name="decimalScale0" type="decimalScale0"/>
    Many more types...

This dictates that an element with the type column must contain exactly 1 element which can be any of the types listed. So each SQL Server type is matched to a complex type in the XSD. There is a complex type for each set of parameters that apply to one or more SQL Server types. Note that this is not a 1:1 correspondence, as many SQL Server types share the same set of parameters. However, I have tried to make the parameters specific and validated wherever possible. For example, here is the int type from the XSD:

XML
<xs:complexType name="int">
  ...
  <xs:attribute name="default" use="optional" type="xs:int"/>
</xs:complexType>

Notice that the default attribute (which will translate to a literal default value for the column in the generated SQL), is strongly typed to only permit int values. When editing the file, this will help pick up copy-and-paste errors quickly. Other default attributes for other types are strongly typed whenever possible. Sometimes this required the definition of further simple types; for example, the money type in SQL Server has limitations that are not expressed by any of the built-in XSD types:

XML
<xs:simpleType name="moneydefault">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="10"/>
    <xs:fractionDigits value="4"/>
    <xs:minInclusive value="-922337203685477.5808"/>
    <xs:maxInclusive value="922337203685477.5807"/>
  </xs:restriction>
</xs:simpleType>

<xs:complexType name="money">
  <xs:attributeGroup ref="withDefaultExpression"/>
  <xs:attribute name="default" use="optional" type="moneydefault"/>
</xs:complexType>

Here, moneyDefault is based on decimal. This means that all the restrictions of the decimal type apply, but more restrictions, such as the min and max valid values for the money type in SQL Server, apply as well.

Sometimes the default column value is an expression rather than a 'literal'. This is expressed with the defaultExpression attribute, which is just a string. When appropriate, the default will be quoted in the generated SQL; however, the defaultExpression is copied verbatim. It does not make sense to use both default and defaultExpression for a single column.

In the XSD for the money type above, you can see I have used an attributeGroup element called withDefaultExpression. This adds the defaultExpression attribute to the money type. An attribute group is useful because it allows you to modularise your design and re-use code. You can achieve a similar effect with inheritance, but XSD does not support multiple inheritance, so attribute groups are a good way of simulating it.

Conclusion

If you weren't familiar with XSD, then some of this might not have made too much sense. If so, I apologise, but the important thing is you don't actually need to know how the XSD works to benefit from it. If you create a Visual Studio project containing the XSD and a single XML file, then copy and paste the first example into the XML, then you should find that as you make changes to the example, Visual Studio will tell you if you've made a mistake. Alternatively, you can unzip the sample code and use the solution I made earlier. Obviously, this won't keep you happy for long because you can't really do anything with it yet. But in the next article, I will describe how I developed a code model in C# that maps to the new file format and enables you to inspect it programmatically in a simple and type-safe way.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)