Introduction
The following article came about after reading Evolutionary
Database Design by Martin Fowler and Pramod Sadalage. The
concepts were great, but I couldn't find anything on the web that really showed
how to go about implementing it. I really liked the idea of testing the
migration of the database, so I decided to have a go at it myself from the
ground up. Following is how I've managed to do it for the current project
I'm working on.
The Environment
Developers box
- SQL Server Instances
- osql.exe (or isql.exe)
- Nant 0.85
- Some form of source control (I'm using Perforce)
Build server
- SQL Server Instances
- DEV
- DEPLOY (I can't afford a separate server for staging at the
moment)
- osql.exe (or isql.exe)
- Nant 0.85
- CruiseControl.NET
- Some form of source control (I'm using Perforce)
Production server (ProdServer)
Folder/Project structure
- CreateScripts
- Databases
- Tables
- Views
- Constraints
- Triggers
- StoredProcedures
- Functions
- MigrationScripts
Requirements
Our current project requires regular releases, and these often involve
database changes. We are using Perforce as our source control
system. CruiseControl.NET runs on our build server and is configured to
fire of a NAnt build script on every code checkin. The build requirements
at checkin are...
- Create an empty database for unit testing. Run all unit tests on it.
- Test the deployment of a new production or upgrade of an existing production
database.
Implementation
The setup
As indicated in the folders section of the environment, it's important that
the folders for the project are set up in the above way. There are rules
however that need to be enforced within this structure.
The CreateScripts\Databases and CreateScripts\Tables folders should be
read only for developers. This is because creating a database, or creating
a table requires that these database objects don't exist in the first
place. If we have an existing database, and we CREATE
DATABASE on it, or an existing table and we CREATE TABLE on
it, the script will fail (or worse if we include IF EXISTS... DROP
TABLE). So instead, these scripts should be located in the
migration scripts. Any changes to existing tables (ALTER
TABLE) or database (ALTER DATABASE) also belong in the
MigrationScripts folders.
The remaining folders can be used quite happily to store the objects.
Destroying the objects and recreating them will not affect the underlying
data.
One last area to note is that the CreateScripts\Constraints folder
should contain only referential integrity constraints. These constraints
should NOT be included in the CREATE TABLE script that is placed in
the MigrationScripts folder.
The driver script
We start off first by building the overall application.build script use by
CruiseControl.NET to kick off the build.
="1.0" ="UTF-8"
<project name="Application" basedir="." default="checkinBuild">
-->
<target name="checkinBuild">
-->
<property name="database.server" value="(local)"/>
<property name="database.instance" value="DEV"/>
<nant buildfile="Database.build" inheritall="true"
target="build" failonerror="true"/>
-->
<property name="database.server" value="(local)"/>
<property name="database.instance" value="DEPLOY"/>
<nant buildfile="Database.build" inheritall="true"
target="restoreDatabase"/>
<nant buildfile="Database.build" inheritall="true"
target="deploy" failonerror="true"/>
</target>
</project>
The default target is checkinBuild, indicating this NAnt target will be
run on every code checkin. First, we call the database.build file
build target. The build target is determined by setting the
database.server and database.instance properties prior
to calling the target with inheritall=true. The second target
- deploy - works the same way, except it executes either a new build, or
upgrades an existing build. Note also that we restore the database
first. This is because we don't want to run the scripts on the current
database as it stands (it may have already had migration scripts already
run). Instead, we want to restore the latest available production database
backup. This way, we are testing on a real system.
The new database script
The new database script is created in the database.build NAnt script.
The sole purpose of this target is to completely destroy the existing database,
and overwrite it with a brand new one, created from scratch. The result
meets requirement 1. A blank database ready for unit testing.
<target name="build" description="Create a brand new database">
<call target="createDatabase"/>
<call target="createTables"/>
<call target="createViews"/>
<call target="migrate"/>
<call target="createConstraints"/>
<call target="createProcedures"/>
<call target="createFunctions"/>
</target>
Key points to note about the above script are that we call the migrate target
(described later) after the createDatabase and createTables targets. The
reason we do this, is because after creating the database and tables, the
migration scripts may include alterations to tables, new tables, or removal of
colums from existing tables. These all need to be included to complete the
database structure.
Delving deeper into each of the createXxxx tasks, they look like this.
<target name="createTables">
<foreach item="File" in "CreateScripts\Tables"
property="script" failonerror="false">
<call target="osql" failonerror="true"/>
</foreach>
</target>
<target name="createProcedures">
<foreach item="File" in "CreateScripts\Tables"
property="script" failonerror="false">
<call target="osql" failonerror="true"/>
</foreach>
</target>
Each of these loops through all the scripts in the appropriate folders, and
runs them using osql on the database. (There is an osql task explained at
the bottom for this)
* Astute readers may have noted that we are running scripts on the
Databases and Tables folder, yet I mentioned that these should be read only for
developers. In short, these folders will be automatically populated at the
end of an iteration. I will explain further below.
The result, after running the build target, and calling all the create
scripts is that we have a blank newly created database, from scripts entered by
developers.
The deploy database script
The second requirement was that we be able to test the deployment of a
database to an existing server. There are two requirements here.
First, if the database doesn't already exist, then we need to create a brand new
one. If it DOES exist however, we just need to upgrade. We use osql
to connect to the database by the database.name property. We
assume that if osql fails, the database does not exist, therefore we need to
create it from fresh. If it succeeds however, we can safely assume there
is already a database by that name there, and we need to upgrade it. The
selection script looks like this...
<target name="deploy">
-->
<exec program="osql" commandline="-b -E -s
${database.connection} -d ${database.name} -Q "GO"
resultproperty="osql.result" failonerror="false"/>
<if test="${osql.result == '0'}">
<call target="upgrade"/>
</if>
<if test="${osql.result == '1'}">
<call target="build"/>
</if>
</target>
The upgrade script, is similar to the new script, but excludes creating the
database and creating the table scripts. (Otherwise we may potentially
drop and recreate them, losing data)
<target name="upgrade"
description="Upgrades an existing eventlive database">
<call target="backupDatabase" failonerror="true"/>
<property name="nant.onfailure" value="restoreDatabase"/>
<call target="migrate" failonerror="true"/>
<call target="createConstraints" failonerror="true"/>
<call target="createProcedures" failonerror="true"/>
<call target="createFunctions" failonerror="true"/>
</target>
The curly bits
This is where it gets a bit curly. When migrating data or upgrading
database as a series of tasks, the order that the tasks are executed is vitally
important. For example, the first task may add - say - a category table
and a column to the Suppliers table. It also adds the referential
integrity constraint between suppliers and categories. Order is important
here. First - we can't create the constraint until both table and column
are added. This is why the migration is run before
the constraints. If a later data migration task then wishes to add new
category to the table, we need to ensure that the create table script has been
run first. In this case, the order within the
MigrationScripts folder is important.
There are a number of ways to do this.
- Manually
- DateCreated
- DataModified
I use date created by performing a dir /b /od /tc *.sql >
order.txt, and then iterating through the resulting text file and running
the scripts in that order. Alternatively, you could remove the dir that I
execute and generate order.txt anyway you like. The migrate target, whilst
similar to the createXxx targets, doesn't iterate the directory though - rather,
the order.txt file.
<target name="migrate"
description="Run the migration scripts on the database">
<exec program="${solution.dir}\tools\listFiles.bat"
commandline="*.sql order.txt"
workingdir="Migration Scripts" failonerror="false"/>
<foreach item="Line" in="Migration Scripts\order.txt"
property="script">
<property name="script" value="Migration Scripts\${script}"/>
<call target="osql" failonerror="true"/>
</foreach>
</target>
Summary so far...
The above scripts are called for every checkin within the iteration.
Every code change that affects the data model, every data migration, data
insertion or cleanse always gets put into the MigrationScripts directory.
Each time it is checked in, the NAnt build rebuilds the database from
scratch. It also takes the latest backup of the production database, and
restores it to the DEPLOY instance - before finally running the deploy target on
this database to test the deployment.
End of Iteration
So far, everything has been going along nicely, checkins have been rebuilding
that database, and testing the migration scripts executing on the staging
database. Everything works as it should, and everybody is happy. All
we need to do now is finish off the iteration. First, we deploy to the
production server. We know this works because we tested it by restoring
from production and test running the scripts on the last build script.
Deployment is simple...
Nant.exe -buildfile:ApplicationDir\Database.build deploy
-D:database.server=ProdServer -D:database.instance=LIVE
The last thing we need to do is cleanup after the iteration. This
involves archiving the migration scripts - we don't want to be rerunning
them again. The problem is though - if we archive the migration scripts
and they contain create table scripts, we need to be able to run them again to
create the new database for unit testing in the next iteration. So what we
do instead, is use upgraded database (on DEPLOY) to automatically generate the
create database and create table scripts for us. These scripts are then
placed in the CreateScripts\Databases and CreateScripts\Tables folders -
by the build server not by a developer. Once these
scripts are generated, we can move the MigrationScript items into an archive
folder. The nant targets look like this.
<!---->
<!---->
<!---->
<target name="scriptDatabase">
<exec program="cscript.exe" verbose="true">
<arg line="${solution.dir}/Tools/scriptDatabase.wsf //Nologo //B"/>
<arg value="//job:database"/>
<arg value="${database.connection}"/>
<arg value="${database.name}"/>
<arg value="Create Scripts/Databases"/>
</exec>
<exec program="cscript.exe" verbose="true">
<arg line="${solution.dir}/Tools/scriptDatabase.wsf //Nologo //B"/>
<arg value="//job:tables"/>
<arg value="${database.connection}"/>
<arg value="${database.name}"/>
<arg value="Create Scripts/Tables"/>
</exec>
</target>
<target name="archiveScripts" description="Archive the database scripts"
depends="scriptDatabase">
<!---->
<tstamp property="archive.dir" pattern="yyyyMMdd"/>
<move todir="Archive\${archive.dir}">
<fileset basedir="Migration Scripts">
<include name="*"/>
</fileset>
</move>
</target>
Note - we are calling out to a script host program that generates the table
and database scripts via SQLDMO - briefly...
var objArgs, serverName, databaseName
objArgs = WScript.Arguments
if(objArgs.Length != 3)
{
WScript.Arguments.ShowUsage();
WScript.Quit(1);
}
serverName = objArgs(0);
databaseName = objArgs(1);
outputPath = objArgs(2);
var fso = new ActiveXObject("Scripting.FileSystemObject");
if(!fso.FolderExists(outputPath))
{
WScript.Echo("Folder doesn't exist");
WScript.Quit(1);
}
var sqlServer, sqlDB, tableCollection, table, e;
server = new ActiveXObject("SQLDMO.SQLServer");
server.LoginSecure = true;
server.Connect(serverName, null, null);
database = new ActiveXObject("SQLDMO.Database");
database = server.Databases(databaseName);
var SQLDMOScript_Drops = 1;
var SQLDMOScript_IncludeIfNotExists = 4096;
var SQLDMOScript_IncludeHeaders = 131072;
var SQLDMOScript_Default = 4;
var SQLDMOScript2_NoCollation = 8388608;
var SQLDMOScript_DRI_PrimaryKey = 268435456;
var SQLDMOScript_DRI_Checks = 16777216;
var SQLDMOScript_DRI_Defaults = 33554432;
var SQLDMOScript_DRI_UniqueKeys = 67108864;
var SQLDMOScript_ToFileOnly = 64; lang=jscript>
tableCollection = new Enumerator(database.Tables);
for( ; !tableCollection.atEnd() ; tableCollection.moveNext())
{
table = tableCollection.item();
if(!table.SystemObject)
{
var params = SQLDMOScript_Drops |
SQLDMOScript_IncludeIfNotExists |
SQLDMOScript_IncludeHeaders |
SQLDMOScript_Default |
SQLDMOScript_DRI_PrimaryKey |
SQLDMOScript_DRI_Checks |
SQLDMOScript_DRI_Defaults |
SQLDMOScript_DRI_UniqueKeys |
SQLDMOScript_ToFileOnly;
table.Script(params, fso.BuildPath(outputPath,
table.Name + ".sql"),
null, SQLDMOScript2_NoCollation);
WScript.Echo("Processing " + table.Name);
}
}
server.DisConnect();
We have a complete set of CreateScripts in their respective folders, the
archive target is called which creates a new directory with the date set.
The MigrationScript files are then moved to this directory. (And later
checked in to source control) The iteration is closed.
Conclusion
It's not that difficult to get it running. The advantages of spending
the effort are tremendous though. In previous companies I've worked at,
the running of scripts has been done by the DBA, and based on a weeks work
getting everyone together to rollback failed ones, update the ones that depended
on the failed ones, and generally just make sure it works. It's prone to
transcription errors, permission errors, script errors and
inconsistencies. By testing the deployment each time, as well as unit
testing, we can reduce this. One last thing worthy of note though is that
just because it's easier to change, doesn't mean that developers can hack
tables. Thought still needs to be put into the design of the database.
Notes:
- I've ignored the backupDatabase and restoreDatabase targets - these
are left as the dreaded exercise for the reader.
- I've used .wsf files for scripting the tables and database creation because
it's fast enough and it works for me. If speed is an issue, vb, C#, C++,
java or whatever you want can be used instead.
- I'm assuming that the production database backups are able to be transferred
to the build server. In some environments, production data is sensitive or
bound by privacy laws. In these cases, a test database could be used
instead. It's important though that it be of the same structure as the end
of the previous iteration.
- Any database refactoring scripts or data cleanse / migration scripts that
are stored in the MigrationScripts folder should have SQL verifying the values
at the start and finish of the script to ensure that it migrated
correctly. Failures should throw errors forcing a build failure.
RAISEERROR could be appropriate.