Database Schema Versioning Management 101

Peter Weissbrod

Rate me:

4.43/5 (6 votes)

10 Apr 2007CPOL6 min read

67.1K

An introductory approach to synchronizing database schema versions

Introduction

Have you seen this situation before?

Your team is writing an enterprise application around a database
Since everyone is building around the same database, the schema of the database is in flux
Everyone has their own "local" copies of the database
Every time someone changes the schema, all of these copies need the latest schema to work with the latest build of the code
Every time you deploy to a staging or production database, the schema needs to work with the latest build of the code
Factors such as schema dependencies, data changes, configuration changes, and remote developers muddy the water

How do you currently address this problem of keeping the database versions in working order? Do you suspect this is taking more time than necessary? There are many ways to approach this problem, and the answer depends on the workflow in your environment. The following article describes a distilled and simplistic methodology you can use as a starting point.

Since it can be implemented with ANSI SQL, it is database agnostic
Since it depends on scripting, it requires negligible storage management, and it can fit in your current code version management program

Background

Any way you look at it, managing development is a hard problem. As your development team and application grow in size, as the requirements change, your overhead costs of management inevitably grows as well. There is a constant need of documentation, regression testing, and version management within this flux to maintain stability, all without losing "momentum".

As evolving applications change build versions, there are many "application hosts" that need to be synchronized. Often, this leads to a large amount of unnecessary bugs, and an excess percentage of time dedicated solely to the synchronization of versions of the "latest build" among various test/staging/and production platforms.

This article is one of many designed to help you effectively manage your changes with minimal time involved. Since code version management programs have been around before I was born, this article deals with database version management.

Your options

CVS, SourceSafe, ClearCase, SourceGear, SubVersion... The code versioning programs go on and on... What about databases? How do we seamlessly upgrade the schema/data of our production database with minimal or no downtime? How do multiple developers "get the latest" on the database schema and data?

You should be aware that some great database "diff" programs already exist. StarInix, RedGate, and SQL Delta come readily to mind. Some of the higher end ones come with a hefty price tag, because they generate synchronization scripts for you. Remember that time you trusted an all-purpose program to generate your HTML for you? Do you really want to trust an all-purpose program to automatically change your database schema for you?

Don't get me wrong, these programs are a great starting point for viewing and solving differences, but I highly recommend a more "hands-on" approach to a versioning management policy. Why?

When you write the scripts yourself, you not only have a higher degree of control, you have a higher degree of consciousness of what is changing in your database, when it changed, and who initiated the change.

A proposed solution

As each database presents its own unique challenge, I don't offer you a one-size-fits-all downloadable application. I offer a time-tested and provably effective methodology that's flexible enough to adapt to the workflow of your development cycle, provided you're ready to get your hands a little dirty with SQL scripting code!

What you will need:

An existing source control system

Anything, really. You can use the same repository as your code base, or simply a single text file on an accessible network share as a last resort.

Database permissions to run powerful scripts

(Nominated developers will need something such as "DBOwner" access where they need to make changes.)

A competent knowledge of SQL

You will need to know how to write scripts to update the schema and data of your specific brand of database. 90% of the time, these are relatively simple scripts.

Known the limitations to this methodology

There are some data types that can't be manipulated through the use of text-based scripts. These include:

Binary (image, bitmap, blob)
Variable-length text (Text-16)
This may vary depending on your database platform. Can you employ replication?

Data changes that rely on temporal data (datetimes, automatically generated IDs generated by seeds) can't be guaranteed consistent throughout multiple databases.

If your code is relying on consistent autonumbers between database versions, make sure this isn't a conceptual design flaw!

The strategy

The versioning table (in essence)

For the sake of flexibility and verbosity, declare a table within the database (I'll call it DBVersion) serving as a version indicator. Each row should be able to store some form of a version identifier (e.g.: if you use FogBugz, this would be your CASEID), a brief description, when the change happened, who initiated the change, and any extra information you may find convenient.

This table will take the guesswork out of determining the last update to the database.

schema

The script

In conjunction with the table, we need a script that performs actions driven by the values in the table. The following is the script algorithm:

VBScript

Start transaction [ALL]
    For each version change I to N Do:
    If(not exists (select (1) from DBVersion where DBVersionID=I))
    Begin
        Start transaction [i]
            <[do your update stuff here]>
            If failure, rollback and exit
            Else Insert into DBVersion values(I,<>,getDate(),User_ID())
        Commit transaction [i]
    End
    Next [i]
Commit transaction [ALL]

Now, observe this closely. This script is designed to update only as much as it needs to, without performing the same update twice. If the database already has record of a certain version change, the script will simply move on. The script will run on databases under different versions with a consistent outcome, and a clear audit trail of how and when the changes occurred. If, for some reason, the script fails, the nested transaction structure enforces the database to roll back to its state before execution.

If the script is built properly in this manner, synchronizing the database can be as easy as "getting the latest" version of the code and running the script before testing. Is someone complaining the latest build doesn't work? Tell them to run the script again and take a hike!

If there is one fundamental rule to the script, it is never delete from the script. Occasionally, it seems tempting to delete past mistakes in the script, but this ruins the serial record of changes inside the DBVersioning table! Instead of deleting, append an additional version change that amends previous mistakes. After all, most scripts will (hopefully) not require a critically long execution time.

Even though it is not necessary that the DBVersioning methodology is implemented from the inception of the database, there are clear advantages of having a build script from the ground up.

Real-life example using SQL Server

Every major project begins with a single step. Adopting this versioning and "getting into the habit" is the hardest part, so I will supply some code for our beloved SQL Server Northwind database as a starting example.

SQL

--Enforce this is Northwind
USE Northwind
--To ensure ACID, begin the main transaction
BEGIN TRANSACTION main

   --make sure the DBVersion table exists
   if(not exists (select (1) from dbo.sysobjects 
      where id = object_id(N'[dbo].[DBVersion]') and 
      OBJECTPROPERTY(id, N'IsUserTable') = 1))
   begin
     print N'Building the foundation ''DBVersion'' Table'
     BEGIN TRANSACTION one
     --build the table
     CREATE TABLE DBVersion (
      [DBVersionID] int NOT NULL ,
      [Description] varchar (255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [ExecutionDate] datetime NOT NULL ,
      [UserID] int NOT NULL
     )

     --any potential errors get reported,
     --and the script is rolled back and terminated
     if(@@error <> 0)
     begin
       ROLLBACK TRANSACTION
       RETURN
     end

     --insert the first version marker
     INSERT INTO DBVersion values (1,'Build the DBVersion Table',
                                   getDate(),User_ID())
     
   COMMIT TRANSACTION one
   end
   --finished step one
   ---------------------------------------------------------------
   ---------------------------------------------------------------
   --Continuing, adding to this script is relatively simple! Observe...
   if(not(exists(select (1) from DBVersion where DBVersionID = 2)))
   begin
     print N'Adding a ''DateCreated'' field onto the Customers table for auditing purposes'
     BEGIN TRANSACTION two

     --add the column, or whatever else you may need to do
     ALTER TABLE Customers add DateCreated DateTime not null default getDate()

     --any potential errors get reported, and the script is rolled back and terminated
     if(@@error <> 0)
     begin
       ROLLBACK TRANSACTION
       RETURN
     end

     --insert the second version marker
     INSERT INTO DBVersion values (2, 
       'Adding the DateCreated field to customers',getDate(),User_ID())

   COMMIT TRANSACTION two
end

--from here on, continually add steps in the previous manner as needed.
COMMIT TRANSACTION main

Note that as the above code becomes more verbose, these scripts can reach a daunting length in a short amount of time, so it may be to your benefit to store multiple "chapters" of the conceptual evolution of your database.

Modifications

The above script is a bare-bones example for you to build upon for your specific needs. Change it as you like, the core concept lies within the algorithm.

Maybe you could write a Stored Procedure that takes in a version identifier and some update batch code as arguments.
Maybe some parts of your data will rely on Replication services for updates.
Maybe there is some extra grunt work involved with updating legacy databases.

Since each project presents its own unique challenges, be sure to help the development community, and share your experiences and additions to this concept. Thanks for reading, and best of wishes!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Peter Weissbrod

Web Developer

United States

Independent contract developer for .NET data-oriented systems.

not much to say about myself but feel free to contact me for any inquiries and comments!

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.