Have you seen this situation before?
- Your team is writing an enterprise application around a database
- Since everyone is building around the same database, the schema of the database is in flux
- Everyone has their own "local" copies of the database
- Every time someone changes the schema, all of these copies need the latest schema to work with the latest build of the code
- Every time you deploy to a staging or production database, the schema needs to work with the latest build of the code
- Factors such as schema dependencies, data changes, configuration changes, and remote developers muddy the water
How do you currently address the problem of keeping these database versions in working order? Do you suspect it is taking more time than necessary? There are many ways to approach this problem, and the answer depends on the workflow in your environment. The following article describes a distilled, simple methodology you can use as a starting point. Among its advantages:
- Since it can be implemented with ANSI SQL, it is database agnostic
- Since it depends on scripting, it requires negligible storage management, and it can fit in your current code version management program
Any way you look at it, managing development is a hard problem. As your development team and application grow in size and the requirements change, your management overhead inevitably grows as well. There is a constant need for documentation, regression testing, and version management within this flux to maintain stability, all without losing "momentum".
As evolving applications change build versions, there are many "application hosts" that need to be synchronized. Often, this leads to a large number of unnecessary bugs, and an excessive percentage of time dedicated solely to synchronizing the "latest build" among the various test, staging, and production platforms.
This article is one of many designed to help you manage your changes effectively with minimal time involved. Code version management programs have been around since before I was born, so this article deals with database version management.
CVS, SourceSafe, ClearCase, SourceGear, Subversion... the code versioning programs go on and on. What about databases? How do we seamlessly upgrade the schema and data of our production database with minimal or no downtime? How do multiple developers "get the latest" on the database schema and data?
You should be aware that some great database "diff" programs already exist. StarInix, RedGate, and SQL Delta come readily to mind. Some of the higher-end ones come with a hefty price tag, because they generate synchronization scripts for you. Remember that time you trusted an all-purpose program to generate your HTML for you? Do you really want to trust an all-purpose program to automatically change your database schema for you?
Don't get me wrong, these programs are a great starting point for viewing and resolving differences, but I highly recommend a more "hands-on" approach to a versioning management policy. Why?
When you write the scripts yourself, you not only have a higher degree of control, you also have a higher degree of awareness of what is changing in your database, when it changed, and who initiated the change.
A proposed solution
As each database presents its own unique challenge, I don't offer you a one-size-fits-all downloadable application. I offer a time-tested and provably effective methodology that's flexible enough to adapt to the workflow of your development cycle, provided you're ready to get your hands a little dirty with SQL scripting code!
What you will need:
- An existing source control system
- Anything, really. You can use the same repository as your code base, or simply a single text file on an accessible network share as a last resort.
- Database permissions to run powerful scripts
- (Nominated developers will need something such as "DBOwner" access where they need to make changes.)
- A competent knowledge of SQL
- You will need to know how to write scripts to update the schema and data of your specific brand of database. 90% of the time, these are relatively simple scripts.
Know the limitations of this methodology
- There are some data types that can't be manipulated through the use of text-based scripts. These include:
- Binary (image, bitmap, blob)
- Variable-length text (Text-16)
- This may vary depending on your database platform. Can you employ replication?
- Data changes that rely on temporal data (datetimes, IDs generated automatically from identity seeds) can't be guaranteed to be consistent across multiple databases.
- If your code is relying on consistent autonumbers between database versions, make sure this isn't a conceptual design flaw!
The versioning table (in essence)
For the sake of flexibility and visibility, declare a table within the database (I'll call it DBVersion) that serves as a version indicator. Each row should store some form of version identifier (e.g., if you use FogBugz, this would be your CASEID), a brief description, when the change happened, who initiated the change, and any extra information you may find convenient.
This table will take the guesswork out of determining the last update to the database.
In conjunction with the table, we need a script that performs actions driven by the values in the table. The following is the script algorithm:
Start transaction [ALL]
For each version change i = 1 to N, do:
    If not exists (select 1 from DBVersion where DBVersionID = i)
        Start transaction [i]
        <do your update stuff here>
        If failure, rollback [ALL] and exit
        Else insert into DBVersion values (i, <description>, getDate(), User_ID())
        Commit transaction [i]
Commit transaction [ALL]
Now, observe this closely. The script is designed to update only as much as it needs to, never performing the same update twice. If the database already has a record of a given version change, the script simply moves on. The script can run against databases at different versions with a consistent outcome, and it leaves a clear audit trail of how and when the changes occurred. If, for some reason, the script fails, the nested transaction structure forces the database to roll back to its state before execution.
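The same algorithm can be sketched outside of any particular database engine. Below is a minimal illustration in Python using the built-in sqlite3 module: the DBVersion table mirrors the one described above, while the sample migrations and the user name are made up for the example.

```python
import sqlite3

# Hypothetical list of version changes. Version 1 is the bootstrap step
# (creating the DBVersion table itself), handled separately below.
MIGRATIONS = [
    (1, "Build the DBVersion table", None),
    (2, "Add a DateCreated column to Customers",
     "ALTER TABLE Customers ADD COLUMN DateCreated TEXT"),
]

def run_migrations(conn, user="dev"):
    cur = conn.cursor()
    # Bootstrap: create the version table if it is missing.
    cur.execute("""CREATE TABLE IF NOT EXISTS DBVersion (
        DBVersionID INTEGER NOT NULL,
        Description TEXT NOT NULL,
        ExecutionDate TEXT NOT NULL,
        UserID TEXT NOT NULL)""")
    for version_id, description, sql in MIGRATIONS:
        # Skip any version change the database already has a record of.
        cur.execute("SELECT 1 FROM DBVersion WHERE DBVersionID = ?",
                    (version_id,))
        if cur.fetchone():
            continue
        if sql is not None:
            cur.execute(sql)
        cur.execute("INSERT INTO DBVersion VALUES (?, ?, datetime('now'), ?)",
                    (version_id, description, user))
    conn.commit()  # single commit: if anything fails, nothing is recorded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER)")
run_migrations(conn)
run_migrations(conn)  # running twice is safe: applied versions are skipped
```

Running the script a second time is a no-op, which is exactly the property that makes "just run the script again" a safe answer.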
If the script is built properly in this manner, synchronizing the database can be as easy as "getting the latest" version of the code and running the script before testing. Is someone complaining the latest build doesn't work? Tell them to run the script again and take a hike!
If there is one fundamental rule to the script, it is this: never delete from the script. Occasionally it seems tempting to delete past mistakes from the script, but this ruins the serial record of changes inside the DBVersion table! Instead of deleting, append an additional version change that amends the previous mistake. After all, most version changes will (hopefully) not require a critically long execution time.
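As a hypothetical illustration of the "amend, never delete" rule (sketched in Python with sqlite3; the table and column names are invented): version 2 adds a misspelled column, and version 3 corrects it as a new entry rather than by rewriting version 2.

```python
import sqlite3

# Each amendment is a NEW numbered entry; nothing earlier is ever deleted,
# so the serial record of changes stays intact.
migrations = [
    (1, "Create the Orders table",
     "CREATE TABLE Orders (OrderID INTEGER)"),
    (2, "Add a ShipedDate column (misspelled)",
     "ALTER TABLE Orders ADD COLUMN ShipedDate TEXT"),
    (3, "Amendment to version 2: add the correctly spelled ShippedDate",
     "ALTER TABLE Orders ADD COLUMN ShippedDate TEXT"),
]

conn = sqlite3.connect(":memory:")
for version_id, description, sql in migrations:
    conn.execute(sql)

columns = [row[1] for row in conn.execute("PRAGMA table_info(Orders)")]
# Both columns now exist; a later version could drop the misspelled one.
```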
Even though it is not necessary that this versioning methodology be implemented from the inception of the database, there are clear advantages to having a build script from the ground up.
Real-life example using SQL Server
Every major project begins with a single step. Adopting this versioning and "getting into the habit" is the hardest part, so I will supply some code for our beloved SQL Server Northwind database as a starting example.
BEGIN TRANSACTION main

if(not exists (select (1) from dbo.sysobjects
    where id = object_id(N'[dbo].[DBVersion]') and
    OBJECTPROPERTY(id, N'IsUserTable') = 1))
begin
    print N'Building the foundation ''DBVersion'' Table'
    BEGIN TRANSACTION one
    CREATE TABLE DBVersion (
        [DBVersionID] int NOT NULL ,
        [Description] varchar (255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
        [ExecutionDate] datetime NOT NULL ,
        [UserID] int NOT NULL
    )
    if(@@error <> 0)
    begin
        ROLLBACK TRANSACTION -- rolls everything back to before the script ran
        RETURN
    end
    INSERT INTO DBVersion values (1,'Build the DBVersion Table',
        getDate(),User_ID())
    COMMIT TRANSACTION one
end

if(not(exists(select (1) from DBVersion where DBVersionID = 2)))
begin
    print N'Adding a ''DateCreated'' field onto the Customers table for auditing purposes'
    BEGIN TRANSACTION two
    ALTER TABLE Customers add DateCreated DateTime not null default getDate()
    if(@@error <> 0)
    begin
        ROLLBACK TRANSACTION
        RETURN
    end
    INSERT INTO DBVersion values (2,
        'Adding the DateCreated field to customers',getDate(),User_ID())
    COMMIT TRANSACTION two
end

COMMIT TRANSACTION main
Note that scripts like the above grow quickly and can reach a daunting length in a short amount of time, so it may be to your benefit to split the conceptual evolution of your database into multiple "chapters".
The above script is a bare-bones example for you to build upon for your specific needs. Change it as you like; the core concept lies in the algorithm.
- Maybe you could write a Stored Procedure that takes in a version identifier and some update batch code as arguments.
- Maybe some parts of your data will rely on Replication services for updates.
- Maybe there is some extra grunt work involved with updating legacy databases.
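For instance, the first idea, a reusable routine that takes a version identifier and an update batch, could be outlined as follows. This is a sketch in Python with sqlite3, not a real stored procedure; every name in it is illustrative.

```python
import sqlite3

def apply_version(conn, version_id, description, batch, user="dev"):
    """Run `batch` only if `version_id` is not yet recorded in DBVersion.

    Illustrative sketch of the 'routine that takes a version identifier
    and an update batch' idea; all names here are made up.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS DBVersion ("
                "DBVersionID INTEGER, Description TEXT, "
                "ExecutionDate TEXT, UserID TEXT)")
    if cur.execute("SELECT 1 FROM DBVersion WHERE DBVersionID = ?",
                   (version_id,)).fetchone():
        return False  # already applied; nothing to do
    for statement in batch:
        cur.execute(statement)
    cur.execute("INSERT INTO DBVersion VALUES (?, ?, datetime('now'), ?)",
                (version_id, description, user))
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
applied = apply_version(conn, 1, "Create Products table",
                        ["CREATE TABLE Products (ProductID INTEGER)"])
again = apply_version(conn, 1, "Create Products table",
                      ["CREATE TABLE Products (ProductID INTEGER)"])
```

The second call returns without doing anything, since version 1 is already on record.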
Since each project presents its own unique challenges, be sure to help the development community and share your experiences and additions to this concept. Thanks for reading, and best wishes!