A database is a key component in almost any modern, commercial application today, yet as a general rule, the industry has ignored the production of database development utilities. Those that exist are generally aimed at DBAs rather than developers, and almost all seem to require some level of manual intervention.
This project provides a mechanism for automatically generating a predictable, repeatable, and professional approach to releasing SQL code from a development environment into production. While the code does not explicitly require Visual Studio Database Edition to work, I would strongly recommend it, since it is the first IDE that provides real-time syntax checking of SQL code - validating the integrity of your code, object references (did you just change a column name?), as well as providing static code analysis to highlight some of those oft forgotten mistakes (and much, much more ... why Microsoft failed to include much of the functionality that I have added is beyond me).
Some of the issues that I have addressed are:
- Predictability: With most existing environments, the scripting of database objects is largely a manual process. Even with tools like the Schema Compare in Visual Studio Database Edition or those provided by RedGate software, most developers still find that they have to manually script out existence checks for objects (dropping procedures, checking to see if tables exist, interrogating the sys.indexes tables). For the most part, however, this is error prone - cut and paste the existence check for Object A, and then drop Object B. Now the existence check will automatically be built into the script, deployments won't fail because a table already exists.
- Security: One of the things that has often been a problem in large organizations is the scripting of security. I have abstracted all database level security to roles at a database level (as it is intended to be!), and left the actual user permissions and access up to the DBA, who should rightly have full control of the system and not have to worry about developers accidentally injecting development environment permissions into the code. Additionally, for those developers of large databases, you do not have to manually script out each permission for each object - I have provided a mechanism for generating this at build time, so if you forget to add it for that single Stored Procedure, the system will not break when you roll out.
- Repeatability: DBAs are human too, and errors occur during a roll out. Connections get dropped, power goes off, the DBA cuts and pastes the content across a remote session and it gets truncated - you name it, it can happen. It must be possible to simply pick up the process from where it last was, and a repeatable script that checks for existence on every object before it creates/alters it provides such a framework.
- Flexibility: Other utilities that I have used generate a single, large script. I have always found this inflexible, especially when dealing with a rollout that impacts multiple databases, or (heaven forbid) when there are circular dependencies between databases. The deployment scripts are separated into types (e.g., procedures, tables, etc.), and the execution of this can be sequenced any way you desire, since you can inject a custom script at any point in the process.
Setting Up the Sample Project
This is pretty straightforward. Download the source, unzip and then open the SampleDB.dbproj file in the Example VSDB Project folder. Ensure that the following lines in the SampleDB.targets file point to a valid SQL Server instance:
- Line 45: SqlInstance=".\SQL2005"
- Line 69: SqlInstance = ".\SQL2005"
- Line 76: ExtractFromServer = ".\SQL2005" (this only needs to be changed if you need to extract reference data from a static database; the default is set to off)
Detailed comments are provided in the SampleDB.targets file, and in the CustomDeployment\ScriptExecutorConfiguration.xml file.
When ready, open up a Visual Studio Command Prompt in the folder where the SampleDB.dbproj file is located, and run the following command line statement to extract your schema:
MSBuild SampleDB.dbproj /Target:TheExtract
Should you wish to deploy the database as part of the build process, change the target to /Target:DeployDatabase instead. Please note that the sample project is a trivial example, but most of the types of objects you will use are catered for.
The Targets File
The targets file effectively controls the build process. Here is a detailed explanation of the targets file supplied with the sample project:
<UsingTask TaskName="DeployDatabase" AssemblyFile="lib\SqlDeployment.Build.dll" />
<UsingTask TaskName="SQLExtract" AssemblyFile="lib\SqlDeployment.Build.dll" />
<UsingTask TaskName="SqlScriptExecutor" AssemblyFile="lib\SqlDeployment.Build.dll" />
SqlUsername = "BuildServiceUser"
SqlPassword = "BuildServicePassword"
<Target Name="DeployDatabase" DependsOnTargets="TheExtract">
SqlUsername = "BuildServiceUser"
SqlPassword = "BuildServicePassword"
SqlInstance = ".\SQL2005"
SqlUsername = "BuildServiceUser"
SqlPassword = "BuildServicePassword"
ProjectGUID = "$(ProjectGuid)"
BuildPath = "Build"
SchemaScripts = "@(Build->'%(FullPath)')"
StaticFiles = "@(CustomDeployment)"
SplitDropToSeperateFile = "1"
ExtractData = "1"
ExtractFromServer = ".\SQL2005"
ExtractUsername = "BuildServiceUser"
ExtractPassword = "BuildServicePassword"
ExtractFromDatabase = "SampleDBSource"
ExtractTargetFile = "$(MSBuildProjectDirectory)
ExtractCommandsFile = "$(MSBuildProjectDirectory)\ExtractCommands.txt"
ScriptPermissions = "1"
PermissionsTemplate = "$(MSBuildProjectDirectory)\PermissionsTemplate.xml"
PermissionsTargetFile = "$(MSBuildProjectDirectory)\Scripts\
AlterIfExists = "0"
ExistenceCheckerProvider = "AdamNachman.Build.SqlExtract.SchemaExistenceChecker,
TreatWarningsAsError = "1"
ExtractAssemblies = "1"
ExtractAssembliesConfig = "$(MSBuildProjectDirectory)\ExtractAssemblies.xml"
The targets model is flexible, so you could extract and publish in a single step by changing the targets to MSBuild SampleDB.dbproj /Target:TheExtract,DeployDatabase. My targets pattern is /Target:Build,TheExtract,RestoreDatabase,DeployToRestoredImage,PublishToShared,Checkin, where
DeployToRestoredImage executes the newly extracted script against the restored backup image from production,
PublishToShared updates the shared development environment (both of these are simply copies of
DeployDatabase with different parameters) and
Checkin adds the scripts to the source repository. Each step is only executed if the previous one is successful.
Using the Assemblies in your Project
Essentially, the entire process is controlled from within the MSBuild targets file. For those readers who are unfamiliar with the MSBuild process, you think of the targets file as a serialized repository for a class representing the configuration and execution sequence of a series of tasks. In the example project that I have provided, I have already edited the .proj file to include the custom MSBuild target.
The overall process is divided up into the following steps:
- Code extraction (
Deploying the database (
- Parse the project files, creating a class in memory per schema object
- Create a local, test database
- Execute the DLL against the local database, checking for dependencies to allow us to sequence the content correctly to eliminate warnings when we deploy
- Write the output .sql files to disk in the build folder
- Write the static files (custom deployment files) to the build folder
- Validate the database connection
- Read and sort ScriptExecutorConfiguration.xml
- Execute the scripts in sequence, reading them from the build path
Note that I have taken the decision to always drop and replace all Stored Procedures, views (indexed and otherwise), and functions with each deployment. This automates the recompile, and ensures that no incorrectly cached plans will be used. At the same time, it guarantees that the latest version of the code will be deployed into the production environment. Some may argue that this introduces the risk that if a change has been made in production, it will be overwritten and lost for all eternity. My argument is that if it doesn't exist in the source repository with the project in the first place, it doesn't exist anyway, so there is nothing to lose ... .
In a continuous integration environment, ideally, the target database for the deploy database step would be a daily refresh of a production image. This way, you will know daily whether or not you have broken something, and be able to address it early on in your project. Obviously, in an Enterprise environment, a backup of TBs of live data is impractical - here, you would rely on your DBAs to assist by providing enough sample data and the correct production schema. Also, given the fact that your project would be stored in a source repository (TFS, Subversion, etc.), any on-the-fly production changes that the DBAs make can be applied directly to the project files.
Releasing your Database Project with Updates
Adding New Schema
Assuming that the new schema objects can be added without modifying existing objects (for example, adding a new table, or new stored procedure), simply add the new object to the database project and let the extract process do the rest. As long as the file (e.g. TableFive.table.sql) is included in the project (and marked for BUILD in the file properties), it will automatically be added to the extracted script files, and wrapped with an existence check script.
In the event that you would like to update the schema, there are a number of options available to you. You can always add a custom script to the CustomDeployment folder and add a new sequence item in the ScriptExecutorConfiguration.xml configuration file. However, the AlterTables.sql file is included in the sample project and default .xml file, and can easily be used to extend your project. Let's assume you have a table
TableOne, and you want to add
Col5. The steps you would follow are:
- Change the
Create Table statement in the TableOne.table.sql to include the new column.
CREATE TABLE [dbo].[TableOne] (
[Col1] INT IDENTITY (1, 1) NOT NULL,
[Col2] INT NOT NULL,
[Col3] NVARCHAR (50) NOT NULL,
[Col4] VARCHAR (50) NOT NULL,
[Col5] INT NOT NULL
This will ensure that the column is included for any new databases.
- Add the default constraint file (new file DF_TableOne_Col5.defconst.sql) for the new column:
ALTER TABLE [dbo].[TableOne]
ADD CONSTRAINT [DF_TableOne_Col5] DEFAULT ((0)) FOR [Col5];
When the extract utility parses the script, it will create the existence check for this constraint automatically, so the script will be repeatable.
- Add the code to check for the existence of the column to the \Scripts\CustomDeployment\AlterTables.sql, like this:
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA =
'dbo' AND TABLE_NAME = 'TableOne')
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA=
'dbo' AND TABLE_NAME='TableOne' AND COLUMN_NAME='Col5')
ALTER TABLE dbo.TableOne
ADD [Col5] INT CONSTRAINT [DF_TableOne_Col5] DEFAULT (0) NOT NULL;
- If you have a look in the source, there is a project called "
DatabaseDeployment." This is a quick example of how you could write an application to process the generated script files and deploy them to a target database. In this manner, you could use the build process to generate the scripts, and then execute them at your customer. Note that there is ALSO a
OutputCompoundFile flag on the
DeployDatabase target. This instructs the automatic deployment task to generate a single output file appended in the correct sequence (this you could deploy to your customer if you prefer not to send multiple files - I've worked with a few DBAs that will only run the updates as single script files, and won't deploy using any application code other than Management Studio to run the scripts).
Extracting Static Data from a Source Database
Create a "source" database somewhere on disk, and use that to populate any other "default" reference data (lookup tables, etc). Change the
ExtractData flag to "
1", ensure that
ExtractFromDatabase are correct.
You already have the correct command to extract data from
TableOne in the ExtractCommands.txt file by default. If you wanted to add another reference table, you would simply add another line. Note that the "
merge" flag of one instructs the extract to insert if it does not exist and updates if already in place. To handle insert only, set
Novice (and even intermediate and advanced) users should not have a need to alter the extract and deployment code to fit their requirements. The possible exceptions here are when there is a new DDL type not covered by the code (or one that I have not seen fit to cover myself). By and large, the existing framework should cover 99% of your requirements.
The bottom line, however, is that the code will not automatically generate a set of "delta" scripts, but because of the way that it is generated IF you code your alter statements carefully you will be able to use the same script at new AND old customers, and upgrade pretty much ANY previous versions of the schema. The default sequencing of the script files (configured in Scripts\CustomDeployment\ScriptExecutorConfiguration.xml) will ensure that the upgrade is performed correctly. Note that the sequence is pretty much open ended, so there is nothing to stop you from inserting your own custom scripts at any point. Just ensure that the item key in the XML is unique and in the sequence position you desire (the deployment assembly orders them according to the key number just in case you insert it in the wrong place in the file) and you can extend it as much as you like.
The following items are explicitly not covered by this project:
- Files. While file groups are catered for, physical files are ignored. Currently, I view this as a DBA responsibility and not a development one, and file groups will be created without a physical file. The exception is the primary file on a new database (i.e., the Deploy simply creates a new database in the default path). This also applies to files for full-text indexes. However, there is nothing stopping you from modifying the code to allow this.
- SQL 2008 specific syntax. While I am sure that the vast majority of the SQL sources will be parseable, I am hesitant to say this is 2008 compliant, since I haven't tested it with any of the new types (spatial objects, etc.). However, since the actual content is intended to be parsed and compiled by Visual Studio, and my code simply applies a regex to the file header to determine the type, it is unlikely to be too much of a problem.
- Certificates, and encryption keys. Again, I see this as a DBA function (hell, they have to do something, right?). If there is a real need out there, I am happy to extend the system as needed.
- User defined types, and XML schema.
Items that I've considered adding, but never got around to:
- Finishing the SQL CLR process (mentioned above)
- Compiling the output into a resource assembly instead of simple text files, and then using this to "build version" a database
I've taken an agile approach to the development of this library, and while I don't believe that it's finished yet (will it ever be?), I think that it's time to get it into the public domain and be used by other developers. Functionality can always be added when it's needed.
If you run into trouble, post a response and I'll be happy to assist. If you would like to contribute, please do so - this is intended to benefit the community as a whole.
What If I Don't Use Visual Studio?
There are plenty of people that don't use Visual Studio, for a variety of reasons ranging from "disliking Microsoft" to "using free open source products". These are all valid. It must be noted that the extraction and deployment utilities are NOT bound to Visual Studio, but only the build task. This means that there is nothing stopping you from writing a wrapper that passes all the relevant parameters into these assemblies (e.g. a custom task for CruiseControl.NET, or simply an executable that passes in the parameters pointing to .sql files on disk).
To assist with this, let me explain the structure of the existing assemblies (all the C# source code is provided):
An assembly containing the common interfaces used across projects. Currently, just the
An assembly containing the
SerializableDictionary - the base object used by the
ScriptExecutorConfiguration class to store a dictionary of
KeyCaluePairs in a human readable/editable xml format on disk.
The assembly containing the logic to parse the .sql files on disk. This is the core utility that generates the .sql scripts.
The assembly that executes the SQL files, using the ScriptExecutorConfiguration.xml file as its input.
The assemblies that are called from within MSBuild. If you look closely at the properties on the build tasks, you will see that they inherit from the MSBuild Task class, e.g.:
public class SQLExtract : Microsoft.Build.Utilities.Task
The properties map directly to the attributes for the task in the XML, e.g.
public Microsoft.Build.Framework.ITaskItem ExtractData
The build tasks simply wrap the underlying
ScriptExecutor assemblies, passing on those properties when the
Execute method is called. So, for example, if you didn't use Visual Studio and simply stored your own .sql files on disk, you could write some code to call the
SQLExtract class directly, setting all the
public properties yourself, passing a
List<string> of the .sql files to it. Alternatively, a CruiseControl.NET task that does the same. Or you can simply call the existing MSBuild targets from TeamCity or TFS Build or command line - it's all up to you.
As previously explained, a sample application that executes the script files on disk in sequence. You could use this for deploying your databases in a production environment. Ideally suited to customers with little or no SQL skills, where you can control the deployment with your own application.
Supported Database Objects
- Indexed Views
- Indexes (Clustered and Non clustered)
- Service Broker Queues
- Fulltext Catalogs
- Fulltext Indexes
- Partition Functions
- Partition Scheme
- Stored Procedures
- Database Roles
- There may be more that I've inadvertantly catered for ... if you have any specific requirement, feel free to let me know.
For the record, I am currently using this on more than two dozen disparate database projects, spanning multiple servers with databases ranging in size from a couple of hundred MB to over 5 TB, all integrated into a build server running on TeamCity from Jetbrains. The key here is consistency. With a predictable and reliable build process, instant (or near instance) failure notifications and the involvement of the production DBA's in the design phase to minimize risk and encourage communication, we have successfully rolled out change after change in a highly volatile OLTP environment with a very unforgiving user base.
- Added support for Partition Schemes and functions
- Added new wrapper for ScriptExecutor to allow for separate task option (sample uses it for restoring a database baseline)
- Added support for checking for existing schema on extract (from file or db), and generating
ALTER statements for functions, stored procedures and views
Note: This was added since the existing model, while effective, fell down in a 24/7 environment. This is because the dropping and recreating of objects and assigning permissions at the end of the script would result in errors on a model supporting thousands of messages per second without the possibility of significant downtime. Creating the
ALTER statements instead allows for a more targeted deployment. To regenerated from scratch, simply set the value of this config option to "
- Added support for SQL authentication as well as Windows Authentication
- Corrected bug in
spGenerateMergeData procedure whereby data > 8000 characters was truncated on the extract
- Corrected bug in the extract where the generated reference files were copied to the target output folder BEFORE they were regenerated.
- Corrected errors on extract of indexes for secondary schema
- Added option to treat warnings as errors, which allows builds to fail on missing references
- On request, added schema as an option to the permissions map class, allowing for targeting schema with the permissions generator
- Migrated to VS 2010
- Extended example extract command to show use of more complex extract commands
- Note: With VS 2010, Microsoft's internal compiler now catches most issues with regards to missing variables (including correctly failing on case sensitive collation). Two outstanding issues that I know about include failing to detect an incorrect reference to the column of a table variable, and a call passing n-1 arguments to a stored procedure requiring n arguments is also parsed incorrectly (i.e. it succeeds when it should fail).
- On request, I added support for extracting the bitstream from an assembly and updating the assembly SQL file. This allows you to reference external assemblies and automatically extract the source and update your project file (and subsequent deployment).
Note: I am not automatically handling any dependencies on the assemblies, or auto-creating any functions or procedures, nor am I automatically generating an alter statement. This is because I simply don't want to go to the effort of interrogating the interface of the assembly to ensure that it is backward compatible so that SQL rebinds safely - this is something that should come out during test anyway. If you have an updated assembly, I suggest that you explicitly handle the dropping of the dependant functions/procedures/constraints in a "pre-execution" script yourself. You can simply create your own custom script, add it to the ScriptExecutor configuration file and give it a low sequence number to force it ahead of the other statements.
- Added a sample table valued function that uses SqlClr to parse a comma delimited list of integers and spit out a table variable. Wrapped with a T-SQL function called
GetIds, and can be used to efficiently parse a delimited input list and allow you to join to it in a stored procedure. This will outperform pretty much any T-SQL parsing implementation, in terms of CPU utilization and memory. Note that I haven't even tried to optimize it - there are faster ones out there. This is just a simple example of efficient use of CLR to perform a task that T-SQL can do, but does poorly.
- On request, added an option to the deploy target to prepend a using statement to the compound output file, especially to cater to DBAs that forget to select the correct database before execution ... *sigh*
- Added 70+ unit tests to the parser to prevent a breaking change like my last update from occurring.
- Improved overall performance of the parser (still using regex instead of a lexer, but since it works well as it is there is no reason to change).
- Compiled for .NET 4.0
- Corrected a bug in the extract script that resulted in the exclusion columns list being ignored
- Build 1.5.3963
- Corrected a bug introduced on extract from build source in existence checker
- Corrected a bug executing during the deployment phase
- Corrected "
use" statement when prepending to single file extract
- Added Visual Studio template - simply copy (DO NOT EXTRACT) the template zip file to My Documents\Visual Studio 2010\Templates\ProjectTemplates
Points of Interest
It's actually quite astounding to me just how many professional people seem to be blissfully unaware of the Database Edition for Visual Studio. Additionally, since Microsoft has (retrospectively) made this free with the Developer Edition since they are including it in VS 2010, the cost is no longer prohibitive to many of those in the know, but without the extra cash. Simply install SP1 on top of Developer, and then the GDR2 from Microsoft.