Introduction

Designing and creating a new database is a well-researched task but, in my experience, it is certainly not what database administrators and developers spend most of their time doing. It is much more common to be required to work on established databases, and on SQL code that has been by a previous developer or a colleague. In fact, it is not uncommon for a SQL Server DBA to manage up to 50 different SQL Servers, so the database schemas on each one can be considered largely "unknown" to them.

As a result, when database modifications are required, a DBA can spend a long time trying to decipher the scripts that are to be modified, and exploring the database schema that is to be altered.

This observation led me to design a new tool for Red Gate called SQL Refactor, which is an add-in for Management Studio in SQL 2005.

The tool is designed to help SQL Server developers and DBAs work much more productively and quickly with legacy code, help them improve the design and maintainability of database code, and to greatly reduce the time and effort involved in performing more complex database refactorings, such as object renaming or table splitting. Over the course of this article, I'll describe all of the major features of SQL Refactor:

Lay out SQL, which formats your SQL statements.
Script summary
Uppercase Keywords
Find unused variables and parameters
Qualify object names
Expand wildcards
Encapsulate selection as a new stored procedure
Smart rename a database object
Table split

SQL Refactor Features

SQL Refactor is a collection of utilities that are designed to aid the process of exploring legacy code, speed up schema changes and generally facilitate every day database development. Some of the utilities can be classified as refactoring utilities, which change the SQL code or the schema without affecting the behaviour of the database. Others are less intrusive and do not change the database schema or SQL statements at all.

During the design of SQL Refactor, I identified nearly forty potential features that could help to increase the productivity of the SQL Server developer or DBA. This list was pruned based on a relative value that I assigned to each one according to:

The amount of time the feature could save
The frequency with which it was likely to be used
The availability of alternatives

Cleary it does not make much sense to invest major development efforts into a feature that can be worked around by a simple copy paste operation and a few search and replace operations.

The SQL Refactor features that made the cut can be categorized in many ways. For the purpose of this article, I've grouped them as follows:

Understanding legacy code – this category of features are designed to improve the layout and legibility of SQL code so that its structure, general intent and major features can be readily understood. These features may change the formatting of code but are non-intrusive in terms of actual schema changes.
Improving code design – features that will make modifications to scripts to improve their design, maintainability and readability.
Major schema refactorings – complex refactorings that physically modify the database schema in a substantial way.

Understanding legacy code

These features will be used on a day-to-day basis. Those covered here are:

Lay out SQL, which formats your SQL statements.
Uppercase keywords
Script summary
Find unused variables and parameters

Laying out SQL

Database administrators and developers use a wide variety of tools that can display SQL statements. Most of these tools provide basic features such as displaying code in colour, and identifying. Laying out SQL is the next step.

A nicely formatted SQL script indents nested statements, and by doing this it allows fast identification of the various parts of the SQL code.

Depending on the size of the script, this feature can save a lot of time. I expect this to be the most popular and frequently-used feature in the tool. It is expected that whenever a statement is written or rewritten, it is indented, and persisted in the new layout.

Probably the biggest difficulty in providing a feature that formats code automatically is in trying to accommodate a wide range of different personal style preferences. Since there is no "standard" for laying out SQL, and since the SQL language is not as strictly structured as let's say C, people have developed a large number of conflicting conventions. Also, because there are only a few tools that do indentation and even these are limited in functionality, these conventions were rarely enforced. One of the challenges when developing SQL Refactor was identifying the various styles, and turning them into options. The first version of the product supports around 37 options. The large number of options allows a fine level of customizing, but demand for more options seems to be endless. During our beta program we had requests for over 80 options.

Keyword Capitalization

Formatting SQL is often complemented by a coding style that dictates that SQL keywords should be capitalized. This style helps to identify keywords, and is particularly useful in development environments that do not support color-coding. The capitalize keywords feature does exactly what it says. However, it has its limitations. Many people use dynamic SQL. Modifying the contents of strings, especially since most of the time these individual strings are not valid statements, is a challenge that is left for the future. Many beta testers were wondering why this feature is part of the lay out SQL feature. The primary reason is that some do not wish to lay out their SQL, but still want to capitalize keywords. All the features of the product should be considered as building blocks that can be used in sequence. They are easily accessible via shortcut keys, and thus this restriction is not imposing a significant overhead.

Script Summary

A feature that further aids developers to understand legacy code is script summary. This provides a high level overview of a script that also allows navigation. It is a hierarchical view of a script, i.e. if the scrip creates a few stored procedures, triggers, etc, it is possible to collapse the definition on less interesting objects. The information provided in the summary list is information that hides many details. For example in the case of select statements Script Summary lists the tables and table returning functions the select statement retrieves data from, but hides the way these tables are joined together, and hides subquery structure. A screenshot that helps to visualize this feature is in figure 1.

Find unused variables and parameters

A further feature that analyzes a SQL script is "find unused variables and parameters". Stored procedures as well as SQL code evolves, and since SQL is not compiled before it is persisted, garbage may remain in it. This certainly makes it more difficult to read and maintain. The described feature identifies unused variables and parameters, including variables and parameters that are assigned to, but the assigned value is never used.

Improving Code Design

Up to now we've dealt with features that, apart from adding white space, did not modify scripts or the database. It's time now to move on to slightly more intrusive features, which you can use to improve the design and maintainability of your scripts:

Qualify object names
Expand wildcards
Encapsulate selection as a new stored procedure

Qualify object names

This refactoring is the most straight-forward of this set of features. It changes object names to their fully qualified version. For example the Employees table becomes HumanResources.Employees. Similarly, un-aliased or unqualified column references are transformed into their qualified version.

Expand wildcard

Best practices dictate that one does not use wildcards in SELECT statements, and instead properly enumerates the columns that are to be selected. When a developer sees such a violation of this best practice, she can use our "Expand wildcard" feature, which rewrites the relevant part of the SELECT statement. This operation is most likely to save considerable time when the schema of the database the query references is not used frequently or the developer is not familiar with it.

Encapsulate as stored procedure

A very popular refactoring in refactoring tools that are written for programming languages such as C#, is the encapsulation of a piece of code into a new, reusable code unit, usually a procedure or method.

The equivalent for SQL code is provided by the "Encapsulate as stored procedure" refactoring. It turns a selection of code into a stored procedure, and replaces the original code segment with a call to this newly-created stored procedure. It identifies the variables in the selected SQL code, and intelligently decides on the type of parameter to which this variable needs to be changed. For example, if a variable is assigned before the selected block, its value is used or modified within the selection, and then queried in the code after the selection, the parameter is declared as an input/output parameter.

It is expected that this refactoring will be frequently used during the database development phase. Although this refactoring is reasonably complex, it is still relatively easy to do manually. However, automatic identification of the variables that should be turned into parameters will certainly save the developer considerable time.

Major schema refactorings

We now move on to the most complex refactorings – those that do physically modify the database schema in a substantial way:

Smart database object rename
Table splitting

These features will be required on a relatively infrequent basis, but when they are required they can potentially save you a vast amount of time.

Smart object rename

A more intrusive refactoring, from perspective of the database schema, is smart object rename. Indeed, technically it is not even a single refactoring, but a whole group of refactorings. This refactoring group produces a script that renames a stored procedure, function, view, table or their columns or parameters. This rename operation also modifies the referencing database objects in a consistent order.

This refactoring, while expected to be rarely used, can save a large amount of time. Dependencies are difficult to map in a database management system. Finding out which views reference a particular table or view may require reading through the definitions of all the database objects. If a mistake is made and, for example, a view is not updated properly to reference the renamed object, the error might not be spotted for a long period, when the broken object needs to be used.

The smart object rename refactoring identifies all of the textual database objects, and alters them to use the modified name. The modification script considers the cascading effect of the refactoring, and orders the database operations accordingly. Since it provides a script as its output, this script can be further modified to include extra business logic. If this business logic does not affect the dependencies of the modified objects, the script will execute successfully.

Table Split

By far the most complicated refactoring in SQL Refactor's feature list is "Table Split". Its inclusion in the tool was motivated by two common problems:

Missing referential integrity tables
Post-deployment database re-design

In the former case, a domain restriction needs to be added. The latter problem is more complex. Even with well designed databases, it's possible that requirements change, and the database schema needs to be adjusted even after it has been used in production. For example, a bank merge may change the relationship of entities that were mapped to a singe table, and this relation now must be represented using two tables.

The problem is that during the database design phase there is no concept of data and referencing textual objects like views, stored procedures, etc. Once a table is populated with real word data, it is very costly to enter the same data into a new schema structure. The table split refactoring tries to assist such situations. It allows splitting a table into two, and it will rewrite the dependent textual objects in the database. Based on the shared columns, it will set up a primary key on the second table, and then it will automatically set up a foreign key relationship between that and the new table and use it to replace the original table reference with a join statement. It will intelligently decide on use of INNER and OUTER joins, based on the nullability of the shared columns.

The table split refactoring is not expected to be used very frequently, and it has limitations. However, the amount of time it can save by creating an initial script to achieve a table split makes it a valuable refactoring.

Summary

SQL Refactor is a tool that I hope will allow SQL Server devbeloper and DBAs to make significant changes to their SQL code and database objects, with zero impact on the behavior of the database or the applications that use it.

The refactorings and utilities that are all available in SQL Refactor were selected as the most valuable from a large list of potential refactorings. If you have any feedback on these refactorings, or have suggestions for new ones in future versions of the tool, then please do contact me by email or via the Red Gate message board.

In the future you should see a few more articles that expand more on the various refactorings, and describe them individually in much more detail.