Parse Transact SQL to Check Syntax

Andre Pieterse

4.37/5 (13 votes)

Jun 25, 2012

CPOL

13 min read

89698

2225

How to implement a parse function within a C# WPF application

Download source - 26.7 KB

Introduction

The problem that I am going to address in this article is how to pass some Transact-SQL text to a specified SQL Server instance, and ask it to parse the code, returning any syntax errors. This would be useful in a case where your application allows the user to enter some T-SQL text to execute, or where T-SQL gets executed dynamically from script files, or whatever. The possibilities are endless, just bear in mind the security implications that this might have if this article inspires you to implement such a design.

Say, for example, you have a system which enables one user (with special access privileges, of course) to write T-SQL code and store it on the system in the form of scripts (in the database or in files). Then, another user of the system would come in and choose one of these scripts based on the name and description provided by the programmer, and then click a button to execute it. Obviously, you need some mechanism to check the validity of the code before allowing it to be stored on the system. This is where my solution would hopefully be useful.

Background

I did some research and decided to include some background on the inner workings of SQL Server, or any other DBMS for that matter. So what really happens when your applications execute queries on the database? Is there a specific process that the DBMS follows to return the requested data, or to update or delete a subset of data? What happens under the hood of your preferred DBMS is quite complicated, and I will only explain or mention key processes on a high level.

SQL Server is split into multiple components, and most of these components are grouped to form the Relational Engine and the Storage Engine. The Relational Engine is responsible for receiving, checking and compiling the code, and for managing the execution process, while the Storage Engine is responsible for retrieving, inserting, updating or deleting the underlying data in the database files. The component that I want to touch base with is the Query Processor, which is part of the Relational Engine.

As the name suggests, it is the Query Processor's job to prepare submitted SQL statements before it can be executed by the server. The Query Processor will go through three processes before it can provide an Execution Plan. This execution plan is the most optimal route chosen by the DBMS for servicing the query. The three processes mentioned include:

Parse and Normalize
Compile
Optimize

The parser checks for syntax errors including correct spelling of keywords. The normalizer performs binding, which involves checking if the specified tables and columns exist, gathering meta data about the specified tables and columns, and performing some syntax optimizations. Programmers frequently use the term Compilation to refer to the compilation and optimization process. True compilation only affects special T-SQL statements such as variable declarations and assignments, loops, conditional processing, etc. These statements provide functionality to SQL code, but they do not form part of DML statements such as SELECT, INSERT, UPDATE or DELETE. On the other hand, only these DML statements need to be optimized. Optimization is, by far, the most complex process of the Query Processor. It employs an array of algorithms to first gather a sample of suitable execution plans, and then filters through them until the best candidate is chosen.

After the optimal execution plan is determined and returned by the Query Processor, it is stored in a cache. SQL Server will automatically determine how long to keep this execution plan within the cache as it might get reused often. When an application executes a query, SQL Server checks if an execution plan exists in the cache for the query. SQL Server generates a cache key based on the query text, and searches for the same key in the cache. Queries need to be recompiled and reoptimized when metadata changes such as column definitions or indexes, but not for changes in parameters, system memory, data in the data cache, etc.

Finally, the Query Processor communicates the execution plan to the Storage Engine and the query is executed.

Using the Code

I have created and included a simple code editor application, but keep in mind that the main purpose of this article is to provide you with a parse function, and only code snippets and notes revolving around this point will be covered here. There are lots of very useful articles out there for building WPF applications. I will assume that you have some experience with Visual Studio and C#. I have included the example app which was written in Visual C# Express 2010 as a WPF Application. Knowledge of WPF is not required as I will explain the relevant C# code in detail.

Essentially, I want my application to have an Execute button and a Parse button (like in MS SQL Server Management Studio). Pressing the Execute button, SQL Server will go through the whole process as described above to first prepare the statements and then determine the execution plan before it will be executed. For the parse button, naturally it should only parse the query. I am going to create a class that will encapsulate all my ADO.NET objects, and provide methods, Execute and Parse for wiring functionality to the buttons. I would also provide methods for connecting and disconnecting to the SQL Server instance with a specified connection string. The class is called SqlHandler.

This class encapsulates and hides the following objects:

SqlConnection conn
SqlCommand cmd
SqlDataAdapter adapter
List<SqlError> errors

You need to include using System.Data.SqlClient; and using System.Data; to the using directives list at the top of the code file, as I am sure you know. The conn object is used for connecting to the database. The ConnectionString property directly gets and sets the conn.ConnectionString property. This allows you to get or set the Connection string from outside the class. The cmd object is used to execute commands, and adapter is used to obtain query results from the database. The object errors is a generic list of type SqlError. This list will be used to capture and return errors generated while executing or parsing T-SQL code.

Property FireInfoMessageEventOnUserErrors and Event InfoMessage

Most of you reading this article would already be familiar with these ADO.NET classes. Most of the time, I am developing applications with ADO.NET; I only use a few selected properties and methods. The SqlConnection class contains a FireInfoMessageEventOnUserErrors property and an InfoMessage event that are less well known and less often used (in my opinion at least). I had to discover them myself by digging through the objects as I could not find a relevant article explaining how to accomplish what I wanted. Eventually, through trial and error, I got a working solution.

FireInfoMessageEventOnUserErrors is a boolean property. When set to false (default), the InfoMessage event will not be fired when an error occurs, and an Exception will be raised by the ADO.NET API. When set to true, an Exception will not be thrown, but the InfoMessage event will be fired. For my code to work, I had to enable this event to catch all the messages through the SqlInfoMessageEventArgs event argument object. The following code snippet shows how to set this property and event in the constructor:

conn.FireInfoMessageEventOnUserErrors = true;
conn.InfoMessage += new SqlInfoMessageEventHandler(conn_InfoMessage);

conn_InfoMessage is the name of the event handler method that will be called when the event fires. It is important to note that although this looks like an asynchronous operation, it is in fact synchronous. This means that when the T-SQL query is executed by passing it to cmd.ExecuteNonQuery or to adapter.Fill, the event will be fired before continuing execution. This allows us to suck up all the messages into the errors list before returning from the Execute and Parse methods of our class where ExecuteNonQuery and Fill is called. The snippet below describes how the messages are caught in the event handler.

private void conn_InfoMessage(object sender, SqlInfoMessageEventArgs e)
{
	//ensure that all errors are caught
	SqlError[] errorsFound = new SqlError[e.Errors.Count];
	e.Errors.CopyTo(errorsFound, 0);
	errors.AddRange(errorsFound);
}

It is important to mention that the event will be fired for every error that the T-SQL script might contain. For instance, if your script contains two errors, the conn_InfoMessage event handler will be called twice! I only discovered this while testing my application where I tried to parse a script containing multiple errors. The initial result was that my Parse method always returned only one error, while SSMS reported the correct amount of errors for the same script. Only when I inserted a message box in the event handler, I discovered how it works. The reason why this was misleading is because the second argument of our event handler, the e object, which is of type SqlInfoMessageEventArgs has an Errors property. This property is of type SqlErrorCollection, which to me implied that it contains multiple SqlError objects. Naturally, I assumed that this collection will contain all the errors at once. After a few code modifications, I got the desired result. What happens now is that every time the event is fired, an SqlError array is created and the e.Errors collection of SqlError objects will be copied to this array. Even though this collection contained exactly one item every time I have tested my code, I make sure that all the SqlError objects are captured just to be safe. This whole array is then copied to the errors list, which is a private field within my class definition. This list is used to aggregate all the errors before returning it to the client code. Another point worth mentioning is that the errors list has to be cleared every time Parse or Execute is called.

Execute Method

The first parameter of this method, sqlText contains the T-SQL code to be executed. The second parameter is an SqlError array object. Take notice of the out keyword. This means that the parameter is an out parameter, and we have to set its value somewhere in the method. This allows the method to return both a DataTable object (through the normal return type and return statement), and an array containing our SqlError objects. The client code will be responsible for checking the length of the array to determine if any errors were generated.

public DataTable Execute(string sqlText, out SqlError[] errorsArray)
{
	if (!IsConnected)
		throw new InvalidOperationException
        ("Can not execute Sql query while the connection is closed!");

	errors.Clear();
	cmd.CommandText = sqlText;
	DataTable tbl = new DataTable();
	adapter.Fill(tbl);
	errorsArray = errors.ToArray();
	return tbl;
}

First, we need to tests whether the connection is open or not using the IsConnected property, and throw an exception if it is not. Next, the errors list is cleared to prevent reporting errors previously encountered. The query is then executed using adapter.Fill(tbl) where tbl is a reference to a new DataTable object. This table will be filled with data if the T-SQL code returns any data. As mentioned earlier, the InfoMessage event will be raised synchronously, so the next line after calling Fill will only be executed after all errors were raised through the event. All errors (if any) are copied to a new array of SqlError objects. This array is assigned to the out parameter errorsArray, allowing the client of our class to check if any errors were encountered. Remember that no exceptions will be thrown when you set FireInfoMessageEventOnUserErrors to true.

Parse Method

This method accepts one parameter, sqlText which contains the T-SQL code to be parsed. It returns an array containing SqlError objects. The client code should test the length of this array to determine if any errors were generated.

public SqlError[] Parse(string sqlText)
{
	if (!IsConnected)
		throw new InvalidOperationException
        ("Can not parse Sql query while the connection is closed!");

	errors.Clear();
	cmd.CommandText = "SET PARSEONLY ON";
	cmd.ExecuteNonQuery();

	cmd.CommandText = sqlText;
	cmd.ExecuteNonQuery(); //conn_InfoMessage is invoked for every error, 
                           //e.g. 2 times for 2 errors
	
	cmd.CommandText = "SET PARSEONLY OFF";
	cmd.ExecuteNonQuery();

	return errors.ToArray();
}

Again, we throw an exception if the connection is not open, and we clear the errors list. SQL Server has an option "PARSEONLY" that we will use to prevent further processing of our T-SQL code beyond the parse phase. Before our sqlText string is executed, the PARSEONLY option is set to ON. Afterwards, it is set back to OFF. There is a potential pitfall here: what if the client code is a console-type application, and the user executed the command SET PARSEONLY ON to explicitly prevent further execution beyond the parse phase. When the client code then calls the Parse method, PARSEONLY will be set back to OFF before the method returns, without the user's knowledge. Workarounds for this problem will not be explored further in this article, because the implementation will differ as per requirements of the project.

Additional Notes on the Class

The ConnectionString property of our SqlHandler class "forwards" the ConnectionString property on the SqlConnection object that it encapsulates. In the constructor, the ConnectionString is initialized to a "template" connection string. You have to manually insert the Data Source and Initial Catalog values in the string. The Connect method accepts a string argument containing a connection string. This connection string will replace the existing connection string on the SqlConnection object.

Sample Application

My sample project contains the SqlHandler class, and a small test application. The application provides some basic text editor functionality such as opening files, saving files, cut, copy and paste. Furthermore, it implements the SqlHandler object's methods to enable connecting and disconnecting from a SQL Server instance, and executing and parsing SQL code. The layout of the main window was designed to be familiar looking, with the menu and toolbar at the top, the text area in the middle, and an error grid and status bar at the bottom. When you build and run the application, a Connection dialog window will pop up. On this window, you have to enter a valid connection string to connect to a SQL Server instance. Keep in mind that this application is not multi-threaded. As a result, entering a bad connection string will cause the interface to "hang" while the connection times out and eventually returns with an error message.

I have created a region in the SqlHandler class for housing custom RoutedUICommand objects for binding my own commands to the user interface. I put them in their own separate region because they have nothing to do with the rest of the class. These command objects are all static, and the class also defines a static constructor for initializing them. These commands could also have been placed in a separate class.

Type your T-SQL text in the text area in the middle of the window. To parse the code, press the Parse button, or press the F6 key on the keyboard. To execute the code, press the Execute button, or press F5 on the keyboard. Both the Parse and Execute functions will report errors in the errors grid at the bottom of the application. The errors grid is nested within an expander which will pop up automatically when errors are generated. When you execute a query that returns a result set, a result viewer window will appear. Parsing and executing will be disabled when the application is not connected to a SQL Server instance, as defined by the command bindings.

When you parse a query that references invalid database objects such as tables or columns that does not exist, no errors will be returned. Remember from the Background section that Parsing does not include Binding.

Compliments to the author of the icons set which can be downloaded here for free.

Points of Interest

Visual Studio has some nifty little tools that can make your life easier. One of them is the tool that inserts appropriate code snippets where it is expected by pressing the Tab key. This is useful, for example, when you are registering the InfoMessage event. Type the following line of code: conn.InfoMessage +=. You should see a little pop up box...

Press Tab once and it will complete the line for you based on the required delegate for the event. Press Tab again and it will generate the event handler method for you. The event handler will already be set up to contain the correct arguments, all you have to do is add your code.

References

SQL Server Pro, 23/10/1999, Inside SQL Server: Parse, Compile, and Optimize [online] Available at: http://www.sqlmag.com/article/tsql3/inside-sql-server-parse-compile-and-optimize [Accessed on 20^th June, 2012]

History

25^th June, 2012: Initial version
13^th January, 2022: Article updated