Click here to Skip to main content
11,581,186 members (62,675 online)
Click here to Skip to main content

Use SQL Parameters to Overcome Ad Hoc Performance Issues

, 12 Jan 2007 156.5K 407 109
Rate this:
Please Sign up or sign in to vote.
Defining (rather loosely) ad hoc queries as SQL commands built as strings by an SQL client application and submitted to SQL Server.

Background

I work for a small web company which over the last year or two has experienced a large amount of growth in both terms of staff as well as web site traffic. As part of that growth we have been forced to spend a large amount of time focusing on our database. And since we lack a DBA, that stewardship has fallen to me.

We now serve between 1 and 2 million page views daily and over the last year it has come to my attention that the biggest load on our database in terms of CPU and memory was not so much due poor performing queries (although there have been some) but rather poorly formatted queries.

I have noticed as I read articles and also as I look at the code run by commercial applications (more on this later) that ad hoc queries are used quite a bit. This is all despite the numerous warnings everywhere regarding SQL injection attacks as well as the performance benefits of using stored procedures.

For the purpose of this article I am defining (rather loosely) ad hoc queries as SQL commands built as strings by an SQL client application and submitted to SQL server. This in essence is rather encompassing and includes pretty much everything but stored procedure commands. Poorly written stored procedures can be a drain on CPU and memory for reasons other than poorly tuned SQL statements, but those reasons are beyond the scope of this article. I intend to focus on how failing to use parameters in SQL statements can be a huge drain on resources and can cripple even the most powerful hardware.

For clarification, the context of this article is that of a large number of users executing similar, single statements concurrently as opposed to a single batch of statments within a single session (connection).

SQL Server Internals

Like any programming language, every query submitted to SQL Server must be parsed and complied before it can be run. From a high level, this involves checking syntax as well as verifying the existence of objects used in the query. Once this is completed SQL Server now has a structure which it understands and can use to retrieve the data requested. For efficiency, SQL Server stores this structure in memory (Procedure Cache) for reuse. Now, the next time this query gets submitted, SQL Server can skip the parse and compile steps.

We can see this in action in the following code sample:

SET STATISTICS TIME ON
SET STATISTICS IO OFF

USE AdventureWorks
go

DBCC FREEPROCCACHE
go


SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, C.Phone, E.*
FROM Person.Contact C
INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
WHERE C.LastName = 'Miller'

WARNING - The use of the command DBCC FREEPROCCACHE is not recommended in production systems. Doing so may cause SQL server to perform poorly until the procedure cache can be rebuilt. It's purpose here in this article is to demonstrate how SQL server uses the procedure cache to store and reuse ad hoc query statements.

If you view the "Messages" pane in SQL Server Management Studio (SSMS) you will find the following:

SQL Server parse and compile time: 
   CPU time = 78 ms, elapsed time = 106 ms.

(1 row(s) affected)

Among other things, just before the rows affected message, you will see the parse and compile time. These times will vary and this is obviously not a complex query so don't expect to see anything significant. I am running SQL Server 2005 x64 on Windows XP x64 with an AMD Athlon 64 x2 4200+ with 2GB of RAM. If you run only the SELECT statement again what you will most likely see is this:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 9 ms.

(1 row(s) affected)

The reason for the drop in CPU and elapsed time is due to the fact that the second time the statement was run (without clearing the procedure cache buffer) the statement was retrieved from cache. To see this in action create the following trace:

/****************************************************/
/* Created by: SQL Server Profiler 2005             */
/* Date: 12/30/2006  10:01:02 AM                    */
/****************************************************/


-- Create a Queue
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 5 

-- Please replace the text InsertFileNameHere, with an appropriate
-- filename prefixed by a path, e.g., c:\MyFolder\MyTrace. The .trc 
-- extension will be appended to the filename automatically. If you are 
-- writing from remote server to local drive, please use UNC path and make 
-- sure server has write access to your network share

exec @rc = sp_trace_create @TraceID output,0,N'C:\ProcedureCache',
                           @maxfilesize,NULL 
if (@rc != 0) goto error

-- Client side File and Table cannot be scripted

-- Set the events
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 38, 1, @on
exec sp_trace_setevent @TraceID, 38, 21, @on
exec sp_trace_setevent @TraceID, 38, 10, @on
exec sp_trace_setevent @TraceID, 38, 35, @on
exec sp_trace_setevent @TraceID, 38, 12, @on
exec sp_trace_setevent @TraceID, 35, 1, @on
exec sp_trace_setevent @TraceID, 35, 10, @on
exec sp_trace_setevent @TraceID, 35, 35, @on
exec sp_trace_setevent @TraceID, 35, 12, @on


-- Set the Filters
declare @intfilter int
declare @bigintfilter bigint

exec sp_trace_setfilter @TraceID, 35, 1, 6, N'AdventureWorks'
-- Set the trace status to start
exec sp_trace_setstatus @TraceID, 1

-- display trace id for future references
select TraceID=@TraceID
goto finish

error: 
select ErrorCode=@rc

finish: 
go
    

When you run the script, make sure to copy the TraceID result so you can stop the trace when you're done by using the following command:

--stop trace job
exec sp_trace_setstatus {PUT TRACEID HERE}
      , 0

After creating the trace, follow the first two steps again. Then open your trace file (C:\ProcedureCache.trc if you are using the defaults from this script) and you will see several trace events. Among them, you will see an SP:CacheInsert event for the SELECT statement with an entry further down for the SP:CacheHit event for the same SELECT statement. These events demonstrate exactly what I just described above.

Ad Hoc SQL

Ad Hoc SQL is very flexible and often more familiar to developers than writing stored procedures because it allows developers to simply construct a string (usually via contatenation) and submit it to the database. Ad hoc can make writing Query By Form (QBF) processes very simple as user input directs which tables, joins and parameters are needed to satisfy the request. The resulting SQL statement can now look like this:

SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, C.Phone, E.*
FROM Person.Contact C
INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
WHERE C.LastName = 'Smith'

If you run this statement again in SSMS and look at the parse and compile time you'll see the statement was recompiled. Also, if you're still running a trace, you'll see the statement was added to the procedure cache instead of being reused. This shows that unless the query (including the values of the where clause) is exactly the same, SQL server must recompile the statement. On a small database, with little traffic this will mostly go unnoticed. But on a either a larger system or one with a significant amount of traffic your procedure cache will bloat. A side effect of this is that the data in your buffer cache will be pushed out resulting in more data being read directly from disk instead of from memory. This will cause a serious IO bottleneck and most likely cripple your system.

Several years ago we purchased an ASP.Net application for ad management. The database is less than 20mb in size, but because of the traffic (100 - 200 batches/sec) it was able to bring a 3GHz Xeon with 2gb of RAM to it's knees on a server without any other databases installed. When we investigated the issue we found that SQL server was using all 2gb of memory and the CPU was stuck at 100%.

Below is an example of how our query statement above might look from a .Net client:

string eName = "Smith";
DataSet data = new DataSet();
using(SqlConnection conn = new SqlConnection(
             "Application Name=AdHocClient;Server=(local);" +
             "Database=AdventureWorks;Trusted_Connection=Yes;"))
{
    StringBuilder sql = new StringBuilder();
    sql.Append("SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress,");
    sql.Append("C.Phone, E.*");
    sql.Append(" FROM Person.Contact C");
    sql.Append(" INNER JOIN HumanResources.Employee E ");
    sql.Append(" ON E.ContactID = C.ContactID");
    sql.AppendFormat(" WHERE C.LastName = '{0}'", eName);
    
    SqlCommand cmd = new SqlCommand();
    cmd.Connection = conn;
    cmd.CommandText = sql.ToString();
    cmd.CommandType = CommandType.Text;
    
    try
    {
        using(SqlDataAdapter da = new SqlDataAdapter(cmd))
        {
            da.Fill(data);                        
        }
    }
    catch(SqlException ex)
    {
        //.... execption code .....
    }
}

The variable eName represents a value retrieved from user input. Each time the process is run with different input the statement is compiled and cached. This causes the condition I just described eventually overwhelming the database server. So how do we resolve the issue, with the least amount of changes required?

Parameterized SQL

Stored procedures are the recommended method for database access from a client application - for security and performance reasons. But if you have already written your application and don't have the time to move all your data access to stored procedures the alternative is to parameterize your SQL statements. When you define parameters your statements are the same, no matter what the value being passed. So if we were to take the above process and rewrite it as follows:

string eName = "Smith";
DataSet data = new DataSet();
using(SqlConnection conn = new SqlConnection(
       "Application Name=ParameterizedClient;Server=(local);" +
       "Database=AdventureWorks;Trusted_Connection=Yes;"))
{
    StringBuilder sql = new StringBuilder();
    sql.Append("SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, ");
    sql.Append("C.Phone, E.*");
    sql.Append(" FROM Person.Contact C");
    sql.Append(" INNER JOIN HumanResources.Employee E ");
    sql.Append(" ON E.ContactID = C.ContactID");
    
    //replace contatenation of variable with parameter name
    sql.Append(" WHERE C.LastName = @LastName");
    
    SqlCommand cmd = new SqlCommand();
    cmd.Connection = conn;
    cmd.CommandText = sql.ToString();
    cmd.CommandType = CommandType.Text;
    
    //Define SqlParameter object and assign value to the parameter
    cmd.Parameters.Add("@LastName", SqlDbType.VarChar, 50);
    cmd.Parameters["@LastName"].Value = eName;
    
    try
    {
        using(SqlDataAdapter da = new SqlDataAdapter(cmd))
        {
            da.Fill(data);                        
        }
    }
    catch(SqlException ex)
    {
        //.... execption code .....
    }
}

Now when your client application runs this process SQL server will always use the compiled version in the procedure cache which prevents the cache from bloating and allowing more of your system memory to be used for the Buffer cache which will also improve your system performance and application scaleablility.

SqlCommand.Prepare()

A member asked weither it would be more efficient to use the Prepare() method of the SqlCommand object. In the context of this article the answer is, no. The Prepare() method calls sp_prepexec, an undocumented stored procedure. Essentially, it creates a temporary stored procedure for the current session which is destroyed when the current connection is closed. This allows the query to be called as if it were a stored procedure until the connection is closed. The benefit of using Prepare() is that it reduces network traffic because the entire command text as defined in the client application is sent across the network only once. The first time the command is executed after calling the Prepare() method, it looks something like this:

declare @p1 int
set @p1=10001
exec sp_prepexec @p1 output,N'SELECT C.Title, C.FirstName, C.LastName, 
      C.EmailAddress, C.Phone,E.*
    FROM Person.Contact C
    INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
    WHERE C.LastName = @LastName
 ',@EmployeeID=15073
select @p1                
            

Notice how it declares @p1 and returns it as a scalar value along with the results of the command. The SqlCommand object then can send the following command on subsequent requests:

exec sp_execute 10001,@EmployeeID=12991

The reason it is not more efficient to use the Prepare() method in this scenario is because, unless the same connection object is used and remains open, the request sent to Sql Server will always be the call to sp_prepexec. So this adds the overhead of creating a temporary stored procedure along with a handle to that stored procedure, but you don't get the benefit of the reduced traffic. It is unlikely that this overhead will create a large impact on SQL Server however, but I recommend omitting the call to Prepare() unless it's needed.

The Prepare() method will shine when the SqlCommand object will be called repeatedly during the same session (usually within a loop). When compared to not calling Prepare(), it was much faster. I will not be providing any benchmark comparisons here. I leave that to you, the reader. But in the sample project I have included a method which will allow you to do some of your own testing.

Sample Project

The sample project included with this article runs a stored procedure taken from the AdventureWorks database and passes the entire text of the stored procedure to a command object as text to simulate a more complex query with a single parameter. There are two versions of the query, an Ad Hoc version and a parameterized version. Each one is run 10,000 times with a randomized integer as the parameter.

I recommend running perfmon while the sample project is running. I used the counters Process.% Processor Time - sqlserver, SQL Server:SQL Statistics.Batch Requests/Sec and SQL Server:Plan Cache - Cache Pages. Here were my results:

Run Time CPU Req/Sec Cache Pages - Start Cache Pages - End
Ad Hoc 115 SEC 95% 109 600 43684
Parameterized 10 SEC 59% 1153 50 260

As you can see, without parameters the process takes 10 times longer to run, the CPU runs almost 2 times higher and the number of 8k pages used by the procedure cache is 200 times more. And the last counter (8k pages) will continue to get worse the longer the application runs because when the query is parameterized it will only add one version of the plan to the cache. Conversely, the ad hoc version will add a new version of the plan for every unique variable passed to it.

Conclusion

While stored procedures are the recommended method for data access clients, it isn't always practical to rewrite everything if your code is already using Ad Hoc SQL. However, you can get almost the same performance benefit from using parameters that you can get from stored procedures. So if you haven't done so there really shouldn't be any excuse for not parameterizing your queries. Your application will scale better and will be more stable.

Revision History

1/10/2007

  • Added clarification of article's context
  • Added SqlCommand.Prepare() section
  • Added timers to sample project
  • Removed use of concatenation from sample project and sample code in article
  • Replaced SqlDataAdapter.Fill() with SqlCommand.ExecuteReader() in sample project
  • Added methods to sample project to demonstrate using SqlCommand.Prepare()

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Share

About the Author

Mark J. Miller
Web Developer STG Utah
United States United States
Mark is a jock turned geek who has been programming since early 2000 where he stumbled on excel macros and dug down to discover VBA. Since 2001 he has been working for MaxPreps.com where he can be a geek who writes web applications for jocks. He has been using C# and ASP.Net since 2002.

Check out Mark's blog: www.developMENTALmadness.com

View Mark Miller's profile on LinkedIn

You may also be interested in...

Comments and Discussions

 
GeneralRe: Please advise, a little bit confuse here Pin
lok_tan10-Jan-07 5:24
memberlok_tan10-Jan-07 5:24 
GeneralGood article - sp_executesql is your friend... Pin
MR_SAM_PIPER8-Jan-07 8:04
memberMR_SAM_PIPER8-Jan-07 8:04 
GeneralSame as the DataSet code generator Pin
eligazit7-Jan-07 20:08
membereligazit7-Jan-07 20:08 
GeneralExtremely useful and valuable information Pin
owen6543217-Jan-07 17:00
memberowen6543217-Jan-07 17:00 
GeneralRe: Extremely useful and valuable information Pin
Mark J. Miller8-Jan-07 4:03
memberMark J. Miller8-Jan-07 4:03 
GeneralExcellent, and just when I was going to flame you... Pin
Dewey3-Jan-07 20:49
memberDewey3-Jan-07 20:49 
GeneralRe: Excellent, and just when I was going to flame you... Pin
tag200115-Jan-07 18:11
membertag200115-Jan-07 18:11 
GeneralRe: Excellent, and just when I was going to flame you... Pin
Mark J. Miller16-Jan-07 3:24
memberMark J. Miller16-Jan-07 3:24 
tag2001 wrote:
It's my understanding that as of SQL 2000 and up, a fully parameterized query will be cached and just as fast as a stored procedure


Do you have proof to back it up, or are you blindly suggesting this?

My statement was neither blind nor misguided. It appears that there are some in this community that have it in for stored procedures. However, I don't understand the reason why they feel the need to attack whenever the subject is brought up.

What I said was that they were the "recommended method for data access from a client application". I also mentioned they are not the best for every scenario and that there are cases where a poorly written stored procedure can cause performance problems as well. So I did qualify my statements about stored procedures.However, to clarify and reaffirm, stored procedures are not the best practice, nor are they the only best practice, but they are the best practice for most scenarios.

tag2001 wrote:
a fully parameterized query will be cached


Did you read the article? This is exactly the point of the article, that there is no need to go chicken little and rewrite your SQL code as stored procedures.

However, before I launch into my rebuttal let me say something to those who will not read my entire post because they are blinded by their prejudices against stored procedures. It is true that stored procedures are not for use in every scenario. This is the case with all 'recommeneded' best practices. It is up to the developers/architects of a solution to determine what works best in the current scenario. With knowledge and experience as a guide they can decide between the trade-offs. Performance is not always the most important factor in these decisions, but it is important nonetheless.

As for my rebuttal:

First off, since your comment was pointed at performance, all you need to do to find proof that stored procedures perform better than parameterized queries is run my sample project. The code that runs SqlCommand.Prepare() is just as fast as parameterized queries. Then if you run the code which opens a connection once and then repeatedly calls the same query you'll see that the code that runs SqlCommand.Prepare() runs better than parameterized sql. You could run the same test w/ a stored procedure and get the same results. This shows that while your first statement is true, yes parameterized sql is cached just as fast as stored procedures, your second statement is not implied by the first.

I have seen the claim that there is no reason to use stored procedures because there is no noticeable difference for users. This may be true if you're the only one connected to the database executing queries. However, when your application is submitting hundreds of batches per second there begins to be a noticable difference. This is because even though your client application may not be working hard, ad hoc and parameterized queries require more data to be sent accross your network to make the same request as a stored procedure or prepared command. So just because you can't percieve a difference your network and database administrators will certainly notice.

Second, stored procedures can be changed w/o having to recompile and redistribute your client application. If your application is a web application or web service, then under the best circumstances when you update your application code you force your users to wait while the code is reloaded by the appdomain. But in some cases you also risk forcing a full appdomain reload, which causes you to loose everything you may have cached for performance.

Third, by using stored procedures you don't have to grant users direct access to your tables, only you the developer require access to the tables to write the stored procedures. Your users only need access to the stored procedures themselves. SQL injection attacks are not the only threat to security.

So performance, manageability and security are some of the reasons I recommend stored procedures. Now, lest I be accused of making 'blind' statements, here are some reasons not to use stored procedures:

If your stored procedure uses dynamic sql you will use more memory in the procedure cache because the full text of the stored proc is cached in the procedure cache. There will be multiple copies of your stored procedure in the procdure cache due to the different execution plans for each variation in the dynamic query. The fact that it is a stored procedure may give a false sense of security against sql injection attacks.

Also, if you have a lenghty stored procedure and two users execute it concurrently it may get cached twice in the procedure cache. Either use sp_execute (parameterized sql) from your client or break your stored procedure into smaller ones.

Feel free to suggest other reasons why stored procedures aren't always the best choice, I'll most likely agree with you. I understand the tradeoffs, but your accusation that my statements were blind or misguided were not only false, but show that you did not read the article.
GeneralRe: Excellent, and just when I was going to flame you... Pin
tag200116-Jan-07 14:32
membertag200116-Jan-07 14:32 
GeneralRe: Excellent, and just when I was going to flame you... Pin
Mark J. Miller16-Jan-07 15:24
memberMark J. Miller16-Jan-07 15:24 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.150603.1 | Last Updated 12 Jan 2007
Article Copyright 2007 by Mark J. Miller
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid