|
|||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||
|
Announcements
Services
Chapters
Feature Zones
|
BackgroundI work for a small web company which over the last year or two has experienced a large amount of growth in both terms of staff as well as web site traffic. As part of that growth we have been forced to spend a large amount of time focusing on our database. And since we lack a DBA, that stewardship has fallen to me. We now serve between 1 and 2 million page views daily and over the last year it has come to my attention that the biggest load on our database in terms of CPU and memory was not so much due poor performing queries (although there have been some) but rather poorly formatted queries. I have noticed as I read articles and also as I look at the code run by commercial applications (more on this later) that ad hoc queries are used quite a bit. This is all despite the numerous warnings everywhere regarding SQL injection attacks as well as the performance benefits of using stored procedures. For the purpose of this article I am defining (rather loosely) ad hoc queries as SQL commands built as strings by an SQL client application and submitted to SQL server. This in essence is rather encompassing and includes pretty much everything but stored procedure commands. Poorly written stored procedures can be a drain on CPU and memory for reasons other than poorly tuned SQL statements, but those reasons are beyond the scope of this article. I intend to focus on how failing to use parameters in SQL statements can be a huge drain on resources and can cripple even the most powerful hardware. For clarification, the context of this article is that of a large number of users executing similar, single statements concurrently as opposed to a single batch of statments within a single session (connection). SQL Server InternalsLike any programming language, every query submitted to SQL Server must be parsed and complied before it can be run. From a high level, this involves checking syntax as well as verifying the existence of objects used in the query. Once this is completed SQL Server now has a structure which it understands and can use to retrieve the data requested. For efficiency, SQL Server stores this structure in memory (Procedure Cache) for reuse. Now, the next time this query gets submitted, SQL Server can skip the parse and compile steps. We can see this in action in the following code sample: SET STATISTICS TIME ON
SET STATISTICS IO OFF
USE AdventureWorks
go
DBCC FREEPROCCACHE
go
SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, C.Phone, E.*
FROM Person.Contact C
INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
WHERE C.LastName = 'Miller'
WARNING - The use of the command If you view the "Messages" pane in SQL Server Management Studio (SSMS) you will find the following: SQL Server parse and compile time:
CPU time = 78 ms, elapsed time = 106 ms.
(1 row(s) affected)
Among other things, just before the rows affected message, you will see the parse and compile time. These times will vary and this is obviously not a complex query so don't expect to see anything significant. I am running SQL Server 2005 x64 on Windows XP x64 with an AMD Athlon 64 x2 4200+ with 2GB of RAM. If you run only the SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 9 ms.
(1 row(s) affected)
The reason for the drop in CPU and elapsed time is due to the fact that the second time the statement was run (without clearing the procedure cache buffer) the statement was retrieved from cache. To see this in action create the following trace: /****************************************************/
/* Created by: SQL Server Profiler 2005 */
/* Date: 12/30/2006 10:01:02 AM */
/****************************************************/
-- Create a Queue
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 5
-- Please replace the text InsertFileNameHere, with an appropriate
-- filename prefixed by a path, e.g., c:\MyFolder\MyTrace. The .trc
-- extension will be appended to the filename automatically. If you are
-- writing from remote server to local drive, please use UNC path and make
-- sure server has write access to your network share
exec @rc = sp_trace_create @TraceID output,0,N'C:\ProcedureCache',
@maxfilesize,NULL
if (@rc != 0) goto error
-- Client side File and Table cannot be scripted
-- Set the events
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 38, 1, @on
exec sp_trace_setevent @TraceID, 38, 21, @on
exec sp_trace_setevent @TraceID, 38, 10, @on
exec sp_trace_setevent @TraceID, 38, 35, @on
exec sp_trace_setevent @TraceID, 38, 12, @on
exec sp_trace_setevent @TraceID, 35, 1, @on
exec sp_trace_setevent @TraceID, 35, 10, @on
exec sp_trace_setevent @TraceID, 35, 35, @on
exec sp_trace_setevent @TraceID, 35, 12, @on
-- Set the Filters
declare @intfilter int
declare @bigintfilter bigint
exec sp_trace_setfilter @TraceID, 35, 1, 6, N'AdventureWorks'
-- Set the trace status to start
exec sp_trace_setstatus @TraceID, 1
-- display trace id for future references
select TraceID=@TraceID
goto finish
error:
select ErrorCode=@rc
finish:
go
When you run the script, make sure to copy the TraceID result so you can stop the trace when you're done by using the following command: --stop trace job
exec sp_trace_setstatus {PUT TRACEID HERE}
, 0
After creating the trace, follow the first two steps again. Then open your trace file (C:\ProcedureCache.trc if you are using the defaults from this script) and you will see several trace events. Among them, you will see an Ad Hoc SQLAd Hoc SQL is very flexible and often more familiar to developers than writing stored procedures because it allows developers to simply construct a string (usually via contatenation) and submit it to the database. Ad hoc can make writing Query By Form (QBF) processes very simple as user input directs which tables, joins and parameters are needed to satisfy the request. The resulting SQL statement can now look like this: SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, C.Phone, E.*
FROM Person.Contact C
INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
WHERE C.LastName = 'Smith'
If you run this statement again in SSMS and look at the parse and compile time you'll see the statement was recompiled. Also, if you're still running a trace, you'll see the statement was added to the procedure cache instead of being reused. This shows that unless the query (including the values of the where clause) is exactly the same, SQL server must recompile the statement. On a small database, with little traffic this will mostly go unnoticed. But on a either a larger system or one with a significant amount of traffic your procedure cache will bloat. A side effect of this is that the data in your buffer cache will be pushed out resulting in more data being read directly from disk instead of from memory. This will cause a serious IO bottleneck and most likely cripple your system. Several years ago we purchased an ASP.Net application for ad management. The database is less than 20mb in size, but because of the traffic (100 - 200 batches/sec) it was able to bring a 3GHz Xeon with 2gb of RAM to it's knees on a server without any other databases installed. When we investigated the issue we found that SQL server was using all 2gb of memory and the CPU was stuck at 100%. Below is an example of how our query statement above might look from a .Net client: string eName = "Smith";
DataSet data = new DataSet();
using(SqlConnection conn = new SqlConnection(
"Application Name=AdHocClient;Server=(local);" +
"Database=AdventureWorks;Trusted_Connection=Yes;"))
{
StringBuilder sql = new StringBuilder();
sql.Append("SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress,");
sql.Append("C.Phone, E.*");
sql.Append(" FROM Person.Contact C");
sql.Append(" INNER JOIN HumanResources.Employee E ");
sql.Append(" ON E.ContactID = C.ContactID");
sql.AppendFormat(" WHERE C.LastName = '{0}'", eName);
SqlCommand cmd = new SqlCommand();
cmd.Connection = conn;
cmd.CommandText = sql.ToString();
cmd.CommandType = CommandType.Text;
try
{
using(SqlDataAdapter da = new SqlDataAdapter(cmd))
{
da.Fill(data);
}
}
catch(SqlException ex)
{
//.... execption code .....
}
}
The variable Parameterized SQLStored procedures are the recommended method for database access from a client application - for security and performance reasons. But if you have already written your application and don't have the time to move all your data access to stored procedures the alternative is to parameterize your SQL statements. When you define parameters your statements are the same, no matter what the value being passed. So if we were to take the above process and rewrite it as follows: string eName = "Smith";
DataSet data = new DataSet();
using(SqlConnection conn = new SqlConnection(
"Application Name=ParameterizedClient;Server=(local);" +
"Database=AdventureWorks;Trusted_Connection=Yes;"))
{
StringBuilder sql = new StringBuilder();
sql.Append("SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, ");
sql.Append("C.Phone, E.*");
sql.Append(" FROM Person.Contact C");
sql.Append(" INNER JOIN HumanResources.Employee E ");
sql.Append(" ON E.ContactID = C.ContactID");
//replace contatenation of variable with parameter name
sql.Append(" WHERE C.LastName = @LastName");
SqlCommand cmd = new SqlCommand();
cmd.Connection = conn;
cmd.CommandText = sql.ToString();
cmd.CommandType = CommandType.Text;
//Define SqlParameter object and assign value to the parameter
cmd.Parameters.Add("@LastName", SqlDbType.VarChar, 50);
cmd.Parameters["@LastName"].Value = eName;
try
{
using(SqlDataAdapter da = new SqlDataAdapter(cmd))
{
da.Fill(data);
}
}
catch(SqlException ex)
{
//.... execption code .....
}
}
Now when your client application runs this process SQL server will always use the compiled version in the procedure cache which prevents the cache from bloating and allowing more of your system memory to be used for the Buffer cache which will also improve your system performance and application scaleablility. SqlCommand.Prepare()A member asked weither it would be more efficient to use the declare @p1 int
set @p1=10001
exec sp_prepexec @p1 output,N'SELECT C.Title, C.FirstName, C.LastName,
C.EmailAddress, C.Phone,E.*
FROM Person.Contact C
INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
WHERE C.LastName = @LastName
',@EmployeeID=15073
select @p1
Notice how it declares exec sp_execute 10001,@EmployeeID=12991
The reason it is not more efficient to use the The Sample ProjectThe sample project included with this article runs a stored procedure taken from the AdventureWorks database and passes the entire text of the stored procedure to a command object as text to simulate a more complex query with a single parameter. There are two versions of the query, an Ad Hoc version and a parameterized version. Each one is run 10,000 times with a randomized integer as the parameter. I recommend running perfmon while the sample project is running. I used the counters
As you can see, without parameters the process takes 10 times longer to run, the CPU runs almost 2 times higher and the number of 8k pages used by the procedure cache is 200 times more. And the last counter (8k pages) will continue to get worse the longer the application runs because when the query is parameterized it will only add one version of the plan to the cache. Conversely, the ad hoc version will add a new version of the plan for every unique variable passed to it. ConclusionWhile stored procedures are the recommended method for data access clients, it isn't always practical to rewrite everything if your code is already using Ad Hoc SQL. However, you can get almost the same performance benefit from using parameters that you can get from stored procedures. So if you haven't done so there really shouldn't be any excuse for not parameterizing your queries. Your application will scale better and will be more stable. Revision History1/10/2007
| ||||||||||||||||||||||||||||||||||||||