Performance Strategies for Enterprise Web Site Development

jconwell

4.87/5 (86 votes)

Feb 2, 2004

59 min read

216651

This article describes performance strategies to use with web sites that need to be able to handle a high volume of users. This includes planning for perf tuning, tools and metrics to use in measuring performance, as well as many techniques that you can use to actually make your pages run faster.

Introduction

ASP.NET and the .NET Framework has given web developers a much larger toolbox to work with then they had before with straight ASP. But as with any new language or technology, picking the correct tool for the job is sometimes more difficult than actually using the tools themselves. For example, you probably wouldn’t use a chainsaw to cut a copper pipe, and conversely you wouldn’t use a hacksaw to cut down a tree. But if you don’t know which tool to use, or how to figure out which tool is best suited for the job at hand, then you could make mistakes that can cause your website to perform less than optimal.

With .NET, you have about 30 different ways to do just about anything. But the trick is to figure out which tool to use for what you are trying to accomplish. For most websites, it’s not a big deal if your code isn’t as efficient as possible. For informational sites or small E-Commerce web sites, it's forgivable if a page takes 5 to 6 seconds to load, but for an enterprise sized E-Commerce web site that has to handle a high volume of users, page performance can be the difference between the site still being open next year or not. For sites like these, where performance is vital to their survival, it is very important to take performance into consideration during every phase of the development cycle.

A web site can offer hundreds of categories, thousands of products, have the coolest graphics and use the latest technology, but all that is pretty much worthless if the pages don’t load fast. Most likely, there are several other web sites available that do exactly what yours does and the user will go elsewhere if their browsing experience isn’t snappy enough.

With the plethora of tools at a web developers finger tips these days, it is very important that we know the pros and cons of each tool, as well has how to quantify which tool is best for each situation. After ASP.NET was released to the public, 1001 books were published showing how quick and easy web development was in this new .NET era. “Reduce time to market, decrease development costs, and increase scalability and maintainability with ASP.NET”. This was the marketing rally that was being broadcast from Redmond. But anytime you hear marketing verbiage like this, take it for what it is…marketing. Take it upon yourself to investigate, learn, profile, and quantify what is being hyped before you implement it into any piece of critical functionality.

This article is a culmination of ASP.NET performance best practices, that myself and the developers I have worked with, have come up with while developing enterprise scale web sites in ASP.NET. The title of the article also says the words ‘performance strategies’. I will be going over strategies that can be used during performance tuning that will make the task more organized and meaningful. I also want to point out that this article demonstrates very little code, since I think (and hope) the discussion is clear enough for you to use to implement your own performance related strategies.

When do you do performance tuning?

So at what point in a project’s life cycle should the development team focus on code performance? The answer: Always, during every phase of the project lifecycle. Some development teams save performance tuning until the end of each release, or the entire project (if at all). They add sort of a ‘performance tuning phase’ into their development lifecycle. But in my opinion, I think this is a big mistake. If your team goes through their design and development phases without considering code performance, you will most likely find that several of your pages and algorithms are too slow to be released into production. The end result being you’ll have to redesign and recode them. This is a huge waste of time and will most likely push your release to production date. To help avoid this pitfall, performance should be a serious consideration throughout the entire project.

Now, I do believe that having a performance tuning phase at the end of each release is not a bad idea. But coding the functionality for the upcoming release, without taking into consideration the performance of the code, just because you have an official performance phase, can lead you into serious trouble.

A good time to implement a performance phase is to run it in parallel with the test phase. Pull a few developers off from the development team who are working on the bugs the test team finds, and have them totally focus on running performance analysis tools and tuning the code accordingly. An even better arrangement would be to have your test teams familiar with performance analysis tools and techniques, being able to find bottlenecks and inefficient memory usage themselves, logging these things as bugs and then letting the development team fix the problems. But this sort of arrangement isn’t very realistic in most companies.

One technique that can be used to help reduce the amount of time wasted redeveloping slow code is prototyping. Early in the project’s lifecycle, during the analysis, design and early development phases, you should create prototypes of critical pieces of functionality. And not just one prototype, but several different versions of the same functionality should be written. This way you can profile each one to see which is more efficient. The biggest mistake developers make, especially when using a new technology, is to learn just one way to code a piece of functionality and call it ‘good enough’. You should dig deep into the code, use timers and profilers, and find out which techniques are the most efficient. This strategy will take more time at first; extending your timeline during your first few projects, but eventually you will build your own toolbox of performance best practices to use in future projects.

There is one thing I want to bring up at this time that you should keep in mind. Performance shouldn’t overshadow any of the key ‘abilities’ in your projects. Extensibility, maintainability, usability, and complexity are all very important factors to keep in mind when designing and developing your site. It’s pretty easy (take it from me) to get caught up in performance profiling, tuning, and tweaking your code. You can always make your code go just a little bit faster, but there is a point that you have to call it good enough. This is the performance tuning version of ‘Analysis Paralysis’.

You should try to keep a good balance between all of the ‘abilities’. Sometimes creating a wicked fast page will come at the expense of maintainability and complexity. In these situations, you’ll have to weigh the benefit of the page speed verses the effort and time it will take to maintain and extend the page in the future. I always like using the ‘Hit by a bus’ analogy. If only one person on your team has the technical ability to code and maintain a piece of functionality, is that functionality really worth it? What will you do if that person is hit by a bus? What would you do then?

Performance isn’t just for developers: Design for Performance

While analysts and architects are flushing out the site's requirements and creating their design documentation (UML diagrams hopefully), they should also try to keep performance in consideration while they design the site. Don’t just try to design the required functionality, but try to design the functionality to be as efficient as possible. For example, caching is a good performance consideration that can, and should, be designed for, well before the developers start coding. For example, designing your site to have the process call the database every time a user requests a page is not very efficient. A data caching strategy can be put in place, in the design of the site, to reduce the number of database hits. ASP.NET has a great caching mechanism built into it that you can take advantage of. Microsoft also has a Caching Application Block that has even more functionality if you need more advanced caching capabilities (MSDN).

Hopefully the examples that I’ll show you in the later half of this article will give you some good ideas to use in the design of your next site.

Planning for performance tuning

Before you jump into your code performance tools and start tweaking your code, you need to plan out what you are going to do first. Without a good performance tuning plan and good procedures to follow, you could actually make your site run slower. First, divide the site up into strategic pieces. You should order each piece based on its relevance to making the site successful. Then performance tune each piece in this order. A smoking fast ‘Contact Us’ page does a site no good if the users are waiting 5 – 10 seconds to load a product page.

This is the order that I usually follow: Home page (a user won't use your site at all if you don’t load the home page fast), Search page and algorithm, Category and Product pages, checkout process, profile pages, and then customer service and ancillary pages. The reason for breaking up the site into phases like this is, you probably won’t have the luxury of spending time performance tuning the entire site. Deadlines will usually force you to pick and choose which pieces you should work on, and in that case you should work on the most critical pieces first.

If you have an existing site that you are either redesigning or upgrading, a good way to figure out which pages are the most critical to your site's success is to parse your web server log. Find a day that had a particularly high user volume and count the number of times each page was hit during that day. Most likely only about 10 or so pages will make up 90% of the user traffic on your site. This is a good place to start.

Now, before we really get into the nitty-gritty of performance tuning, there are a few more steps you should perform. First, you should identify what tools you will use to analyze the code and measure performance metrics. I’ll be going over a few tools in detail, a bit later.

Next, you need to identify what metrics you should be using to measure the site’s performance. Each tool comes with a myriad of metrics. You could research every metric available, but that would take forever. There are a few key metrics that are especially meaningful to site performance that you should be concerned with the most. I’ll be going over each of these as I discuss the different tools.

Once you know what tools you will be using (and know how to use them), what metrics you plan on using to measure site performance, you then need to take a base line measurement of your site. You run this baseline to figure out where the site stands, prior to applying any code optimizations. You’ll use this first baseline measurement to help you determine what numbers are acceptable for a production release and what kind of effort is needed in order to get there. The baseline will also help you figure out what parts of the site are fast enough, and what areas still need work.

The last thing you need to figure out before you fire up your profilers is the performance tuning methodology that you are going to follow as you go through the tuning process and make changes to your code. The methodology I usually follow has four basic steps. First, before you make any changes, record a metric baseline using the profiler or load the testing tool of your choice. Second, make ONE change. This doesn’t mean only change one line of code, but make only one fundamental change to your code, in one small area of your code (like one aspx page).

For example, if your code uses the StringBuilder to perform all string concatenation in your site, but you have developed a custom, optimized string concatenation class, and you want to see if it is faster than the StringBuilder. Take one aspx page and change out all of the StringBuilder classes for your new custom class and then run the tests. The point of one change is to make one fundamental change in one small area.

The reason you should only make one key change to your code is that you run the risk of contaminating your test scenario. What this means is this; let's say you make three different types of changes in a page. But when you run your metrics, you find that there is no difference in the metrics, or maybe the changes actually slow the page down. The problem is that two of the changes you made may have made the page run faster, but the third change made the page actually slower. Slow enough to negate the two beneficial changes.

The third step is to take another measurement of the page's metrics with the new changes, and evaluate the differences between the baseline and your new changes.

The forth step is to log the results. This includes a description of the change you made, the numbers from the baseline, the numbers from your changes, and the differences. It is very important that you log and keep track of your progress while you are in the performance tuning phase of your project. This log will give you and the project’s manager good idea as to where you are in your goal of getting the site ready for production. The performance log will also be very valuable for publishing a ‘lessons learned’ document at the end of your project, conveying performance best practices to other development teams.

To recap the steps you take when tuning your code: take a baseline measurement, make ONE fundamental change, take a new measurement, and log the results.

One final note, once a change has been identified as beneficial to the performance of a page, that change should be propagated out to the rest of the code in the project. But don’t just blindly make the changes. You should use the 4 steps approach for this too, just to make sure that the benefits you saw on one page is just as beneficial to the rest of the site.

Code Profilers, Site Load Testers, and Site Metrics Oh My...

There are dozens of tools that one can use to help with performance tuning. But most of them break down into two different types; profilers and load testers.

I’d like to talk about load testers first. You can spend up to several thousand dollars to purchase one of these tools, and there are many really good ones to choose from. The one I like the best (because I’m cheap and it’s free) is Microsoft’s Application Center Test (ACT), which comes with Visual Studio Enterprise edition. With ACT, you can record one page or a series of pages into a test script. When you run this script in ACT, it will call that script over and over again for a given period of time, and for a specific number of users. After the script has finished running, ACT will generate a report that shows the performance metrics for that script. Before running a test, you should figure out how many users you want your site to handle (which you can set in the properties of the test script). If your site is a highly used site, then think about starting out with 15 – 25 simultaneous users, and then as you tune your pages and they get faster, test with up to 50 – 75 users.

There are many reasons to test with a high number of users. First, you need to see what your processor is doing while your pages are being hit hard. If your test script is pegging the processor up to 90+ % with only 25 users, then you might have some efficiency issues with your code. Also, when running with a high number of users, you should monitor the % time the ASP.NET process is spending in garbage collection (how to get these counter results is explained below in the PerfMon section). A good guideline for % of time in garbage collection is to keep it less than 25% (which is pretty high in my opinion). Anything higher and the ASP.NET process is spending too much time cleaning up old objects and not enough time running the site. If the % time in garbage collection is high, you’ll need to take a look at how many objects you are creating, and try to figure out how your code can be rewritten to be more efficient (i.e. Not generating so many objects!).

A good example of this that I found (which I’ll talk about in greater detail later) is data binding to ASP.NET controls. We had a page that displayed a great deal of information, and the page was designed exclusively using data bound controls. The page ran ok with up to 10 users. But when we took the number of concurrent users up to 50, the processor pegged at 100% and the page became extremely unresponsive. What we found out was that the ASP.NET process was spending just over 30% of its time performing garbage collection!!!

In ACT, you can set how long you want the test to run, even up to several days if you wanted, or you can set it to run up to a specific number of iterations. I prefer to run for duration and see how many iterations of the test script I can get during that time period. Once a test run is finished, the results are stored and a report is generated. One of the nicest features that ACT has is the ability to compare two or more test result reports against each other. You should use this to compare your baseline, and several sets of new changes. This will give you an easy way to see what effect your code changes had on the performance of the pages. The following is a list of metrics that ACT gives in its summary report, that I like to monitor:

Test iterations – the number of times the script was able to run all the way through in the test’s time period.
Total number of requests – the number of requests the server was able to handle during the test’s time period. The more requests served, the better.
Average requests per second – the number of requests per second the server was able to give. The more requests served, the better. ACT’s summary report also shows a graph over time of the requests per second. This is a good way of seeing if there are any noticeable patterns in the request per second over time (for example; garbage collection kicking in).
Average time to first byte – the time between sending the request, and receiving the first part of the Web server's response stream.
Average time to last byte – the time between sending the request, and receiving the end of the Web server's response stream.
Number of bytes received – the number of bytes that your web server served up during the test script. This metric is very important, but can be misleading. When you are testing on your own server, your page could be very fast, but could also be passing a large amount of data back to the client. This won’t adversely affect your test because it’s all local, but if the user is on a 56K modem line, then great page performance will be overshadowed by the amount of time it takes to download half a meg of HTML. In order to use this metric correctly, you should only consider it if your test script is setup to run for a specific number if iterations, NOT if you are running for a specific duration. If you run a test two times (on two different code changes) for 20 min and one is much faster than the other, then you’ll see a much higher number of bytes received. But is it because one test ran for more iterations, or because it really is sending more data? The only real effective way to really use this metric is to run the test for a set number of iterations.

One important note to mention. Sometimes you may want to use ACT to run scripts on two different servers. For instance, you may want to test your latest code changes against what is on your test server with your latest test build. This is totally acceptable and commonly done. But this kind of test is only valid if the test server and your server have the same hardware specifications. Also, if your development server has the site code, database and web services on it, but your test environment is setup as a distributed architecture, then this will also negate the test results. Though this may seem like common sense, I do think it's worth bringing up, so somebody doesn’t spend a week trying to get their P3 1.5 GigHz dev box to perform as well as their P4 3.2 GigHz test server.

Another tool that I really like (because again, I’m cheap and it's free) is the Performance Monitor tool that comes with Windows NT family of operating systems (NT, 2000, XP Pro and 2003 Server). PerfMon is a tool that records and shows a graph of desired performance counters, while your code is running. When .NET gets installed on your development box, PerfMon adds hundreds of counters to PerfMon specific to .NET. Not only that, but the .NET Framework gives you classes that you can use to write your own performance counters to monitor in PerfMon!

To see some of the counters that gets installed with .NET, open up your PerfMon by clicking Start | Programs | Administrative Tools | Performance. Once the application opens, right click on the graph window and click ‘Add Counters…’. In the Add Counters dialog box, click the ‘performance object’ combo box, and you will see about a 60 or so different categories of performance counters. Select one of these categories, for instance the ‘.NET CLR Memory’ category and you will then get a list of individual counters for this category. Click the 'Add' button to add these counters to PerfMon, so it'll start monitoring them. I’ll leave it up to you to read the provided help files and MSDN topics on the different counters, and how to use PerfMon in general. One tip for learning what each counter means; in the 'Add Counters' dialog box, click the button called 'Explain'. This will extend the dialog box and give you an explanation of each performance counter that you click on. That should be your first step in figuring out what PerfMon has to offer. Another cool thing is that ACT can also record PerfMon counters during its script runs and show you the counter results in its own summary report. I find this is a good way to record the % time in garbage collection counter in ACT, as well as processor counters while your tests are running.

There are so many different counters available in PerfMon that it can be a bit overwhelming at first. The following table is a list of counters that I normally like to watch while I’m testing a piece of code.

Category	Counter	Description
.NET CLR Exceptions	# of Exceptions Thrown	The total number of exceptions thrown since the start of the app.
.NET CLR Memory	# Gen 0 Collections	The number of times Gen 0 objects have been collected since the start of the app
.NET CLR Memory	# Gen 1 Collections	The number of times Gen 1 objects have been collected since the start of the app
.NET CLR Memory	# Gen 2 Collections	The number of times Gen 2 objects have been collected since the start of the app. Gen 2 collections are detrimental to code performance.
.NET CLR Memory	% Time In GC	This is the % of process time that the ASP.NET process is spending in garbage collection.
.NET CLR Memory	Promoted Memory from Gen 1	The bytes of memory that survive garbage collection (GC) and are promoted from generation 1 to generation 2. If you have a lot of objects that survive to Gen 2, you need to look at what is holding onto your object references. Gen 2 collections are very costly.
Memory	Available MBytes	Megabytes available on the server. Use this to see if your application is using a lot of memory or if it has a memory leak (yes, it’s still possible).
Process	% Processor Time	The % of elapsed time that all of the threads of this process used the processor to execute instructions
Processor	% Processor Time	The % of time that the processor is executing a non-idle thread

This isn’t an exclusive list of counters that I’ve used, but the base template that I use when using PerfMon. If you are using Remoting and/or web services, then there are categories in PerfMon for both. Also, there are very important IIS counters that PerfMon exposes, but I prefer to use ACT to gather IIS statistics.

PerfMon Counter Specifics

I would like to say a few things about some of the counters I just mentioned. % Processor Time is one of the most important counters for you to monitor. If you are running a load testing tool with only 5 – 10 users and the % Processor Time is up around 80 – 90%, then you’ll need to reevaluate your code because it is working too hard. But if you are running 75+ users and the processor is up to 90%, than that’s to be expected.

"Available MBytes" is what I use to judge the memory efficiency of my sites. I haven’t found really a good way to find out exactly how much memory the ASP.NET process is taking up, but "Available MBytes" will give you a decent idea. Get PerfMon running and add this counter before you start running your load tester. Record how many megabytes of memory you have available, then start your test. While your test is running, watch "Available MBytes" as it goes down (if it goes up, then you just found the algorithm of the century!). At the end of your test, find the difference between how much memory you started with and how much was available just before the test ended. This is an important counter to measure if you have implemented some sort of caching strategy in your site, as your site could use too much memory, which will cause the ASP.NET process to recycle.

"Available MBytes" is also good to see if your code has a memory leak, especially if your code is calling legacy COM components. Create a load test that runs for 20 – 30 minutes and see if the "Available MBytes" counter finds some kind of constant value or if it keeps decreasing. If it keeps decreasing over 30 minutes or so, then there is a good chance you may have a memory leak somewhere. And yes, it is still possible to create a memory leak, even in .NET. It’s just much harder.

"# Gen 2 Collections" is another metric that should be watched. Generation 2 collections are fairly expensive and if your site is running a high number of them, then you need to take a look at your code and see if it is holding onto child object references longer than it should for some reason. This is a hard one to quantify though, since there isn’t any recommendation for a Gen 2 to Gen 1 and Gen 0 ratios.

"# Exceptions Thrown" is also very important to monitor. It can tell you if your code is eating a high number of exceptions, which is also very costly. This can happen if your code uses try / catch blocks in order to direct process flow instead of just catching errors. This is a bad practice that I often see and I’ll discuss a little bit about it later.

Code Profilers

There are many tools out there that you can use to help you analyze your code and your code’s performance to see where your trouble spots are. Code profilers hook into your assemblies by using the .NET Profiler API, which allows the profilers to run as part of the process being monitored, and receive notifications when certain events occur. There are several good profilers out there. I like to use the ANTs code profiler by RedGate. ANTs is an inexpensive and relatively simple profiler, that is limited to recording function and single line execution times. When an aspx page is run, it can record metrics for the entire call stack, so you can dig down deep into your calls and see how efficient the .NET Framework classes are. ANTs gives you several different metrics for function and line profiling, including; max time called, min time called, average time called, and number of times called. This last one is important if you have your site broken into several layers. You can execute a profile on one page and see if the UI layer is making efficient use of your business and data layers, by seeing if any functions are called a high number of times. This is an easy way to trim off some execution time from your page.

Another profiler that I’ve used is AQTime from AutomatedQA. This profiler has many more metrics, including performance profiler, memory and resource profiler, and an exception profiler. I found that it works really well with .NET exe assemblies, but I couldn’t get it to work with even the simplest web site.

There are several other good profilers available, each of which should have a free trial download for you to play with. Whichever tool you do decide to go with, you should become extremely efficient with using it. Two skills that every developer should be experts at is debugging and using a code profiler.

Another profiler that you should become familiar with, if you are using SQL Server as your database, is the SQL Server Profiler. This is a great tool for monitoring all stored procedure and inline SQL calls being made to SQL Server. You should use this profiler to see if stored procedures are being called more times than they need to be, as well as how long they are taking to execute. This is also a good way to look for data that could be cached on your IIS server. If you see the exact same SQL or stored procedure being executed on different pages, think about implementing one of the caching strategies discussed later.

One tool that I especially like to use is a .NET Test Harness Framework that was written by Nick Wienholt. This is a great tool if you want to compare two different prototypes of a specific piece of functionality. For example, checking for an empty string. You write one test scenario that compares a string to the String.Empty constant, and another scenario that checks for String.Length equal to zero. What makes this framework handy is that, once you write your different test scenarios, you register the test functions with the test harness' delegate and tell it how many times to call the test functions. It will execute each test scenario the specified number of times, using the QueryPerformanceFrequency Win32 API call, to accurately measure how long each test function took to execute. When the test harness is finished, it calculates the min, max and average execution times for each test function. I used this tool to run many of the scenarios that I talk about in the last half of this article.

When using the test harness, be sure to run your official tests under a Release build. The C# compiler will use some IL code optimizations when compiling for Release that it won’t use when compiling for Debug. Also, if you are testing something that uses hard coded strings, be aware that the C# compiler will inline string constants and string concatenations into your code at compile time, which could cause your test results to be off. The best way to avoid this is to pass these strings into your test harness as command arguments. This way the compiler won't be able to perform any string optimizations.

You can find a document discussing how to use the test harness framework here.

You can find the code for the test harness framework here.

One last tool I want to mention, that I have found indispensable while performance tuning is Anakrino. This tool is an IL disassembler that can take any .NET assembly and disassemble the IL byte code into either managed C++ or C# code. This can be invaluable when a third party or .NET Framework class seems to be causing a problem (bug or performance bottleneck). All you have to do is open the offending class in Anakrina and look at the code for yourself to see where the problem lies.

Stuff to make your site go faster

This section is the meat and potatoes of this article. No more preparation and planning, this is where I share the performance specifics that I’ve learned while developing ASP.NET web sites.

While the easiest way to make your site run faster is to beef up your server farm’s hardware, the cost of doing this can be pretty high. But an easy, cost effective way is to squeeze out a few more requests per second by writing good, efficient code.

For the rest of this article, I’m going to make up and talk about a fictional web site that sells Apples. It sells many different kinds of apples and because people all over the world use this site to purchase their apples, it has to perform fast or the users will just drive to their local Safeway and get their apples there. This site is broken up into three different assemblies; a user interface (UI) assembly that has all the aspx pages in it, a business layer assembly that handles business object creation and business logic execution, and a data access layer.

Object Caching

One of the easiest things to implement that can give you a decent performance boost is to implement a caching strategy using the HttpRuntime.Cache class. This class is basically a wrapper around a hashtable that is hosted by the ASP.NET process. This hashtable is thread safe and can be safely accessed by several HTTP requests at the same time. I’m not going to go through the specific APIs for the Cache class, MSDN does a pretty good job of this, but basically you can store objects in the hashtable and later pull the objects out and use them. The objects in the hashtable can be accessed by any request, as long as they belong to the same application. Each AppDomain has its own hashtable in memory, so if your IIS server hosts several sites, each site would only be able to access the objects that they put in the cache specifically. One important thing to remember about the HttpRuntime.Cache class is that any object that you put into the Cache class can be accessed by any user in any request. So objects that are either user specific or are often updated after being created and populated are not good candidates for being stored in the cache. Objects that, once created, are fairly read only are good candidates for this type of caching strategy. Objects of this type tend to be used for holding output data only, such as a product or category object.

So how would we use the Cache class? Let’s say that every time a user wanted to see the product page for Granny Smith apples, we could call the database, create an instance of the Product class with Granny Smith data in it, and then return this object to the UI layer. And each time someone wanted to see Granny Smith apples, the site would go through these steps. But that’s a lot more database calls than are necessary. We need to add two more steps into the process. When the UI calls into the business layer for a Granny Smith product, the code first checks to see if the cache contains it. If the cache does, the code just returns it to the UI. But if it doesn’t, the code goes ahead and calls the database and creates a new Product instance. But before we return the apple instance to the UI, the code inserts it into the Cache class. So the next time someone requests the Granny Smith page, the code won’t have to make a call to the database, because the object is being held in the cache.

But there is a flaw in that implementation. What if there is a sudden shortage of Granny Smiths and you need to triple its price? (Remember your micro-economics?) You can make the change to the database, but we still need a way to propagate any changes to the Product table to your site. One of the cool features of the Cache class is that you can set an expiration timeout for any object you put into the cache, as well as an expiration callback delegate. When your code inserts the Product instance into the cache, you can specify how long you want it to stay in cache. You can also specify a function that should be called when your object expires out of the cache.

So let’s say you set a 10 minute expiration on every Product instance you put into cache, and you also specify the expiration callback delegate. 10 minutes after the Product instance was inserted into cache, the ASP.NET process kicks it out and the callback delegate is called. One of the parameters of the callback delegate is the actual object that was being held in cache. If you stored the apple’s database identifier in the Apple class, you can use it to query the database for the updated Apple data. Then you can either create a new Product instance with the returned data or just update the existing object’s properties. But which ever you do, be sure to put the updated instance back into cache (with an expiration and callback delegate of course). This strategy will give you perpetually updating product classes in memory, which will dramatically decrease the load on your database servers as well as the time it takes to make all those extra database calls.

Like I said before, I’m not going to show a lot of code in this article, and I think the above explanation does well for itself without coding it out for you. I am going to give you a few design patterns that I think will be helpful in implementing this. First, you should create a custom cache class that encapsulates all calls to the HttpRuntime.Cache. This will keep all caching logic in one place and if you ever implement a different caching architecture, you only have to rewrite one class. A second pattern is the use of object factories. These are helper classes, a one stop shop if you will, that you call in order to get object instances. For example, a ProductFactory class might have a CreateProduct method that takes the product ID and returns a product instance. The CreateProduct function itself handles all the calls to the database layer and to your custom cache class. This way you don’t have object creation logic spread all over your site. Also, if your site uses straight DataSets, DataTables and or DataRows in the aspx pages, instead of creating Product classes, that’s ok. The Cache class works just as well with these objects.

One warning about the caching framework I just described. If you put an extremely large number of objects into the cache and you are using the callback delegate to reload them, you may actually hurt your performance rather than help it. One site I was working on had thousands of categories and tens of thousands of products. When all of these objects were loaded into the cache, the processor pegged at 100% every 10 minutes when the cache was reloading itself. Because of this, we ended up using a caching strategy where all objects expired every 10 minutes, and then would just fall out of scope. So if a user requested the same product within the 10 minute time period they would be saved a call to the database, otherwise the site would have to create a new instance, put it in the cache, and then return it to the UI layer.

Another problem with the self reloading cache strategy is if a user requested a fairly obscure product, and no one else requests that same product for a week. Your web server will be reloading that product ever 10 minutes even though no one ever requested it. The self perpetuating caching strategy is best for high volume sites, but with a smaller product base. Exactly how many objects becomes to many depends on your server’s memory resources and its processor. But PerfMon is a great way of measuring if the server can handle this strategy.

A Data Dependency Driven Caching Architecture

When inserting an object instance into cache, there is another option that you can use to create a more efficient self-perpetuating cached strategy. There is a class you can use called CacheDependency. When you insert an object into the HttpRuntime.Cache class, you can also pass in an instance of the CacheDependency class. What this class does is act as a trigger to kick your object out of cache. When the CacheDependency’s trigger is fired, it will tell the Cache class to kick the object it is associated with out of cache. There are two types of cache dependencies; a file dependency and cached item dependency. The file dependency version works by using an internal class called FileChangesMonitor. This class monitors whatever file you specify and when the file changes, it will invoke its FileChangeEventHandler, which the HttpRuntime.Cache class has registered a callback function to. This callback will trigger the HttpRuntime.Cache class to kick out whatever object has been associated with the CacheDependency instance.

So how can we use this to create a more efficient self-perpetuating caching strategy? We accomplish this by putting SQL Server triggers on the tables that hold the data that the objects hold. Let’s use our Product class again as an example. And let’s also say that our database has 250 different types of apples in it, each with its own ProductID. We create a trigger on the Product table, so that every time a row in the Product table is changed, it creates a file somewhere on the network and sets the text of the file to “0”. If the file already exists, the trigger would just update the file. The key to this whole strategy is the file name and the text in the file. The text is just one character, and every time the trigger updates the file, you just change the character from “0” to “1” or “1” to “0”. The file name is the ProductID of the product that just got changed.

So when inserting an apple Product object into cache, create a CacheDependency instance that references the file whose file name is the same as the ProductID of the object being put into cache. In this case, you do not pass an expiration time into the Cache.Insert() function like before, but you do still specify the callback delegate. Once the apple Product is inserted into cache, it will stay there until the data in the database is changed. When this happens, the database trigger will fire and update the file. This causes the CacheDependency to trigger, and your Product instance gets kicked out of cache. Your callback function then calls the database, recreates the Product instance with the new data, and then inserts it back into the cache. This will dramatically cut down on the number of database calls and your objects will only refresh when they have to, which will dramatically cut down on the load of your web server’s processor load.

Request Level Caching

Another hashtable that you can take advantage of for a caching strategy is the HttpContext.Current.Items class. This class was originally designed for sharing data between an IHttpModules and an IHttpHandlers during an HTTP request, but there is nothing stopping you from using it within an aspx page or any assembly within the call stack that your aspx page starts. The scope of this hashtable is the duration of a single HTTP request, at which point it will fall out of scope as well as any object it references.

This hashtable is the perfect place to store objects that have a short lifespan, but are accessed multiple times during the span of a single Request. A good example of this might be connection strings that your site stores in a file or in the registry. Let’s say your data access assembly has to read its connection strings from the registry and decrypt them each time it calls the database. If the data access assembly is called several times during a single request, you could end up reading the registry and decrypting the connection string more often than necessary. One way around this repetition is to put the decrypted connection string into the HttpContext.Current.Items hashtable the first time it’s needed during each request, and then every other time during your data access assembly is called during that request, it can just pull the connection string from the HttpContext.Current.Items class. Now I know what you may be thinking. Why not just store off the connection strings in a static field and hold it for the life time of the application? The reason I would stay away from this is that if you ever needed to change your connection strings in the registry, you would have to restart your web application in order to get them reloaded into your static fields. But if they are only stored for the life of each request, then you can change the registry without fear of bumping people off your site.

Now connection strings might not be a very good example, but there are many types of objects that can take profitable advantage of this caching strategy. For instance, if you are using Commerce Server, then this is a really good place to store objects such as CatalogContexts or ProductCatalog objects, which can get created many times during a single HTTP request. This will give you a great performance gain in your site.

Page and Control Caching

Page and control caching is an obvious tool and is very well documented in MSDN, as well as many ASP.NET books, so I wont go over them very much except to say use them if possible. These techniques can dramatically increase your requests per second metric.

View State Management

When ASP.NET first came out, and I heard of view state and what it did, I was overjoyed! Then I actually saw just how much text there was in the View State and my joy disappeared. The problem with view state is that it not only stores the data on the page, but it also stores the color of the text, the font of the text and height of the text, the width, the…well, you get the picture. One of the pages we developed was so large that the view state was 2 meg! Imagine trying to load that page in a 56K modem.

Now, I’m not smart enough to suggest to Microsoft how to create a better view state, but I do know that on some pages you just can’t use it. It’s just too big and it takes too long to download. And granted, with view state on, you don’t have to reload your entire page on a post back, but if you have an efficient HTML rendering strategy (which I’ll talk about last) then I think the time cost of re-rendering the page will offset the time it takes to download the view state to the client (Not to mention the time it takes to decode and encode the view state for each request).

If you do decide to use view state, take a look in your machine.config file, in the <pages> element and make sure that the enableViewStateMAC attribute is set to “False”. This attribute tells ASP.NET if it should encrypt the view state or not. I don’t like to put anything confidential in the view state, and hence don’t care if it’s encrypted or not. Setting this value to false, which will disable the encryption, will also save you a little bit of time per page, especially if your page is carrying a large amount of view state.

One quick note that might save you a few days of troubleshooting: If you do set enableViewStateMAC to true and you are operating in a server farm that is load balanced, you will also need to make sure that each server has the same encryption / decryption key. The encryption algorithm that ASP.NET uses to encrypt view state is based on a machine key. This means that view state encrypted on server ‘A’ wont be able to be decrypted if Server ‘B’ handles the return request, and you’ll get a view state exception. To fix this problem, set the validationKey attribute of the <machineKey> element in the machine.config file to a hexadecimal string from 40 to128 characters long (128 characters is recommended). You’ll need to set this on every web server with the same key. This will enable one server’s view state to be decrypted and used on another server.

I have two other config file observations that do not really fit anywhere else and since I just talked about the machine.config, this is as good a spot as any. There is an attribute in the machine.config file that lets you allocate the % of memory that your web server has for use by the ASP.NET process. This is the memoryLimit attribute of the <processModel> element. Its default value is set to 60%, but I’ve found that I can get away with 80% without any adverse affects on the web server and if you are implementing some sort of caching strategy, as discussed above, and your product list is fairly extensive, this is something you may want to consider changing.

The second item is the <compilation> element in the web.config file. This element has an attribute called debug. When moving your code over to production, you should make sure to change this to ‘false’. This attribute does not affect the C# or VB.NET compilation of your code into an assembly. If you compile your assembly with this setting set to true, then again with it set to false, and then compare the IL code of the two assemblies using ILDASM, you won’t see any difference between the two compilations. But this attribute will affect the JIT compiler and cause your site to run a little faster if set to false. My guess is that the JIT compiler uses this flag for compilation optimizations.

String Manipulation

One of the classes that Microsoft released with the .Net Framework was the StringBuilder class in the System.Text namespace. This class is a high performance way to concatenate text together to build large text blocks. String concatenation in the old ASP days was extensively used to help build HTML output, but it really hurt page performance. So now that we have the StringBuilder, you might think that it is the best choice for string concatenation, right? Well, yes and no. It depends on the situation and how you use it.

The general rule for string concatenation is use the StringBuilder if you are going to concatenate 5 or more strings together. For 2 – 4 strings, you should use the static String.Concat function. The String.Concat function takes two or more strings and returns a new string that is a concatenation of all the ones passed in. You can also pass in any other data type into the Concat function, but it will just call ToString() on these types and do straight ‘+’ style string concatenation on them, so I’d avoid using it for anything but strings.

If you take a look at the String.Concat method in Anakrino (in the mscorlib file), you’ll see that if you pass 2 or more strings into the function, the function first adds up the total number of characters for all the strings passed in. It then calls an extern FastAllocateString function, passing in the size of the new string it is going to build, which I assume allocates a block of memory big enough to hold the entire return string. Then for each string passed in, Concat calls another extern function called FillStringChecked. This function takes a pointer to the block of memory that was allocated in FastAllocateString, the beginning place in memory for the string being added, the ending point in memory, and the string to add. It does this for each passed in string to build the newly concatenated string, which it then returns. This is a very fast way to concatenate 2 – 4 strings together, and from my tests with the .NET Test Harness, the String.Concat function outperforms the StringBuilder by 130%.

This sounds straight forward enough, right? For 2 – 4 strings, use String.Concat, and for 5+ strings, use StringBuilder. Well almost. I got an idea in my head and thought I’d profile it in the .NET Test Harness, as well as PerfMon, to see what would happen. Since String.Concat can take up to 4 strings, why not use one String.Concat to concatenate the outputs of 4 inner String.Concat calls? So I setup the test and found that for up to 16 strings, nesting String.Concat functions inside an outer String.Concat function, it would outperform the StringBuilder class by 180% (1.8 times faster). The other thing I found with this method was that PerfMon counter, % in Garbage Collector, was much lower when using nested String.Concat functions (but in order to see this, you’ll have to perform the test thousands of times in a row). This is great news if your site does a large amount of string concatenation. The only problem with nesting String.Concat functions is that it makes the code fairly hard to read. But if you really need to boost your page performance, then you might want to consider it.

If you find that you need to use the StringBuilder to build large blocks of text, there are a few things you can do to make it work a little faster. The way the StringBuilder works, if you use the default constructor, is that it has an initial capacity of 16 characters. If at any time, you add more text than it can hold, it will take its character capacity and doubles it. So by default, it will grow from 16 characters, to 32, to 64, to…you get the picture. But if you have a good idea as to how big the string you are trying to build will be, one of the StringBuilder constructors takes an Int32 value, which will initialize its character capacity. This can give you better performance, because if you initialize the StringBuilder to 1000, it won’t have to allocate more memory until you pass in 1001 characters. The performance gains for pre-initializing the StringBuilder isn’t all that much, about 10% – 15% depending on the size of the finished string, but its worth playing around with, if you have a good idea how many characters you are going to add.

There is one more item I’d like to address when talking about strings, and that is checking for an empty string. There are two basic ways to check if a string is empty or not. You can compare the string to the String.Empty static property, which is just a constant for “”:

if (firstName == String.Empty)

Or you can check the length of the string, like this:

if (firstName.Length > 0)

I setup a test in the .NET Test Harness and found that the String.Empty comparison check was 370% slower than the length check. This may seem fairly trivial, but every little bit helps, right?

Class Field Initialization

If you have any classes that have a fair number of private fields, there are two different techniques you can use to initialize the class' private fields, and which one you pick can affect performance. Here is a common scenario. Let's say you have a Product class that has 45 private fields, 45 public properties that encapsulate them, and two constructors; a default constructor that will create an empty product and one that takes a DataRow that is used to populate the 45 fields. You should initialize the 45 fields somewhere because if you create an empty product, you might want to give your properties some default values. But where should you initialize the fields? Where they are declared or in default constructors?

If your code uses the technique that initializes the fields where they are declared like:

private int maxNumberAllowed = 999;

and then you create a new Product instance using the default constructor, then you have a perfectly good Product instance that you are ready to use. But what happens if you create a new Product instance using the DataRow constructor? Each field will be assigned to twice! Once when it gets declared and a second time in the DataRow constructor.

The best practice is to do all class level field initializations in the constructors. You may end up duplicating your initialization code in each constructor, but you’ll be guaranteed that each private field gets assigned to only once per class creation.

So to show what this can do to performance, I created a test with the .NET Test Harness. I created two classes, each with 50 private fields. The first class initialized its private fields where they were declared, and the second class did all initialization in the constructors. And both classes have two constructors, a default one and a DataRow one. When I created a Product instance using the DataRow constructor, the class that initialized its private fields where they were declared was 50% slower then the class that had all its initialization code in the constructors. When I created a new Product instance by calling the default constructor, both versions were relatively the same.

Using Exceptions to Control Process Flow

I’m not going to spend a lot of time going over exception handling, except to say that throwing exceptions are very costly for performance. You only want to use a try / catch block when you need to trap for an error. You never want to use it to control and direct the program’s process flow.

For example, I’ve seen this code used (and the .NET Framework actually uses this in its VB function IsNumeric):

public bool IsNumeric(string val)
{
  try
  {
     int number = int.Parse(val);
     return true;
  }
  catch
  {
     return false;
  }

What if you have an XML block that has 50 values and you want to see if all 50 are numeric? If none of the values in the XML block are numeric, then you just threw 50 exceptions needlessly!

Multithreading

Multithreading is a powerful way to get a few more requests per second out of your site. I am by no means an expert in multithreading, so I won't pretend to know enough to give you some insight into its usage. I can recommend a book by Alan Dennis called ‘.NET Multithreading’ since he does a good job describing the ins and outs of multithreading. I do want to mention a warning though. While multithreading can be very powerful and speed up your pages, if you are not extremely careful it can introduce very subtle and hard to debug problems in your code. Also, it’s commonly thought that your code can just spawn off as many threads as it needs to get the job done. But creating too many threads will actually hurt performance. Remember, there are only so many processors on your server and they are shared across all threads.

.NET Data Access Techniques

Having a disconnected data source such as DataSets is great, and the ability to make changes to your disconnected DataSet and then reconnect to your data source and sync the changes is truly amazing. But in my opinion, this should strictly be used when performance is not an issue. For read only database access, the DataReader will give you an amazing performance gain over the DataSet.

I once again used the .NET Test Harness and created two tests. The first test made a call to my database, filled a DataSet with 50 rows of 10 columns, iterated through each DataRow in the DataTable and pulled the data out of each column in the DataRow. The second test did the same thing, except with a DataReader. Since the DataAdapter uses a DataReader internally to populate the DataSet (I used the Anakrino tool to look into the internals of the DataAdapter), I assumed that it would be faster than the DataSet. But I was surprised as to how much faster the DataReader really was. Under both Debug and Release builds, the DataReader was 75% faster then the DataAdapter!

But what about the DataSet’s built in ability to update any changes to the database? True, the DataReader can’t handle this. But in my opinion, the best strategy for high speed database access, meaning read, update, insert and delete, is through the use of stored procedures. I’m not going to go into any detail in that area, since it is outside of the scope of this article. My rule of thumb is to try and avoid using DataSets altogether in web development. I generally will only use them in my WinForm applications when I’m trying to appease only one user per AppDomain.

The biggest argument I hear when telling people not to use DataSets in web development is that they are so handy for data binding to ASP.NET web controls, you can’t bind a Repeater control to a DataReader. As with DataSets, my rule of thumb for data binding is ‘don’t do it!’, and I’ll tell you why in the next section.

The final topic I wanted to cover is how to pull data out of a DataSet (if you insist on using them) or a DataReader. Both the DataRow and the DataReader have indexers to access the values out of the columns. The two most commonly used ways to get data out of these objects is to call the ToString() method after the indexer like this:

string temp = dataReader["FirstColumn"].ToString();

This is OK. But what if you are trying to get an Int32 out of the DataReader instead of a string? This is one way I’ve seen:

int temp = int.Parse(dataReader["FirstColumn"].ToString();

The other way that I see is this:

int temp = (int)dataReader["FirstColumn"];

So can you guess which one is faster? If you guessed the second one, then you’re correct. DataReaders and DataSets store their underlying data as object types. So you can cast the Int32 directly out of the DataReader. If you call the ToString() first, you are unboxing the object to a string, then calling Int32.Parse on a string, which will cast it to an Int32. This may seem fairly common sense, but I have seen code that uses the ToString() method and so I was interested in just how much faster the direct cast was. When I used the .NET Test Harness, I found the direct cast was 3 times faster! That’s a pretty huge performance gain to miss out on because of a simple casting mistake.

The one other way to get data out of a DataReader is to use its DataReader.Getxxx methods. The main reason I try to avoid using them is that they only take a numeric based column ordinal. I try to use the column names in the indexer because if you ever change around the order of the columns in your stored procedure, you won’t introduce a subtle but in your code.

Data Binding: A godsend or the devil in disguise?

When .NET was released, many web developers were ecstatic when they saw how easy it was to use server side ASP.NET controls. Slap the control on the page, set a DataSet or ArrayList to the DataBind property and away you go, instant web pages. No more looping through RecordSets in your ASP to build your HTML. This is so easy!

It is easy, but at a cost. I recently worked on a project that used data bound ASP.NET DataList and Repeater controls extensively to build their web pages. But the performance results were very disappointing. Using ACT to run some load tests on these pages, with a small number of concurrent users (5), the page performed reasonably well. But as soon as we increased the number of users to something more realistic, like 25, the page performance went to pot. So we started using PerfMon while the tests were running, and found something pretty interesting. The % time in garbage collection for the page was averaging 30%, with a maximum spike of 45%! Also, the % Processor was pegged at 95% for the entire test run. These last two statistics were big red warning lights, because they told us that not only did our pages run slow, but they were not going to be able to scale out at all. If the site had a high load of users, it would be in serious trouble.

This was no good and was not acceptable for a release to production, so we started digging into how the data bound controls worked. What we found out was that the data binding process was doing two things that were hurting performance. First, data bound controls use reflection to find the correct property and pull the data from it. Reflection is fairly costly and if you have a repeater control that is pulling 6 properties from an array of 40 objects, the performance hit can really add up.

The second thing we noticed was that the number of objects that were being created during the data binding process was pretty high (look in Anakrino at the DataGrid, DataList and Repeater class’ CreateControlHierarchy to see how it does its binding). This high number of objects being created was what was kicking the % time in garbage collection so high.

So we had to find a way to create web pages without using data binding. We tried using ASP.NET server controls and manually pushing the data, but this didn’t really change our statistics very much. Then we got desperate, and really started brainstorming. We tried placing one Literal control on each page and used a StringBuilder in the code behind’s PageLoad event to build the HTML structure for the page and then slapping the HTML into the Literal control’s text property. This technique performed amazingly well and the % time in garbage collection went down to almost nothing. But the maintainability of the HTML would have been a nightmare.

We then decided to try mixing ASP.NET code behind with an ASP style of HTML building. We created and populated all our data objects, as well as put any business logic the page needed in the aspx’s code behind PageLoad event. Then in the aspx file, we went back to ASP style HTML building, using old fashioned <%=(C# code)%> to actually inset data from our data objects into the HTML. This technique performed just as good as the string builder technique, but the maintainability of the code was much better.

The only problem with this ASP style of HTML rendering is that you are back to writing the HTML to a forward only stream, just like ASP. When using ASP.NET controls, you can update the value of any control, during any stage in the code. But web developers have been doing this since the beginning of ASP, so in extreme situations where ASP.NET controls just won’t perform, then this is a workable option.

Once we had our test page coded this way, I ran ACT on the data bound version and the new ASP style version, and compared the results. During a 15 minute run with 10 users, the ASP style page was able to run iterations as many times as the data bound version. The average requests per second jumped from 72.55 to 152.44 requests per second. The average time to last byte went from 21.79 milliseconds down to an amazing 2.57 milliseconds! But the best statistics came from the % time in garbage collection and processor %. The average % time in garbage collection went from 30% down to .79%, and the average processor % went from 95% down to 10%! This meant that our ASP style pages would scale out to a higher number of users with very little trouble.

Conclusion

Some of the performance test results I talked about in this article truly amazed me when I first saw them. But does that mean you should implement everything I talked in this article? Nope. Take data binding ASP.NET server controls, for example. Should you ban them from your site from now on? I hope not. I think they are great and serve a good purpose. What you need to do is decide how important a page is to the success of your site. With many sites, 90% of the pages requested are only 10% of the pages the site actually contains. Parsing your IIS logs is a good way to see how often each page is called, and in turn, deciding how vital that page is to your site's success. Performance tuning your code usually comes at the cost of increased complexity and loss of maintainability. You should weigh the benefits of any possible performance change to each page. In the project I was just talking about, we decided to drop ASP.NET controls and rewrite only the 8 most requested pages, but left other 80+ pages the way they were, using data bound controls.

The secret to developing a site that screams is to get yourself a good set of performance analysis tools and really learn how to use them. Profile and analyze what your code is doing, and adjust accordingly. Use Anakrino and dig into the code of the classes you are using, and learn what is really going on in the background. Try prototyping several different ways to do any given functionality and see which one works the fastest. You’ll write more efficient code and gain a deeper understanding of .NET along the way.