Although there may be layers of humility obscuring the fact, most professional developers think, at their core, that on a scale of one to ten they are pretty awesome. And why wouldn’t they? Developers tend to have two paths to travel in their career. One direction leads towards a painful realization of syntactic incompetence where every day is filled with the fear of someone discovering your ineptitude. Naturally, this results in the subsequent search for a sales or marketing career. In the opposite direction lies a conquest of projects that boosts your confidence and ego until you firmly believe that your skills range somewhere between the epic and the immortal.
But if your code were laid bare, torn down to the nth degree and analyzed, would that feeling of immortal skill be spared? It is certainly a question worth asking, and ANTS Performance Profiler from Red Gate Software can help determine the answer. A while back, I wrote a cryptography library to simplify encryption and hashing operations. I consider it good solid code, and I’ve used it on a number of projects without any noticeable issues -- performance or otherwise. In the sections that follow, however, I’m going to run it through ANTS Performance Profiler to see if it can spot anything that my immortally epic awesomeness missed.
Installing ANTS Performance Profiler
In an effort to provide the best overview of ANTS Performance Profiler that I can, I shall begin with an in-depth account of my installation experience. I downloaded a copy of the software from the Red Gate site. Then I installed it. No preparation. No configuration. No frustration. If you’re relatively accurate with a mouse and can click “Next” four times, then your experience should be about the same. Why four times? Well, there is a EULA you have to accept, and you do have the option to change the installation directory if you so choose.
Setting Up Your Application for Profiling
When you fire up ANTS Performance Profiler, you are greeted with the Application Settings screen (see Image-01). In the grand scheme of things, this is a pretty easy screen to understand. First, you choose which type of application you want to profile, and then you enter some configuration settings at the bottom of the screen. Naturally, the configuration settings differ for each application type; a desktop application may point to an executable, whereas a web application requires you to enter the URL of the application. Notice that I did say application. ANTS Performance Profiler analyzes running code, which means that you need to point it at something it can run. In my case, I want to profile my cryptography library. Fortunately, the cryptography library ships with a desktop application that allows you to set up configuration settings, encrypt and decrypt strings, and perform hashing operations, so we can run the profiler against that application.
Image 01 – Application Settings
Regardless of the application type, you get to choose a Profiling mode and the Default timing display for the profiler. The Profiling mode drop-down list allows you to specify exactly how detailed an analysis you want to perform on your code. You can, essentially, choose whether you want line-level timings or method-level timings, and whether you want to include all code (e.g. .NET Framework and third-party code) or only code for which source code is available (i.e. your code). Line-level timings are the most detailed and mean that statistics are gathered for every single line of code in a method. Method-level timings are faster because statistics are only gathered on a method-by-method basis instead of a line-by-line basis. You may be wondering why you would ever choose anything other than line-level timings, since they provide the most detail. Good question. The answer is that you use the application while the profiler gathers information, so you may notice some sluggishness during a profiling session. If you are performance testing a large application, it may benefit you to start with method-level timings, identify the methods with performance issues, then go back and do line-level timings on those methods to get a better idea of which lines are causing the problem.
The Default timing display drop-down gives you two options for capturing timing information: CPU time and Wall clock time. When you opt for CPU time, the “amount of time” a method or line of code runs is calculated against actual CPU usage; if a thread is blocked, the time spent waiting for the thread to continue is not counted towards the timing statistics. Wall clock time, on the other hand, counts the entire real-world duration that methods and lines of code take to execute, which makes it a good choice if you have a multi-threaded application that is hitting problems.
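The CPU-versus-wall-clock distinction is easy to demonstrate outside the profiler, too. Here is a minimal sketch in Python (the article’s library is .NET; Python’s two clocks just make the concept concrete): a function that mostly sleeps racks up a full second of wall-clock time while consuming almost no CPU time.

```python
import time

def blocked_work():
    # Simulates a method that mostly waits (I/O, a lock, a sleep) and then
    # does a small amount of real computation.
    time.sleep(1.0)            # blocked: wall-clock time passes, CPU is idle
    return sum(range(10**6))   # actual CPU work

cpu_start = time.process_time()    # CPU-time clock: ticks only while this process runs on a core
wall_start = time.perf_counter()   # wall-clock: ticks in real time regardless

blocked_work()

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

# Wall clock includes the full second of sleeping; CPU time does not.
print(f"CPU time:  {cpu_elapsed:.3f} s")
print(f"Wall time: {wall_elapsed:.3f} s")
```

A profiler in CPU-time mode would barely notice `blocked_work`, while wall-clock mode would flag it immediately, which is exactly why wall clock suits multi-threaded code that waits on locks.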
Another cool feature of ANTS Performance Profiler is the integrated Performance Counters (see Image-02). Before you profile an application, you can choose any number of performance statistics to capture along the way, such as CPU usage, memory usage, thread count, exceptions thrown, pages served, etc. During the profiling session, those performance statistics are stored alongside your code statistics and graphed on a timeline. During the analysis phase, this allows you to select a region of the timeline and see what code was running during a spike, or a lull, or whatever it is on the graph that has grabbed your attention. I’m fairly impressed with this feature, because it really lets you identify and isolate highly specific performance problems with ease.
Image 02 – Performance Counters
Profiling an Application
Once you’ve got your configuration options set, all you have to do to start profiling your application is click the Start Profiling button. ANTS Performance Profiler launches your application, and then it’s up to you to run whatever part of your application you want profiled. If you don’t run it, then it won’t get profiled. As your application runs, ANTS continually gathers statistics about which lines of code are running and how long they take, as well as any performance counter data you selected. In my case, I wanted to test the encryption process in my library, so I just typed a string into the UI and encrypted it 20 times. When you are finished, you click Stop Profiling to finish the data collection process and move on to the analysis phase.
Analyzing the Profiler Data
Now we get to the fun part, looking at the performance results and figuring out where bottlenecks appear in code. If you take a look at Image 03, you will see the results I got after profiling my application. ANTS Performance Profiler displays a few different pieces of information once you have completed the profiling process, which are described in more detail below.
Performance Counter List – Lists the various performance counters that you selected when configuring the profiling session. Clicking on one of these performance counters selects the item in the list and bolds that counter’s line on the Performance Graph.
Performance Graph – Presents a visual timeline of performance counter data. You can select a region of the timeline by clicking and dragging with your mouse. Selecting a region of the timeline filters performance profiling to just the selected region.
Method Tree – Displays a tree containing the hierarchy of method calls made during the profiling session. An icon marks the path of method calls where your code spent the most time executing, so you can quickly see where you may have a performance problem.
Source View – When you click on a method in the Method Tree, the Source View displays the source code (if available) for that particular method. This view also contains statistical information about each line of code (if you did line level data gathering) like the number of times the line of code was hit, the total time spent executing, and the average execution time. Near the scroll bar of the Source View window, there are Hot Spot Indicators that correspond to lines of code that ANTS Performance Profiler has identified as being potential performance problems.
Now that we’ve got a bit of an idea of what’s on the screen, let’s talk about what it means. First, take a look at the performance graph. My profiling session lasted about 1 minute and 40 seconds, and the red line indicates the amount of processor being eaten up during that time period. Please note that performance counters are not magically limited to your application, so if you have something running in the background that causes the processor to spike, it will show up here. Fortunately, this was the only major thing running on my system at the time, so we should be good. Notice that we see processor usage spike at the beginning of the profiling session, then it goes flat, and then rises again at the end of the session. That first spike is the application starting up, and the last spike was when I typed Test into my application and clicked Encrypt a bunch of times.
Currently the method tree is telling me that the most expensive path leads to the constructor for the main window of my application. I’m running a WPF application inside a VPC, so this result is not really all that surprising. And there’s not much I can do about it, because no code of mine runs during startup; it’s just XAML being parsed and constructed into a UI. By default, ANTS Performance Profiler starts out using all of the statistics gathered during the entire profiling session. In this case, however, I want to focus on the encryption routine that ran towards the end of the profiling session and disregard the startup statistics at the beginning. So what I’m going to do is select a region on the Performance Graph to limit the profiling data to the code that ran during the latter portion of my profiling session. When selected, it looks something like this:
After selecting a region of the timeline, the Method Tree updates, and a new path appears. As mentioned before, my cryptography library is a wrapper that simplifies the existing .NET cryptography library, so I was hoping the new path would dead-end into a .NET Framework function. Why? Because it would imply my wrapper was lightweight enough to not cause much overhead and that all the heavy lifting is occurring deep in the .NET cryptography libraries where I can make no optimizations (in other words, I could blame Microsoft for it being slow). Unfortunately, a quick look reveals that the path runs right through the Rebel.Cryptography.CryptoSettings.LoadDefaults method, which is a bit odd because all this method is supposed to be doing is loading up some standard default settings. Clicking on that method in the method tree reveals the Source View shown below:
When you look at the Source View, there are a couple of columns to know about:
Line – Contains the line number of the source code. If you want to know what source code file you are looking at, the file name is displayed in the title of the Source View tool window.
Hit Count – Shows the number of times the line of code was executed. I clicked “Encrypt” 20 times in rapid succession during the performance profiling, which is why all of these lines say 20.
Avg. Time (ms) – Average amount of time the line of code took to run. You have a couple of time unit options for this value -- percentage, ticks, milliseconds, and seconds. In this case I’m displaying things in milliseconds because it makes it easy to compare numbers. You can switch the unit via the Timeline menu whenever you want, so you can change it on the fly and use the unit that makes the most sense for whatever you are analyzing at the time.
Time (ms) – Total amount of time the line of code ran.
Now that the source code and performance statistics are in front of us, it’s time to play a game called “identify the crappy line of code.” There are seven lines of code that have performance statistics. Two of those lines are curly braces, which eat up 0.017 milliseconds of processor time each time they are encountered. I really don’t know how to performance optimize curly braces, so we’ll gloss over those for the time being. The other five lines of code are all basically doing the same thing: setting default property values. But a cursory glance reveals that the last line of code in the method (line 278) takes almost 1000 milliseconds to execute, while everything else takes either 14 milliseconds or 2 milliseconds. I know the setter on line 273, which runs in 14 milliseconds, contains additional logic for clearing some other dependent properties, which is why it runs longer than the other default setters. In other words, it makes sense that it would take a bit longer. But the 955 milliseconds incurred by line 278 seems a bit odd.
Now that I’ve identified the poorly performing line of code, it’s time to figure out why it’s performing poorly. There are two things occurring on this line. First, the code gets the host name by calling System.Net.Dns.GetHostName, and then it assigns the returned string value to the Password property. The sum total for both is shown in the statistics for this line of code, but I want to know which one of these is the real culprit consuming most of the processor time. How do you do it? Easy: when you move your mouse over each individual statement on a single line of code, a callout appears showing you the specifics for that one statement. When I hover over the Password setter, it says it takes 185 milliseconds to process. I happen to know that the Password property’s setter contains a good chunk of code for setting up an initialization vector and encryption key based on the set password, so it makes sense that it’s higher than the other setters. And when I hover over the GetHostName method, it reveals that this call is consuming 750 milliseconds of processor time. Ouch.
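The same “which half of the line is slow?” question can be answered by hand when no profiler is available: time each sub-expression separately. Here is a rough Python sketch, with `socket.gethostname` standing in for `Dns.GetHostName` and a plain dictionary assignment standing in for the Password setter (both stand-ins are my own, not the article’s code):

```python
import socket
import time

def timed(fn, *args):
    """Crude per-statement timer: run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# The suspect line combined two operations; time each one on its own.
host, lookup_cost = timed(socket.gethostname)   # stand-in for Dns.GetHostName

settings = {}

def set_password(value):
    # Stand-in for the Password property setter (the real one also derives
    # an initialization vector and key, which is why it has a cost at all).
    settings["password"] = value

_, setter_cost = timed(set_password, host)

print(f"gethostname: {lookup_cost * 1000:.3f} ms, setter: {setter_cost * 1000:.3f} ms")
```

This is exactly what the hover callout automates: apportioning one line’s total time across the statements that share it.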
I also could have figured out the individual timings using the Call Graph, which builds a visual representation of your code. If you look at the call graph for CryptoSettings.LoadDefaults (which you can see by right-clicking on the method in the method tree and selecting Create New Call Graph), you can see that the setter for the Password property takes 185.506 milliseconds, which represents 19% of the total time LoadDefaults takes to run. If you hover over the small yellow box to the left of the Password setter, you’ll see the call to the GetHostName method; the small yellow box to the right represents .NET JIT overhead. JIT overhead tends to be fairly negligible, so although you can’t see the statistics directly, you can conclude that the call to GetHostName is taking up roughly the other 80% of this method call.
Resolving the Issue
System.Net.Dns.GetHostName is a Microsoft method, which means I can’t optimize its performance. Although my original objective was to trace my longest path back to a Microsoft method, this was not really what I had in mind. In fact, when I first saw that I was making a call to the GetHostName method, the first thing that ran across my mind was … WTF? This is an encryption library that has nothing to do with networking, so why does the host name matter? Then I remembered why I did it. My cryptography library reads settings from a configuration section in the application’s configuration file, one of which is a password used to populate an initialization vector and encryption key used in .NET cryptography operations. As I was writing the library, however, I remember thinking that I didn’t want to force people to configure the library in order to use it. For example, if someone was just throwing together a demo site and needed the library, I didn’t want them to have to look up the configuration information first. So I needed a “standard” password that would be used whenever the configuration wasn’t available. But the idea of a hardcoded password bothered me, so I figured I would just use the host name of the system, which is why the code has a call to GetHostName. That way the password would change from system to system, but would still be fairly easy to figure out if someone came back and asked, hey, what’s the encryption password?
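The decision described above amounts to a small fallback rule. A hypothetical Python sketch of the idea (the `load_defaults` function and the config shape are my invention for illustration; the real library is C#):

```python
import socket

def load_defaults(config: dict) -> dict:
    # Hypothetical sketch of the LoadDefaults idea: prefer the configured
    # password, and fall back to the machine's host name when none exists.
    password = config.get("password")
    if password is None:
        # No configuration available: the host name varies per machine but
        # is easy to tell a user, avoiding a single hardcoded password.
        password = socket.gethostname()   # this lookup is the hidden cost
    return {"password": password}

# A configured password wins; otherwise the host name is used.
print(load_defaults({"password": "s3cret"}))
print(load_defaults({}))
```

The sketch also makes the profiling result understandable: the fallback branch pays the host-name lookup on every defaults load, even though the value never changes between calls.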
Of course, I certainly did not realize that a call to GetHostName would have such a performance impact. Now, you may say that 750 milliseconds is not a big deal in terms of performance, but when you switch the time unit from milliseconds back to percentage, you quickly find out that 750 milliseconds represents about 10% of the total processing time for the region of the timeline I selected. In other words, removing this one call and replacing it with a hardcoded value, or throwing an exception when the configuration isn’t found, can improve performance by about 10%.
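A middle ground between a hardcoded value and the per-call lookup is to resolve the host name once and reuse it. Sketched in Python (the function name is hypothetical; the actual fix would live in the C# library):

```python
import functools
import socket

@functools.lru_cache(maxsize=1)
def default_password() -> str:
    # Resolve the host name once; subsequent calls return the cached string,
    # so the lookup cost is paid a single time instead of on every
    # defaults load.
    return socket.gethostname()

first = default_password()    # performs the actual lookup
second = default_password()   # served from the cache
```

This keeps the per-machine default password behavior while eliminating the repeated lookup that showed up as 10% of the profiled time.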
Apart from writing about my experience, the act of profiling my code and finding a way to boost the performance of my cryptography library by 10% took all of about twenty minutes. And eight and a half of those minutes were spent watching a video overview of ANTS Performance Profiler. And even though my ego may have been a bit bruised by the fact that ANTS Performance Profiler found some issues with my code, the fact that my code is even better now makes me all the more proud.