Many computer systems nowadays have their "correctness" checked using sample testing: a test engineer programs a scenario and runs the test system against it. The system's actual output is captured and measured against the expected output, and if these match up, then the test is passed.
But for highly critical (and complex) systems such as satellites, rockets, railway signals, onboard flight controllers, and flood-control barriers, this is just not enough. They are not much noticed, but have a huge impact on daily life, especially when they start failing. Imagine space rockets crashing from the sky, or trains running into each other because of small bugs in the system. Nobody wants that; failures are too costly in terms of lives and money. Lots of engineers specializing in quality assurance, safety, dependability, reliability, and performability, put every effort into prevent them from happening, but it's not an easy task. The kinds of systems they analyze so thoroughly are generally highly multi-threaded or event driven, and it is fiendishly difficult to test them exhaustively for correctness, because of the huge number of ways they can respond.
In order to test all the possible responses of a given system, the scientific community has, for the last few decades, been investigating model checking.
Model checking tools explore all scenarios and verify whether responses meet expectations in each instance. These techniques have been successfully applied to verify NASA's Mars Rover, Lucent Technologies' PathStar telephone switch and the storm-surge barrier control system in the Dutch Delta Works dikes. So far, so good. But past model checking techniques required engineers to create and verify a model of the system, when it is the system itself which ultimately needs to be functioning correctly. So that is why, in the past decade, people from NASA started to apply model checking techniques at a system level.
With that in mind, a team comprising me, my predecessor, and our supervisor, Theo Ruys, developed the MoonWalker model checker at the University of Twente. MoonWalker verifies multithreaded .NET programs by inspecting the CIL bytecode instructions and exploring all possible interleavings of the threads. It is also the only publicly available .NET model checker to date that employs a set of efficient algorithms and is, in terms of performance, on a par with other state-of-the-art tools. You can read all about it in Software Model Checking for Mono, which contains an introductory chapter on model checking and an in-depth explanation of the algorithms out there.
Figure 1. Conceptual overview of using MoonWalker. The user loads up a .NET assembly and the assertions specified within it to the MoonWalker model checker, which explores the state space of the assembly. In case no assertion violations are found, OK is outputted. Otherwise a trace of CIL instructions that leads to the assertion violation is generated.
Efficient model checking is all about employing the best algorithms, or else you easily end up hitting the limits of your computer's processing power. We've seen many case studies where model checkers completely consume clusters with dozens of GBs of memory and dozens of processing cores. It's a generally known problem in the model checking community, and it drives us and our colleagues around the world to eagerly develop better algorithms.
So, we recently spent months working out three new algorithms for MoonWalker; theoretically they looked much better, but we wanted to see whether they would really speed up MoonWalker in practice. Naturally, we needed to implement them as efficiently as possible to squeeze the most out of them, but this is not easily done in a tool as complex as a model checker, which is constructed with some of the most advanced algorithms around. So we made heavy use of a profiler to understand bottlenecks and aggressively optimize the algorithms.
Measure First, Optimize Later
When developing MoonWalker, we read some interesting papers on hashing in model checking, and recalled from our experiences that the hash function is called up to billions of times during an average verification. Here was an opportunity to devise a faster hash function! Based on cyclic polynomials and Knuth hashing with a constant time-complexity, this would be much faster than the linear time-complexity of traditional algorithms. And so, with growing excitement, this new algorithm was worked out on paper and called incremental hashing. When we proved that the theory behind it was sound, we enthusiastically implemented it in MoonWalker.
The results of the first verifications runs were not what we expected. We used small .NET models, and simply used the cygwin time command to measure the execution times. There was no decrease and we initially thought this was because of the simplicity of our models. We crafted a few more complex ones and reran the verifications – also with no decrease in execution time. This was puzzling and disappointing because, in theory, our incremental hashing should have been much faster.
Figure 2. On the left is a simple datarace programmed in C#. The MoonWalker model checker (in the command prompt on the left) successfully detects the datarace.
We started wondering whether the .NET virtual machine might be able to optimize the old algorithm to the point where it was as efficient as the new one. So the old hashing algorithm was blown up by adding some unoptimizable loops, but this also didn't give us decreases in execution time.
Completely stumped, we started to Google for answers, specifically trying to understand how the compiler and the CPU optimized the instruction code. We even went through the optimized CIL bytecode step by step, using the debugger built in to Visual Studio.
In a flash of insight, we realized we should measure the cumulative time spent in the old hash function and the incremental hash function, and look for an absolute difference.
We immediately began googling for .NET profilers and found several. We finally settled on the ANTS Profiler because it shows the profiling results in a way that gives a quick overview and is fast to comprehend. Using the profiler in detailed mode, we found that the absolute amount of time spent in the old hash function was much higher than the incremental hash function. Multiple times higher, in fact. So the big question was:
Why wasn't this difference reflected in the total running time?
When in detailed mode, ANTS can show a piece of the source code with indications of the amount of time spent on each line. We analyzed both the old hashing function and the incremental hashing function in this incredibly fine-grained way… and still couldn't explain why there wasn't a difference!
Then old optimization wisdom came to mind – stuff that we'd read years back in an old "optimizing Java code" article:
We should only optimize the bottlenecks, which are measured by their relative stake in the total execution time.
A quick look at the table showed that the old hashing algorithm had a relative stake of near zero, so plugging in a better hashing algorithm would have no effect on the total execution time. We also realized that the absolute time difference, while still big, was in milliseconds. So what we had spent all that time and energy doing was optimizing a part of MoonWalker that didn't need it.
Fortunately, the merits of our incremental hashing did not have to be limited to a neat theoretic result. We profiled another model checker developed at NASA – the SPIN model checker – and observed that their hashing was a bottleneck. We implemented our incremental algorithm there, and immediately saw up to a two-fold performance increase, depending on the model. Our algorithm and extensive experimental results have also recently been published at SPIN'08 (a well-known model checking conference).
Also, while profiling and measuring the effects of hashing in MoonWalker, we did detect that its garbage collector algorithm (not to be confused with the garbage collector in the virtual machine) had a far more significant stake – on average, 55%. We saw this as a great optimization opportunity and also worked out a new algorithm to reduce this stake.