Benchmarking Lessons Learned the Hard Way

29 Mar 2011

Benchmark Lessons Learned

Having spent far too much time benchmarking different code recently, here are some lessons learned! In no particular order:

  1. Make sure all power saving systems on your machine are turned off: The graph shows a benchmark running in a single process, which runs the same JVM COBOL calculation over and over again. Although the process was occupying a single core of the machine at near 100%, this only represented a 50% load on the CPU as a whole. The OS was not clever enough to figure this out, so it kept scaling back the CPU clock speed. When some spurious IO operation or other activity caused the CPU usage to drop further, the scaling cut in even more. This trashes the benchmark results, especially when comparing dissimilar software technologies with different CPU load characteristics.
  2. Make sure nothing else is running on your machine: OK, that is pretty much impossible on a modern machine. However, I shudder to think how many times a benchmark has gone horribly wrong for me because Outlook chose 'that very moment' to update all its folders. A nasty one to check for is your web browser. Flash animations and AJAX pages can randomly consume lots of CPU or disk and you'd never know it. Check using 'top' on *nix or Process Explorer on Windows.
  3. Test real scenarios: It is very easy to assume a simple loop will be enough to measure the performance of a language or technology. It is not! Most modern compilers have optimizers which specifically target loops. Simple loops get optimized massively or optimized away altogether. However, the more complex loops which occur in real life do not get optimized away. So, a simple loop can produce results which are in no way similar to real world performance.
  4. Here is a COBOL program with a simple loop:

    123456$set sourceformat(variable)
    
       01 my-group.
           03 counter pic s9(9) comp-5.
           03 a       pic s9(9) comp-5.
           03 b       pic s9(9) comp-5.
           03 r       pic s9(9) comp-5.
    
       move 123456789 to a b r
       perform varying counter from 1 by 1 
                   until counter = 1000000
            compute r = (a + b) / (a - b)
            compute r = (r + b) / (a - b)
            compute r = (r + b) / (a - b)
            compute r = (r + b) / (a - b)
            compute r = (r + b) / (a - b)
       end-perform
       .

    We can see that a and b are invariant and that r is a loop constant (it is the same after 1 or 1000 iterations). Further, the counter has a fixed value at the end of the loop which can be deduced at compile time. In my post on getting to understand JVM performance, I showed a benchmarking technique using JavaScript. Here are the results for 64 bit Windows with 64 bit COBOL (with the opt option) and a 64 bit Sun/Oracle JVM. Judging by the totals and means, each test was run 32 times and the times are in milliseconds:

    Launch Overhead:
    
    Results:
    =========
    JVM
        Maximum Time: 250
        Minimum Time: 0
        Mean    Time: 7.875
        Total   Time: 252
    Native
        Maximum Time: 112
        Minimum Time: 48
        Mean    Time: 53.28125
        Total   Time: 1705
     
    Simple Loop:
    
    Results:
    =========
    JVM
        Maximum Time: 283
        Minimum Time: 78
        Mean    Time: 86.59375
        Total   Time: 2771
    Native
        Maximum Time: 58
        Minimum Time: 49
        Mean    Time: 50.4375
        Total   Time: 1614

    Launch overhead is the time taken when the COBOL program does nothing at all (a program which just says 'goback.'). It is clear from the results that, for native Micro Focus COBOL, the difference between doing nothing at all and performing the loop is less than the error in the benchmarking technique. The variation is probably due to other processes on the machine using CPU and memory. This is an impressive bit of optimization from the Micro Focus native code generator.

    JVM COBOL does not do as well. It does not optimize away this loop and, because it has trivial launch overhead, in this test it is tens if not hundreds of times slower than the native COBOL. Is this result representative of real world programs? We can show that it is not, because constant value loops do not occur very much in production code (why have a loop which does nothing?).

    What happens if the loop is less predictable? Will the difference in the relative performance of JVM and native change?

    Now we can look at a slightly more complex program:

    123456$set sourceformat(variable)
     
           01 my-group.
               03 counter pic s9(9) comp-5.
               03 a       pic s9(9) comp-5.
               03 b       pic s9(9) comp-5.
               03 r       pic s9(9) comp-5.
           01 results.
               03 rr      pic x(40).
               03 rc      pic x(40).
     
           move 123456788 to b
           move 100       to r
           perform varying counter from 1 by 1 
                    until counter = 1000000
                move 123456789 to a
                compute r = counter     / 100
                compute r = counter - r * 100
                compute a = a + r
                compute r = (a + b) / (a - b)
                compute r = (r + b) / (a - b)
                compute r = (r + b) / (a - b)
                compute r = (r + b) / (a - b)
                compute r = (r + b) / (a - b)
                if r = 0
                    exit perform            
                end-if
           end-perform
           move r       to rr
           move counter to rc
           .

    In this program, a and r are no longer loop constants. Also, the loop can exit before counter = 1000000. This means that the optimizers can no longer optimize away the loop. This sort of complex branching logic is much more typical of the way business logic runs in real programs.

    Here is the result:

    Results:
    =========
    JVM
        Maximum Time: 449
        Minimum Time: 189
        Mean    Time: 202.9375
        Total   Time: 6494
    Native
        Maximum Time: 167
        Minimum Time: 145
        Mean    Time: 148.8125
        Total   Time: 4762

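    Accounting for launch overhead (1.7 seconds for native and about 0.25 seconds for JVM) leaves roughly 3.1 seconds for native and 6.2 seconds for JVM COBOL, so this benchmark shows native COBOL running only some 2.0 times faster than JVM COBOL.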

    From this, it is abundantly clear that simple loops are in no way appropriate for benchmarking!
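The launch overhead correction used above can be sketched in a few lines. This is only an illustration in Python, not the author's JavaScript harness: the totals are copied from the 64 bit listings, while the dictionary and function names are made up for this example.

```python
# Totals in milliseconds, copied from the 64 bit result listings above.
LAUNCH = {"JVM": 252, "Native": 1705}        # program containing only 'goback.'
SIMPLE_LOOP = {"JVM": 2771, "Native": 1614}
COMPLEX_LOOP = {"JVM": 6494, "Native": 4762}


def net_times(totals, launch):
    """Subtract launch overhead from each total, leaving the time spent in the loop."""
    return {name: totals[name] - launch[name] for name in totals}


def jvm_to_native_ratio(net):
    """How many times longer the JVM run took than the native run."""
    return net["JVM"] / net["Native"]


simple_net = net_times(SIMPLE_LOOP, LAUNCH)
complex_net = net_times(COMPLEX_LOOP, LAUNCH)

print(simple_net)                                   # {'JVM': 2519, 'Native': -91}
print(complex_net)                                  # {'JVM': 6242, 'Native': 3057}
print(round(jvm_to_native_ratio(complex_net), 2))   # 2.04
```

The negative native figure for the simple loop is the point made earlier: once launch overhead is removed, the loop time is smaller than the measurement error, so no meaningful ratio can be taken from that test.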

  5. Pick representative hardware: Do not benchmark a database application on a machine with one SATA drive and expect that to resemble running on a machine with a serial attached SCSI RAID! Even the performance using the x86 and x86_64 instruction sets on the same machine can be radically different.
  6. Below are the results for the benchmark I discussed above, but this time using 32 bit x86 code on the 64 bit machine:

    Results:
    =========
    JVM
        Maximum Time: 467
        Minimum Time: 204
        Mean    Time: 219.125
        Total   Time: 7012
    Native
        Maximum Time: 247
        Minimum Time: 216
        Mean    Time: 224.15625
        Total   Time: 7173

    This time the launch overhead is lower as well (1.4 seconds for native and 0.2 seconds for JVM). So we have 5.7 seconds for native and 6.8 seconds for JVM COBOL, meaning that on 32 bit the native COBOL is running only 1.2 times quicker than JVM. That is substantially different from the 64 bit results on exactly the same machine.

  7. Know your enemy, spend time with it.
  8. For discussion on this topic and others, please visit my personal site!
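The result listings above report the maximum, minimum, mean and total time over repeated launches of each program. A minimal sketch of a harness producing that kind of summary might look like the following. It is written in Python rather than the JavaScript used for the original measurements, the function names are hypothetical, and the command in the example is a placeholder, not the author's COBOL program. It measures wall-clock time for the whole process, so launch overhead is included, as it is in the listings.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=32):
    """Launch cmd `runs` times; return each wall-clock duration in milliseconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append((time.perf_counter() - start) * 1000.0)
    return times


def summarize(times):
    """Reduce a list of timings to the four figures used in the listings."""
    total = sum(times)
    return {
        "Maximum": max(times),
        "Minimum": min(times),
        "Mean": total / len(times),
        "Total": total,
    }


def report(label, times):
    """Format a summary in roughly the layout of the result listings."""
    lines = [label]
    for name, value in summarize(times).items():
        lines.append(f"    {name:<7} Time: {value:g}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder command: a Python interpreter that exits immediately,
    # standing in for the 'goback.' launch-overhead program.
    cmd = [sys.executable, "-c", "pass"]
    print(report("Launch overhead", time_command(cmd, runs=5)))
```

Reporting the maximum and minimum alongside the mean helps to spot the interference described in lessons 1 and 2: a single run far above the rest usually means something else on the machine woke up during the test.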

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License


About the Author

alex turner
Web Developer
United Kingdom
I am now a Software Systems Developer - Senior Principal at Micro Focus Plc. I am honoured to work in a team developing new compiler and runtime technology for Micro Focus.
 
My past includes a Ph.D. in computational quantum mechanics, software consultancy and several/various software development and architecture positions.
 
For more - see
 
blog: http://nerds-central.blogspot.com
 
twitter: http://twitter.com/alexturner

Article Copyright 2011 by alex turner