Posted 11 Nov 2005

KISS code block execution-speed benchmark

oto spal, 31 May 2006
A utility for "Keep It Simple Stupid" code block execution-speed benchmarking.


"Yesterday I was working on some code that I think is time critical. So I wrote myself a file, a really small test project, to visualize the CPU cycles (of my life) that elapsed while executing that code block. As the work progressed, I also wanted to see the differences, measured in cycles, between individual versions; in other words, how good or bad I was at optimizing it. I wasn't quite finished with my work when I suddenly saw that the little speed-testing utility could be something I might share." I wrote this text some time ago. Since then, my benchmark has evolved...


Flow Chart


  1. Create a new, empty console project.
  2. Add benchmark.h and benchmark.cpp to the project.
  3. Add an empty .cpp file to the project. Your .cpp file should have the following layout:
    #include <benchmark.h>
    // put all your include files here together with globals
    // expands to int main(int, char**)
    //      - don't define your own  !!!
        // repeat following code for each test you wish to run
        // below are helper macros for safe access to test parameters 
        // each parameter once set is valid for all consecutive tests 
        // until changed by a call to its appropriate setter
        // parameters are set in this order:
        // 1. default values - "Generic Test" in dirty environment
        //           with 5 caching and 10,000 testing iterations
        // 2. command line arguments - override default values
        //           (see below for possible switches)
        // 3. macro setters
        //sets test description - may be omitted
        SET_DESCRIPTION(formatting string)
        //sets number of caching iterations - may be omitted
        //sets number of testing iterations - may be omitted
        //sets clean environment on or off - may be omitted
        // use following macros to encapsulate caching
        // of tested code - may be omitted
        // put both tested code blocks here
        // this is the actual test
        // put first tested code block here
        // put second tested code block here
        // end of a single test
  4. Compile the project and run the benchmark:
    <appname> [[-f name | -c state | -a number | -i number] | [-h]]
         f - specify log file 'name', if you omit this parameter
             the default name 'benchmark.log' will be used
         c - runs the tests in 'state' 0 (dirty) or 1 (clean)
             if you have hard coded the 'states' of your tests,
             this parameter has no effect
         a - sets the global 'number' of caching iterations,
             if you have hard coded this value for your tests,
             this parameter has no effect
         i - sets the global 'number' of testing iterations,
             if you have hard coded this value for your tests,
             this parameter has no effect
         h - shows this help
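The test layout in step 3 (a few untimed caching iterations, then the timed testing iterations, for two code blocks that are compared against each other) can be sketched roughly as follows. This is not the code from benchmark.h: the names `TestSettings`, `TestResult`, `time_block` and `compare_blocks` are hypothetical, and the sketch measures wall-clock time with `std::chrono::steady_clock` instead of counting CPU cycles. The defaults of 5 and 10,000 iterations are the ones listed in the layout above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical settings mirroring the parameters described above;
// these names are assumptions, not taken from benchmark.h.
struct TestSettings {
    std::size_t caching_iterations = 5;      // warm-up runs, not timed
    std::size_t testing_iterations = 10000;  // timed runs
};

// Mean time per timed iteration for each of the two tested blocks.
struct TestResult {
    double first_ns_per_iteration;
    double second_ns_per_iteration;
};

// Runs `block` untimed for the caching iterations, then times the testing
// iterations and returns the mean wall-clock nanoseconds per iteration.
inline double time_block(const std::function<void()>& block,
                         const TestSettings& settings) {
    assert(settings.testing_iterations > 0);  // avoid dividing by zero
    for (std::size_t i = 0; i < settings.caching_iterations; ++i) block();

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < settings.testing_iterations; ++i) block();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(settings.testing_iterations);
}

// Times the first block, then the second, with the same settings.
inline TestResult compare_blocks(const std::function<void()>& first,
                                 const std::function<void()>& second,
                                 const TestSettings& settings = {}) {
    return TestResult{time_block(first, settings), time_block(second, settings)};
}
```

Calling `compare_blocks` with two lambdas returns both averages, so two versions of the same routine can be compared directly. The `std::function` call adds a small, roughly constant overhead to every iteration, which matters when the tested block is only a few instructions long.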

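To illustrate how the command line switches from step 4 override the default values, here is a minimal parsing sketch. The `Options` struct and the `parse_options` function are hypothetical and do not come from benchmark.cpp; only the switch letters and the defaults (log file 'benchmark.log', dirty environment, 5 caching and 10,000 testing iterations) are taken from the text above. Unlike the usage line, this sketch does not reject `-h` combined with other switches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical container for the documented switches, preset to the defaults.
struct Options {
    std::string log_file = "benchmark.log";  // -f name
    bool clean = false;                      // -c state: 0 (dirty) or 1 (clean)
    long caching_iterations = 5;             // -a number
    long testing_iterations = 10000;         // -i number
    bool show_help = false;                  // -h
    bool valid = true;                       // false on unknown switch or missing value
};

// Parses the arguments that follow the program name.
inline Options parse_options(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& flag = args[i];
        if (flag == "-h") {
            opts.show_help = true;
            continue;
        }
        const bool known =
            flag == "-f" || flag == "-c" || flag == "-a" || flag == "-i";
        if (!known || i + 1 >= args.size()) {
            opts.valid = false;  // unknown switch, or a switch without its value
            break;
        }
        const std::string& value = args[++i];
        if (flag == "-f") {
            opts.log_file = value;
        } else if (flag == "-c") {
            opts.clean = (value == "1");
        } else if (flag == "-a") {
            opts.caching_iterations = std::strtol(value.c_str(), nullptr, 10);
        } else {
            opts.testing_iterations = std::strtol(value.c_str(), nullptr, 10);
        }
    }
    return opts;
}
```

In a real `main(int argc, char** argv)` the vector would be built from `argv + 1` to `argv + argc`. Values hard coded through the macro setters would then be applied after this step, which is why they take precedence over the switches.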
Last Words

That's all. For further information on this topic, take a look at:

  • Approximate Math Library for Intel® Streaming SIMD Extensions.
  • How to optimize for the Pentium family of microprocessors, by Agner Fog, Ph.D. Copyright © 1996 - 2004.


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

oto spal
Slovakia
No Biography provided

Comments and Discussions

Question: RDTSC isn't a great choice nowadays why not use QPC?
Hal Angseesing, 16-Nov-05 3:39

Answer: Re: RDTSC isn't a great choice nowadays why not use QPC?
oto spal, 16-Nov-05 14:18

Hal Angseesing wrote:
2) Dual core CPUs do not guarantee synchronisation between cycle counters (this is already an issue with certain games).

Since I don't work on a dual core CPU, I had no way to test this (that's why it is missing), but I think that something like SetProcessAffinityMask(current_process, 1) should be enough on such a system.
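A minimal sketch of that idea, untested on the kind of system discussed here: the function name `pin_process_to_cpu` is hypothetical, the Windows branch uses `SetProcessAffinityMask` with `GetCurrentProcess()` as suggested above, and the Linux branch (`sched_setaffinity`, which affects the calling thread) is an added assumption so that the sketch also builds outside Windows.

```cpp
#include <cassert>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Restricts execution to one logical CPU so that successive time-stamp
// counter reads come from the same core. Returns true on success and
// false on failure or on platforms this sketch does not cover.
inline bool pin_process_to_cpu(unsigned cpu_index) {
    assert(cpu_index < 32);  // keeps the bit shift valid for a 32-bit mask
#if defined(_WIN32)
    // Bit N of the mask allows logical CPU N; a mask of 1 means CPU 0 only.
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << cpu_index;
    return SetProcessAffinityMask(GetCurrentProcess(), mask) != 0;
#elif defined(__linux__)
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu_index, &set);
    // A pid of 0 applies the mask to the calling thread.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
    (void)cpu_index;
    return false;
#endif
}
```

Calling `pin_process_to_cpu(0)` once before the first test corresponds to the mask value 1 mentioned above. The call can fail, for example when the requested CPU is not available to the process, so the return value should be checked.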

Hal Angseesing wrote:
On the whole I would recommend QueryPerformanceCounter as a first stab, which will use dedicated motherboard timing devices when available and will fall back to RDTSC when not. QPC is supposed to be multi-core aware and, driver bugs aside, should cope with 2 nicely.

My intent was to measure the processor clock cycles.
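For readers who want to see what counting clock cycles looks like in code, here is a hedged sketch based on the `__rdtsc()` compiler intrinsic. It is not the benchmark's own implementation: `read_ticks`, `ticks_for` and the `KISS_HAVE_RDTSC` macro are hypothetical names, and on non-x86 targets the sketch falls back to `std::chrono::steady_clock`, which counts clock ticks rather than cycles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define KISS_HAVE_RDTSC 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <x86intrin.h>
#define KISS_HAVE_RDTSC 1
#else
#define KISS_HAVE_RDTSC 0
#endif

// Returns the time-stamp counter on x86, or steady_clock ticks elsewhere.
inline std::uint64_t read_ticks() {
#if KISS_HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Returns the number of ticks that elapsed while `block` ran once.
template <typename Block>
std::uint64_t ticks_for(Block&& block) {
    const std::uint64_t start = read_ticks();
    block();
    const std::uint64_t stop = read_ticks();
    // Can fail if the thread moved to a core with an unsynchronized counter.
    assert(stop >= start);
    return stop - start;
}
```

RDTSC is not a serializing instruction, so the processor may reorder it relative to the measured code; for very short blocks the result is only approximate. Whether the count equals actual core cycles also depends on the processor model.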

Hal Angseesing wrote:
3) CPU frequency is not fixed! (P4M for example but this is also being used more on desktop systems).

IA-32 Intel(R) Architecture Software Developer’s Manual, Volume 3: System Programming Guide:

"...Members of the processor families increment the time-stamp counter differently:

- For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]); and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle. The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel(R)SpeedStep(R) technology transitions may also impact the processor clock.

- For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and higher]): the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the frequency at which the processor is booted. The specific processor configuration determines the behavior. Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward..."


Last Updated 31 May 2006
Article Copyright 2005 by oto spal