
KISS code block execution-speed benchmark

31 May 2006
A utility for "Keep It Simple Stupid" code block execution-speed benchmarking.

Motivation

"Yesterday I was working on some code that I thought was time-critical, so I wrote myself a very small test project to visualize the CPU cycles (of my life) that elapsed while executing that code block. As the work progressed, I also wanted to see the differences in cycles between individual versions, in other words, how good or bad I was at optimizing it. I wasn't quite finished when I realized that the little speed-testing utility might be worth sharing." I wrote this text some time ago. Since then, my benchmark has evolved...

Description

[Figure: Flow Chart]

Usage

  1. Create a new empty console project.
  2. Add benchmark.h and benchmark.cpp to the project.
  3. Add an empty .cpp file to the project. Your .cpp file should have the following layout:
    #include "benchmark.h"
    
    // put all your include files here together with globals
    
    START_BENCHMARK
    // expands to int main(int, char**)
    //      - don't define your own  !!!
    
        //-------------------------------------------------------------
        // repeat the following code for each test you wish to run
        //-------------------------------------------------------------
    
        // below are helper macros for safe access to test parameters;
        // each parameter, once set, stays valid for all subsequent tests
        // until changed by a call to its setter.
        // parameters are applied in this order:
        // 1. default values - "Generic Test" in a dirty environment
        //           with 5 caching and 10,000 testing iterations
        // 2. command line arguments - override default values
        //           (see below for possible switches)
        // 3. macro setters
    
        //sets test description - may be omitted
        SET_DESCRIPTION(formatting string)
        //sets number of caching iterations - may be omitted
        SET_CACHING_ITERATIONS(integer)
        //sets number of testing iterations - may be omitted
        SET_TESTING_ITERATIONS(integer)
        //sets clean environment on or off - may be omitted
        SET_ENVIRONMENT(boolean)
    
        // use the following macros to wrap the caching
        // of the tested code - may be omitted
        BEGIN_CACHING
        // put both tested code blocks here
        FINISH_CACHING
    
        // this is the actual test
        EXEBLOCK_A
        // put first tested code block here
        EXEBLOCK_B
        // put second tested code block here
        EVALUATE
    
        //-------------------------------------------------------------
        // end of a single test
        //-------------------------------------------------------------
    
    CLOSE_BENCHMARK
  4. Compile the project and run the benchmark:
    <appname> [[-f name | -c state | -a number | -i number] | [-h]]
    
         f - specify log file 'name', if you omit this parameter
             the default name 'benchmark.log' will be used
         c - runs the tests in 'state' 0 (dirty) or 1 (clean)
             if you have hard coded the 'states' of your tests,
             this parameter has no effect
         a - sets the global 'number' of caching iterations,
             if you have hard coded this value for your tests,
             this parameter has no effect
         i - sets the global 'number' of testing iterations,
             if you have hard coded this value for your tests,
             this parameter has no effect
         h - shows this help
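
The test layout above amounts to running two code blocks under the same conditions and comparing their timings. As a rough, portable sketch of that flow (not the code inside benchmark.h, which is not shown here), the following C++ uses std::chrono::steady_clock in place of the CPU cycle counter the utility reads; `TestParams`, `TestResult` and `run_test` are hypothetical names. The caching iterations run both blocks untimed to warm up caches; reporting the fastest of the testing iterations is an assumption of this sketch, and the real utility may aggregate differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical test parameters; defaults mirror the article:
// "Generic Test", 5 caching and 10,000 testing iterations.
struct TestParams {
    std::string description = "Generic Test";
    int caching_iterations = 5;
    int testing_iterations = 10000;
};

// Fastest observed run of each block, in nanoseconds (-1 if never timed).
struct TestResult {
    long long best_a_ns;
    long long best_b_ns;
};

// Times one call of `block` with a monotonic clock.
static long long time_once(const std::function<void()>& block) {
    const auto start = std::chrono::steady_clock::now();
    block();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Hypothetical stand-in for one BEGIN_CACHING ... EVALUATE group.
TestResult run_test(const TestParams& p,
                    const std::function<void()>& block_a,
                    const std::function<void()>& block_b) {
    // "Caching": run both blocks untimed so code and data are warm.
    for (int i = 0; i < p.caching_iterations; ++i) {
        block_a();
        block_b();
    }
    // Testing: time each block and keep the fastest run of each.
    TestResult r{-1, -1};
    for (int i = 0; i < p.testing_iterations; ++i) {
        const long long a = time_once(block_a);
        const long long b = time_once(block_b);
        if (r.best_a_ns < 0 || a < r.best_a_ns) r.best_a_ns = a;
        if (r.best_b_ns < 0 || b < r.best_b_ns) r.best_b_ns = b;
    }
    std::printf("%s: A = %lld ns, B = %lld ns\n",
                p.description.c_str(), r.best_a_ns, r.best_b_ns);
    return r;
}
```

A call such as `run_test(p, [&] { /* first block */ }, [&] { /* second block */ })` plays the role of one EXEBLOCK_A / EXEBLOCK_B / EVALUATE group. Taking the minimum rather than the mean is a common choice in micro-benchmarks, because interrupts and context switches can only add time to a run.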
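
The switches in step 4 override the default values, and the macro setters in the test file override both. A minimal parser for that command line might look like the following; `Options`, `parse_count` and `parse_options` are hypothetical names, and treating an unknown switch or a missing/invalid value as a request for help is an assumption of this sketch, not documented behavior of the utility.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical option set; defaults follow the article: log file
// "benchmark.log", dirty environment, 5 caching / 10,000 testing iterations.
struct Options {
    std::string log_file = "benchmark.log";  // -f name
    bool clean = false;                      // -c 0 (dirty) or -c 1 (clean)
    int caching_iterations = 5;              // -a number
    int testing_iterations = 10000;          // -i number
    bool show_help = false;                  // -h
};

// Parses a non-negative int; returns false on malformed or out-of-range input.
static bool parse_count(const char* text, int& out) {
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || value < 0 || value > INT_MAX)
        return false;
    out = static_cast<int>(value);
    return true;
}

// Applies command-line switches on top of the defaults. On an unknown
// switch or a missing/invalid value it sets show_help and returns false.
bool parse_options(int argc, const char* const argv[], Options& opt) {
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg == "-h") { opt.show_help = true; continue; }
        if (i + 1 >= argc) { opt.show_help = true; return false; }  // no value
        const char* value = argv[++i];
        bool ok = true;
        if (arg == "-f") {
            opt.log_file = value;
        } else if (arg == "-c") {
            int state = 0;
            ok = parse_count(value, state) && state <= 1;
            if (ok) opt.clean = (state == 1);
        } else if (arg == "-a") {
            ok = parse_count(value, opt.caching_iterations);
        } else if (arg == "-i") {
            ok = parse_count(value, opt.testing_iterations);
        } else {
            ok = false;  // unknown switch
        }
        if (!ok) { opt.show_help = true; return false; }
    }
    return true;
}
```

In `main`, the sketch would be called as `parse_options(argc, argv, opt)`; C++ allows passing `char**` where `const char* const*` is expected.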

Last Words

That's all. For further information on this topic, see:

  • Approximate Math Library for Intel® Streaming SIMD Extensions.
  • How to optimize for the Pentium family of microprocessors, by Agner Fog, Ph.D. Copyright © 1996 - 2004.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Slovakia

Comments and Discussions

 
General: Need some help!
diduke, 1-Dec-05 17:26
General: Re: Need some help!
oto spal, 4-Dec-05 9:19
General: Re: Need some help!
diduke, 5-Dec-05 0:17
General: Re: Need some help!
oto spal, 6-Dec-05 11:29
General: Re: Need some help!
diduke, 7-Dec-05 4:33
General: Re: Need some help!
oto spal, 12-Dec-05 2:09
General: Re: Need some help!
diduke, 12-Dec-05 2:36
General: Re: Need some help!
oto spal, 12-Dec-05 12:07
Question: RDTSC isn't a great choice nowadays, why not use QPC?
Hal Angseesing, 16-Nov-05 3:39
Answer: Re: RDTSC isn't a great choice nowadays, why not use QPC?
oto spal, 16-Nov-05 14:18
Hal Angseesing wrote:
2) Dual-core CPUs do not guarantee synchronisation between cycle counters (this is already an issue with certain games).


Since I don't work on a dual-core CPU, I had no way to test it (that's why it is missing), but I think something like SetProcessAffinityMask(current_process, 1) should handle such a system.
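
A sketch of the affinity workaround described above, assuming a Windows build for the SetProcessAffinityMask path; a Linux branch using sched_setaffinity is added only so the snippet can be tried elsewhere. Rather than hard-coding mask 1, the hypothetical helper `pin_to_single_cpu` picks the first CPU the process is already allowed to run on, since CPU 0 may be excluded by the current mask.

```cpp
// CPU_SET and related macros need _GNU_SOURCE with glibc (g++ defines it by default).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <cassert>

#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Restricts the current process to one CPU so that consecutive
// time-stamp-counter reads come from the same core.
// Returns the index of that CPU, or -1 on failure / unsupported platform.
int pin_to_single_cpu() {
#if defined(_WIN32)
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return -1;
    for (int cpu = 0; cpu < static_cast<int>(sizeof(DWORD_PTR) * 8); ++cpu) {
        const DWORD_PTR bit = static_cast<DWORD_PTR>(1) << cpu;
        if (process_mask & bit)  // first CPU we are currently allowed to use
            return SetProcessAffinityMask(GetCurrentProcess(), bit) ? cpu : -1;
    }
    return -1;
#elif defined(__linux__)
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)  // 0 = this process
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t single;
            CPU_ZERO(&single);
            CPU_SET(cpu, &single);
            return sched_setaffinity(0, sizeof(single), &single) == 0 ? cpu : -1;
        }
    }
    return -1;
#else
    return -1;  // affinity API not covered by this sketch
#endif
}
```

Calling it once at program start, before any measurements, keeps every subsequent counter read on the same core.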

Hal Angseesing wrote:
On the whole I would recommend QueryPerformanceCounter as a first stab, which will use dedicated motherboard timing devices when available and will fall back to RDTSC when not. QPC is supposed to be multi-core aware and, driver bugs aside, should cope with 2 cores nicely.
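
For comparison, a wall-clock stopwatch along the lines suggested here could look like the following. The QueryPerformanceCounter/QueryPerformanceFrequency path assumes a Windows build; other platforms fall back to std::chrono::steady_clock so the snippet still compiles. `Stopwatch` is a hypothetical name, and note that it measures elapsed time, not processor clock cycles.

```cpp
#include <cassert>
#include <chrono>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical high-resolution stopwatch: QueryPerformanceCounter on
// Windows, std::chrono::steady_clock elsewhere.
class Stopwatch {
public:
    Stopwatch() { restart(); }

    // Starts a new measurement interval.
    void restart() {
#ifdef _WIN32
        QueryPerformanceFrequency(&freq_);  // counts per second, fixed at boot
        QueryPerformanceCounter(&start_);
#else
        start_ = std::chrono::steady_clock::now();
#endif
    }

    // Seconds elapsed since construction or the last restart().
    double elapsed_seconds() const {
#ifdef _WIN32
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return static_cast<double>(now.QuadPart - start_.QuadPart) /
               static_cast<double>(freq_.QuadPart);
#else
        const std::chrono::duration<double> d = std::chrono::steady_clock::now() - start_;
        return d.count();
#endif
    }

private:
#ifdef _WIN32
    LARGE_INTEGER freq_;
    LARGE_INTEGER start_;
#else
    std::chrono::steady_clock::time_point start_;
#endif
};
```

Dividing the counter difference by the frequency converts QPC counts to seconds; the same class can wrap a code block that is too long for cycle counts to be meaningful.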


My intent was to measure the processor clock cycles.
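
Measuring processor clock cycles usually means reading the time-stamp counter with the RDTSC instruction. A minimal sketch, assuming an x86 compiler that provides the `__rdtsc()` intrinsic (`<intrin.h>` for MSVC, `<x86intrin.h>` for GCC/Clang); on other targets it falls back to a nanosecond clock, which is not a cycle count, so it still compiles. `read_counter` and `min_cycles` are hypothetical helpers. The sketch does not serialize execution (for example with CPUID or LFENCE), so very short blocks can be skewed by out-of-order execution, and on processors with a constant-rate TSC the count does not track the core clock exactly.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define HAVE_RDTSC 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
#define HAVE_RDTSC 1
#else
#include <chrono>
#define HAVE_RDTSC 0
#endif

// Reads the time-stamp counter on x86; elsewhere falls back to a
// nanosecond clock (NOT cycles) so the sketch still builds.
inline std::uint64_t read_counter() {
#if HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}

// Smallest counter delta over `runs` executions of `block`, minus the
// smallest delta between two back-to-back reads (measurement overhead).
std::uint64_t min_cycles(const std::function<void()>& block, int runs) {
    std::uint64_t overhead = UINT64_MAX;
    for (int i = 0; i < runs; ++i) {
        const std::uint64_t t0 = read_counter();
        const std::uint64_t t1 = read_counter();
        if (t1 - t0 < overhead) overhead = t1 - t0;
    }
    std::uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; ++i) {
        const std::uint64_t t0 = read_counter();
        block();
        const std::uint64_t t1 = read_counter();
        if (t1 - t0 < best) best = t1 - t0;
    }
    return best > overhead ? best - overhead : 0;  // 0 if runs <= 0
}
```

Combined with pinning the process to one core, this keeps both reads of each pair on the same counter.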

Hal Angseesing wrote:
3) CPU frequency is not fixed! (P4M for example but this is also being used more on desktop systems).


IA-32 Intel(R) Architecture Software Developer’s Manual, Volume 3: System Programming Guide:

"...Members of the processor families increment the time-stamp counter differently:

- For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]); and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle. The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel(R) SpeedStep(R) technology transitions may also impact the processor clock.

- For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and higher]): the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the frequency at which the processor is booted. The specific processor configuration determines the behavior. Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward..."


if (xx + xy) % 2 return xx else return xy
General: Good article but..
Robert Buldoc, 12-Nov-05 14:08
General: Re: Good article but..
oto spal, 13-Nov-05 15:57

