Licence CPOL
First Posted 29 May 2008

Processor BenchMark Utility

By | 29 May 2008 | Article
Measuring Processor Performance

Introduction

This article discusses a processor benchmark utility. The results are surprising. Just as everything else (these days) breaks our preconceptions, so do processors.

Just the other day, I received an email about the 21st century's broken preconceptions. It went something like this: "... the tallest basketball player is Chinese ... the most popular rapper is white, and the best golf player is ...". So it is little or no surprise to see a processor benchmark that shatters preconceptions.

Background

I was always curious about how the new generation of processors measures up. How fast are they? Not just a single number that represents the processor as a whole, but a real value that honestly measures how many operations of a given kind the processor can do in a given time. For instance, how many integer additions per second? How many double divides per second?

[Screenshot: debugbench_s.jpg]

Using the Code

The code uses macros to keep things simple. This way, the macro parameter can be any desired benchmarking operation. The macro TIMEDOP (timed operation) is called with two parameters:

  1. The name of the function to generate
  2. The code to execute in the timing loop
//
// The timing loop generation macros. You may roll your own by changing the second
// argument to the macro to a piece of code. 
// If you use variables in the code snippet, make sure you declare them first.
//
// Here is an example of tinkering between add and increment:
//
//    TIMEDOP(do_int,        count2++   );
//    TIMEDOP(do_int,        count2 +=3 );
//
// Note the missing (or optional) semicolon (;) at the end of the snippets.

TIMEDOP(do_int,        count2++);
TIMEDOP(do_mul,        count *= 33);    // I chose the constants randomly
TIMEDOP(do_div,        count /= 13);    // they have no influence on the timing
TIMEDOP(do_sub,        count -= 10);
TIMEDOP(do_mod,        count %= 13);
TIMEDOP(do_str,        memcpy(str, str2, 1024));
TIMEDOP(do_str2,       memcpy(str, str2, 1));
TIMEDOP(do_dbladd,     dop2 += dop1; dop1+=3);
TIMEDOP(do_dbladd3,    dop2 = dop1 + dop3; dop1+= 2);
TIMEDOP(do_dbldiv,     dop2 /= 103);
TIMEDOP(do_dblsin,     dop2 = sin(dop1); );
TIMEDOP(do_func,       noop(count2));
TIMEDOP(do_cos,        cos(dop1););
TIMEDOP(do_tan,        tan(dop1););
TIMEDOP(do_sqrt,       sqrt(dop1););

// After declaring, one may call the function as follows:

int count = do_int();  // count receives the count of instructions / 100

// To make sure the time elapsed is accurate, the performance counter is 
// queried twice, to measure how long it takes to query the performance counter.

QueryPerformanceCounter(&PerformanceCount);\
double dd = largeuint2double(PerformanceCount);\

QueryPerformanceCounter(&PerformanceCount3);\
double skew = largeuint2double(PerformanceCount3) - dd;\
         
// Then, in the main loop, every time the performance counter is
// queried, the code extends the expected time by the time it 
// took the performance counter to respond

... in the while loop ...

timeforonesec += skew;  /* Compensate for skew */
if(currentcount - startcount > timeforonesec)  
    break;
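
The body of TIMEDOP itself is not reproduced above, so the following is only a minimal sketch of how such a loop-generating macro could look, not the utility's actual code. It assumes std::chrono::steady_clock in place of QueryPerformanceCounter so that it is portable, and it assumes the snippet runs 100 times per loop pass (suggested by the "count of instructions / 100" comment). The names TIMEDOP_SKETCH, time_increment and g_sink are hypothetical, and the hard time cap is an addition of the sketch to guarantee that the loop ends even if the skew estimate is too large.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for TIMEDOP (the article does not show its body).
// NAME is the function to generate, SNIPPET the code to time.
// The generated function returns the number of loop passes completed in
// roughly interval_ms milliseconds; each pass runs SNIPPET 100 times.
#define TIMEDOP_SKETCH(NAME, SNIPPET)                                         \
    std::int64_t NAME(int interval_ms)                                        \
    {                                                                         \
        using clk = std::chrono::steady_clock;                                \
        assert(interval_ms > 0);                                              \
        /* Two back-to-back reads estimate the cost of one clock query. */    \
        const clk::time_point q0 = clk::now();                                \
        const clk::time_point q1 = clk::now();                                \
        const clk::duration skew = q1 - q0;                                   \
        clk::duration budget = std::chrono::duration_cast<clk::duration>(     \
            std::chrono::milliseconds(interval_ms));                          \
        /* Safety net (an addition of this sketch): never run longer */       \
        /* than four times the requested interval.                   */       \
        const clk::duration hard_cap = 4 * budget;                            \
        std::int64_t passes = 0;                                              \
        const clk::time_point start = clk::now();                             \
        for (;;) {                                                            \
            for (int r = 0; r < 100; ++r) { SNIPPET; }                        \
            ++passes;                                                         \
            budget += skew; /* compensate for the clock query below */        \
            const clk::duration elapsed = clk::now() - start;                 \
            if (elapsed > budget || elapsed > hard_cap)                       \
                break;                                                        \
        }                                                                     \
        return passes;                                                        \
    }

// Example instantiation: time an integer increment on a volatile counter,
// so the work is not removed if optimization is enabled.
static volatile long g_sink = 0;
TIMEDOP_SKETCH(time_increment, g_sink = g_sink + 1)
```

Calling time_increment(1000) would then return the number of passes completed in about one second. Without the cap, a skew estimate larger than the real cost of a pass would let the budget grow as fast as the elapsed time, and the loop would never exit.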

Accuracy of the Measurement

Naturally, a lot of processes compete for the processor. Thus, the measured benchmark fluctuates quite a bit. To compensate for that, the utility keeps a running average.
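
The article does not say which kind of running average the utility keeps. As an illustration only, here is a small sketch of a cumulative mean that is updated one sample at a time; the class name RunningAverage and its interface are assumptions, not taken from the utility.

```cpp
#include <cassert>
#include <cstddef>

// Cumulative running average (an assumed variant; the utility may use a
// different scheme). Each new sample moves the stored mean by
// (sample - mean) / count, so no sample history has to be kept.
class RunningAverage {
public:
    void add(double sample)
    {
        ++count_;
        mean_ += (sample - mean_) / static_cast<double>(count_);
    }

    // The mean of zero samples is undefined, so require at least one.
    double mean() const
    {
        assert(count_ > 0);
        return mean_;
    }

    std::size_t count() const { return count_; }

private:
    double mean_ = 0.0;
    std::size_t count_ = 0;
};
```

Feeding each one-second result into add() and displaying mean() smooths out the spikes caused by other processes, at the price of reacting more slowly the longer the utility runs.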

To test the accuracy of the code, and to verify that the compensation trick works as intended, I was in a bit of a dilemma. The old Heisenberg principle (strictly speaking, the observer effect) kicked in: I could not observe the processor speed, because the observer process interfered with the findings. In other words, anything I tried to do with the processor interfered with the measurement.

Finally, to test the compensation code, I executed the benchmarking code snippet (from a macro) one hundred times in one test, and fifty times in the other test. If the compensating code works correctly, the result of the second test should be twice as large as that of the first. It worked! Hurray!
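
Why a ratio of two indicates working compensation can be seen from a simple cost model. The sketch below is a hypothetical model, not code from the utility: it assumes each loop pass costs the snippet time multiplied by the repetition count, plus a fixed per-pass overhead (such as the timer query) that is left uncompensated. The function names and the one-second budget are illustrative.

```cpp
#include <cassert>

// Passes completed within budget_ns when one pass runs the snippet `reps`
// times at op_ns each, plus overhead_ns of uncompensated per-pass cost.
double passes_in_budget(double budget_ns, int reps, double op_ns,
                        double overhead_ns)
{
    assert(reps > 0 && op_ns > 0.0 && overhead_ns >= 0.0);
    return budget_ns / (reps * op_ns + overhead_ns);
}

// Ratio of the 50-repetition result to the 100-repetition result.
double half_reps_ratio(double op_ns, double overhead_ns)
{
    const double budget_ns = 1e9; // one second
    return passes_in_budget(budget_ns, 50, op_ns, overhead_ns)
         / passes_in_budget(budget_ns, 100, op_ns, overhead_ns);
}
```

With a 1 ns snippet and no leftover overhead, half_reps_ratio(1.0, 0.0) is 2. With 50 ns of uncompensated overhead per pass, half_reps_ratio(1.0, 50.0) drops to 1.5. A measured ratio close to two therefore suggests that the per-pass overhead has been cancelled out.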

I thought about the fact that measuring the processor speed involved a theory developed in connection with quantum mechanics. Very COOL. Isn't this what programming is all about?

Benchmark Items Described

The following is a short synopsis of the items on the benchmark screen:

| Function Name | Code | Description | Notes |
|---|---|---|---|
| do_int | count2++ | Simple integer addition and/or increment | Increment and add run at the same speed |
| do_mul | count *= 33 | Simple integer multiplication | Almost as fast as the add |
| do_div | count /= 13 | Simple integer division | Slower than double add |
| do_mod | count %= 13 | Simple integer modulus | Same speed as div |
| do_str | memcpy(str, str2, 1024) | Copy a 1k string | Involves memory. Nice test to see if the DDR 400 upgrade worked. |
| do_str2 | memcpy(str, str2, 1) | Copy a 1 byte string | Library call overhead |
| do_dbladd | dop2 += dop1; dop1+=3 | Add two doubles | The optimizer knew about the second variable not changing, so we padded it. |
| do_dbladd3 | dop2 = dop1 + dop3; dop1+= 2 | Add two doubles, assign | Padded as well |
| do_dbldiv | dop2 /= 103 | Double divide | Faster than integer op |
| do_dblsin | dop2 = sin(dop1) | Trigonometric | Slower than expected |
| do_func | noop(count2) | Call a blank function | Debug build a lot slower |
| do_cos | cos(dop1) | Trigonometric | All trig functions execute with similar speed |
| do_tan | tan(dop1) | Trigonometric | |
| do_sqrt | sqrt(dop1) | Square root | The processor works hard here |
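
One likely reason do_div and do_mod time alike: on x86, the integer divide instruction produces the quotient and the remainder together, so either operation costs one divide when the compiler actually emits that instruction. The standard library exposes the pair through std::div. The sketch below only illustrates that idea; the QuotRem struct and the divide_both helper are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>

// Quotient and remainder of one integer division, returned together.
struct QuotRem {
    int quotient;
    int remainder;
};

// std::div computes both values in one call; on x86 a compiler can map
// this to a single divide instruction.
QuotRem divide_both(int value, int divisor)
{
    assert(divisor != 0);
    const std::div_t r = std::div(value, divisor);
    return QuotRem{r.quot, r.rem};
}
```

For example, divide_both(47, 13) yields quotient 3 and remainder 8.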

The Debug Build

The debug build performs similarly to the release build. The only significant difference is in the function calls. This is (possibly) due to the overhead of the debugger keeping tabs on the function call stack.

The Release Build

We had to disable optimization in the release build, as it breaks the benchmark (naturally). The optimizer sees that we are making repeated calculations and repeated calls whose results are never used, so it optimizes them out.
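
Disabling optimization is what the utility does. A commonly used alternative, shown here only as a sketch and not as part of the utility, is to route the result through a volatile variable: every read and write of a volatile object counts as an observable side effect, so the compiler has to perform it even in an optimized build. The names g_counter and run_increments are hypothetical.

```cpp
#include <cassert>

// Accesses to a volatile object may not be removed or merged by the
// optimizer, so the loop below survives an optimized build.
static volatile long g_counter = 0;

// Performs n increments through the volatile counter and returns the result.
long run_increments(long n)
{
    assert(n >= 0);
    g_counter = 0;
    for (long i = 0; i < n; ++i) {
        g_counter = g_counter + 1; // one load and one store per pass
    }
    return g_counter;
}
```

The trade-off is that each operation now includes a memory load and store, so the figure measured this way is not identical to a pure register increment.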

Points of Interest

I have examined several processors with this utility (Core2, Sempron, AMD_64). To my surprise, I have found that most of the adages learned in the past are broken. For example, when learning C or C++, one finds the recommendation that 'variable++' is faster than 'variable += 1'. According to the benchmarks, 'var1 += var2' runs at the same speed as 'var1++'.

Another piece of common knowledge is that working with doubles is slower than working with integers. Not according to the results of the benchmark. On my Athlon 64, the int divide is 45000/sec and the double divide is 153000/sec.

The integer multiply is just as fast as adding. Another unexpected result.

If all this sounds unbelievable, by all means, test it for yourself. Try the code, and if the code is in error, I would like to hear about it.

Feedback

Send me a screenshot of your processor's results (peterglen@verizon.net).

History

  • 29th May, 2008: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

PeterGlen

Software Developer (Senior)
Self Employed
United States


C, C++, DSP, Graphical Apps, UNIX, LINUX


Comments and Discussions

 
Rant: It's a start, but could use some work. (David I Hunt, 8:04 2 Jun '08)
Semprons were faster than P4's in benchmarks using floating point data sets smaller than its tiny L2 cache. Otherwise they sucked.
 
Your program is not a good assessment of processor performance. Many of the floating point operations are done via a function call, adding a huge amount of overhead to one tiny instruction. Aside from not actually measuring how fast a CPU computes cosine or a square root, this will skew the results towards processors that are better at doing the overhead.
 
Another, bigger, point is that it's written in C++. C++'s performance is dependent on the compiler. By default, the 'double add' benchmark compiled to this on my copy of VS:
 
0041B5DC  fld         qword ptr [dop2 (46C148h)] 
0041B5E2  fadd        qword ptr [dop1 (46C140h)] 
0041B5E8  fstp        qword ptr [dop2 (46C148h)] 
0041B5EE  fld         qword ptr [dop1 (46C140h)] 
0041B5F4  fadd        qword ptr [__real@4008000000000000 (462978h)] 
0041B5FA  fstp        qword ptr [dop1 (46C140h)] 
 
The above code uses the x87 FPU, essentially a relic of times past. With a little tweaking of compiler settings, it generated this code which uses SSE instructions:
 
0041B5F7  movsd       xmm0,mmword ptr [dop2 (474148h)] 
0041B5FF  addsd       xmm0,mmword ptr [dop1 (474140h)] 
0041B607  movsd       mmword ptr [dop2 (474148h)],xmm0 
0041B60F  movsd       xmm0,mmword ptr [dop1 (474140h)] 
0041B617  addsd       xmm0,mmword ptr [__real@4008000000000000 (46A978h)] 
0041B61F  movsd       mmword ptr [dop1 (474140h)],xmm0 
 
SSE instructions execute faster than x87 instructions, at least on Intel processors from Pentium 4 onward. If the compiler was good at optimizing, it might generate code that packed both doubles into a single addpd instruction.
 
Oh, by the way, integer divide and integer modulus perform the same because they are the same. A good compiler or average assembly programmer can pull both results out of a single instruction.
 
I recommend reading these documents, specifically the Optimization Guides:
http://developer.intel.com/products/processor/manuals/index.htm
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_15343,00.html
 
I have nothing against VB or .NET; all programming languages are respectable. It just seems that some languages attract one echelon of programmers, and other languages attract another echelon of programmers. :P

Re: It's a start, but could use some work. (PeterGlen, 20:26 3 Jun '08)

Last Updated 29 May 2008
Article Copyright 2008 by PeterGlen