
Introduction

Everyone has used good old time(NULL) to get timings accurate to the second. Some of you may have used GetSystemTime() to get sub-second timings. The really clever have found QueryPerformanceFrequency and QueryPerformanceCounter, which give timings accurate to a millisecond or better (for the record, I'm not among the really clever - I found out about those two by reading the Python documentation... Thanks, Guido!).

But if you really want accurate timings, this code will give you timings accurate to the machine cycle, which on a 1 GHz machine is one nanosecond. On a 2 GHz machine, it's 1/2 nanosecond. Old 100 MHz machines will only get 10-nanosecond timings. You get the idea.

It requires a tiny bit of assembler, but it's worth it:

__int64 GetMachineCycleCount()
{
   __int64 cycles;
   _asm rdtsc; // won't work on 486 or below - only Pentium or above

   _asm lea ebx,cycles;   // ebx = address of cycles
   _asm mov [ebx],eax;    // low 32 bits of the counter
   _asm mov [ebx+4],edx;  // high 32 bits of the counter
   return cycles;
}

This code will work on Win9X or NT/2K and probably XP. Actually, it would even work in Linux if you can figure out how to get GCC to emit inline assembler!
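For GCC or Clang, one way to write the same read is with extended inline assembly. The following is a minimal sketch, not a tested port of the function above: the name ReadCycleCounter is made up for illustration, and the steady_clock fallback for non-x86 builds is an assumption added only so the snippet compiles everywhere (its ticks are not machine cycles).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: returns the 64-bit time-stamp counter on x86 when
// built with GCC or Clang. On any other target it falls back to
// steady_clock ticks, which are NOT machine cycles (assumption for portability).
inline std::uint64_t ReadCycleCounter()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    std::uint32_t lo, hi;
    // rdtsc leaves the low 32 bits in eax and the high 32 bits in edx;
    // the "=a" and "=d" constraints tell the compiler exactly that.
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

Naming the output registers in the constraints lets the compiler pick up the two halves directly, so no scratch register or store through a pointer is needed.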

Of course, the time comes out in cycles, not seconds, and oddly there seems to be no API to get the machine's speed. You can either "calibrate" the results (by getting the count, sleeping for (say) one second, getting the count again, and dividing the difference in counts by the elapsed time), or just use cycles directly and not worry about seconds (it's great for comparing one algorithm to another, or finding the slow parts of your program).

One warning, however: certain machines, notably laptops, can slow down their processor speed when nothing important seems to be happening. Since the calibration sleep is exactly one of those times, you can get some seriously wrong results from the calibration.
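The calibration described above could be sketched roughly as follows. This is an illustration under assumptions, not the article's code: ReadCycleCounter and EstimateTicksPerSecond are hypothetical names, the counter read is written for GCC/Clang on x86, and other targets fall back to steady_clock ticks.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical counter read (GCC/Clang on x86); other targets fall back to
// steady_clock ticks, which are not machine cycles.
inline std::uint64_t ReadCycleCounter()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: estimates counter ticks per second by reading the
// counter before and after a sleep of the requested length.
inline double EstimateTicksPerSecond(std::chrono::milliseconds interval)
{
    using clock = std::chrono::steady_clock;
    const clock::time_point t0 = clock::now();
    const std::uint64_t c0 = ReadCycleCounter();
    std::this_thread::sleep_for(interval);
    const std::uint64_t c1 = ReadCycleCounter();
    const clock::time_point t1 = clock::now();
    // Divide by the time that actually elapsed, not the requested interval,
    // because a sleep usually overshoots a little.
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / seconds;
}
```

A measured cycle count divided by the returned value gives seconds. If the processor changes speed during the sleep, as in the laptop case above, the estimate will be off; replacing the sleep with a busy loop of the same length is one possible workaround, at the cost of burning CPU for the whole interval.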

General: Too bad you didn't research more on this
amir.tet
6:45 30 Aug '09  
QueryPerformanceCounter actually uses *rdtsc* internally,
and as you mentioned in your article, QueryPerformanceFrequency is used to find the processor frequency (so using both of them you can convert the outcome into normal time units).

One thing you probably don't know, and it is quite important, is that *rdtsc* reads the current cycle count from a register on the processor that executes it.
This means that in a multi-processor environment (which today is almost every computer) you will get **different** results depending on which processor executed rdtsc,
so without fixing the thread's affinity somehow, you are almost sure to get wrong results.
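One way to address the affinity concern raised above is to pin the measuring thread to a single logical CPU before taking any readings. The sketch below is only an illustration: PinCurrentThreadToCpu is a hypothetical helper name, it uses SetThreadAffinityMask on Windows and sched_setaffinity on Linux, and it simply reports failure on any other platform.

```cpp
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: restricts the calling thread to one logical CPU so
// that successive counter reads come from the same processor.
// Returns true on success, false if the request could not be honored.
inline bool PinCurrentThreadToCpu(unsigned cpuIndex)
{
#if defined(_WIN32)
    if (cpuIndex >= sizeof(DWORD_PTR) * 8)
        return false;                       // does not fit in the mask
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << cpuIndex;
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#elif defined(__linux__)
    if (cpuIndex >= static_cast<unsigned>(CPU_SETSIZE))
        return false;                       // does not fit in cpu_set_t
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpuIndex, &set);
    // A pid of 0 applies the mask to the calling thread.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
    (void)cpuIndex;
    return false;                           // no affinity API assumed here
#endif
}
```

A caller would typically pin the thread once, for example with PinCurrentThreadToCpu(0), check the return value, and only then take the start and end readings.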
Question: Inconsistency with Vista.
Jose Praveen
18:02 22 Oct '08  
Dear all,

Has any of you tested this code under Vista? It works perfectly under all other Windows versions (incl. XP) with acceptable consistency (less than 0.2% error) at resolutions up to 0.0001 seconds.

But when attempted on Vista, it does not provide that sort of accuracy, even after disabling all the eye-candy graphics options of Vista.

Any suggestions on this, or help with code for a consistent high-resolution timer on Vista, would be greatly appreciated.
Answer: Re: Inconsistency with Vista.
joshua0137
19:18 6 Jan '09  
QueryPerformanceCounter
General: Don't have C++ installed
deletethisprofile
20:17 7 Aug '08  
Can someone post a compiled version of this please?
I need to use it from other .NET languages.
OR
post the equivalent code in C# or VB.NET

Thanks!

-toddmo

General: C#
screig
6:28 13 Jul '06  
Can you access the machine cycle count through C#?
General: Please verify this ......
Anonymous
2:41 25 Mar '05  
Hi,

well ... I tried to port it to gcc.
Since I don't fully understand inline assembler, I would appreciate it if someone could verify it.

It seems to work:



unsigned int GetMachineCycleCount()
{
unsigned int cycles;
__asm__( "rdtsc\n\t"
"lea %0,%%ebx\n\t"
"mov %%eax,(%%ebx)\n\t"
"mov %%edx,0x4(%%ebx)"
:"=m"(cycles)
);
return cycles;
}


regards marbac
General: Re: Please verify this ......
Anonymous
4:36 25 Mar '05  
Mistake in the previous version!

Modified Version which takes about 30 cycles:



unsigned long long GetMachineCycleCount()
{
    unsigned int lo, hi;

    /* rdtsc puts the low half in eax and the high half in edx;
       the "=a"/"=d" constraints bind the outputs to those registers */
    __asm__ __volatile__ (
        "rdtsc"
        : "=a"(lo), "=d"(hi)
    );
    return ((unsigned long long)hi << 32) | lo;
}

regards marbac
General: timer out of this?
nikoladsp
0:57 31 May '04  
Hi,
I like the article, and I am wondering whether it is possible to make a timer out of this program. The timer should work in console-based apps WITHOUT a message loop.
Thanks, and keep up the good work.
nikola
General: What's the point
Anonymous
2:10 12 Dec '03  
What's the point? Windows is not a real-time system. Since all threads are pre-empted and the time slice is 10 ms under Windows NT, 2000 and XP, you can only measure time differences with an accuracy of about 10 ms. Under Windows 95 and 98 it's even worse...
General: Re: What's the point
nastyimp13
11:16 16 Nov '05  
OK, I second this question. Can someone reply with a reasonable answer? I'm a newbie (relatively), but do need to know the answer to this. Thanks!

~NastyImp
General: Re: What's the point
Robert Bielik
3:22 9 Jun '06  
Usually when using high resolution counter you want to time small pieces of code. In this respect, 10 ms can be a huge amount of time. There you have it. The point that is.
General: QueryPerformanceFrequency
parisitic
16:58 10 Dec '03  
On all the Pentium I through 4 systems I've used QueryPerformanceFrequency on, it returns the CPU frequency. I don't know if it always does...
Answer: Re: QueryPerformanceFrequency
admiralh2
8:57 16 Mar '07  
No, unfortunately.

The Microsoft docs say this is a "hardware" timer, but not all manufacturers use the processor clock. Two older machines I worked on (about 4 years ago) reported about 3 million and 7 million as the counter frequency (barely sub-microsecond timing). And I'm reasonably sure those were Pentium-level machines.

General: clobbering ebx
LBMT
15:40 10 Dec '03  
Shouldn't ebx be pushed before using it (and popped afterwards)?
General: Re: clobbering ebx
dCp303
6:10 12 Dec '03  
I think the trick is that VC6 (I guess he uses this) inserts that code when inline assembler is used. The author's mistake is that other compilers may not.
The two really unnecessary things in the code (if I see this correctly) are the lea instruction (it doesn't make sense to me, because the first mov overwrites it anyway) and the use of the stack, which is redundant, as someone mentioned in another thread.

Regards. Sascha.

PS: Anyway, the article forced me to write down my own thoughts about RDTSC in general in a .cpp of mine. ;)
General: Re: clobbering ebx
LBMT
6:42 12 Dec '03  
Yes, I realized later that not clobbering ebx is a compiler specific thing.

As for lea and mov, I don't see a problem with it. Both mov's are akin to dereferencing a pointer, so there's no overwriting going on there.

Considering that eax and edx are the registers typically used to return __int64 types, the cycles variable isn't really needed at all (As Zoltan points out).
General: How about other Intel CPUs
Mad_C
21:26 9 Dec '03  
And how can I do this on StrongARM or XScale, for example?

General: RDTSC for Watcom C/C++
c2j2
21:25 9 Dec '03  
extern __int64 RDTSC(void);
#pragma aux RDTSC = 0x0f 0x31 value [edx eax];

This is the definition for the Watcom C/C++ compiler, in case anybody needs it. Then call it like

__int64 nStart = RDTSC();
<do something>
__int64 nDiff = RDTSC() - nStart;

Christian

General: Will HT processor work?
Vincent Leong77
20:04 9 Dec '03  
Good suggestions. I do image processing development. I believe this can help me get better timings.

So will this work on HyperThreading processors? Are there any constraints on using it?
I'm asking because I have a very short routine. Will this be able to time it?
My hypothesis is that one ALU might still be executing the begin-timing code while another has already executed the end-timing code.

Crystal Silver Codes
vleong@first.net.my
General: Re: Will HT processor work?
c2j2
21:31 9 Dec '03  
CPUTicker (mentioned somewhere else here) uses CPUID to serialize the processor. I guess that forces the instructions to complete in order (countering out-of-order execution).

Now look up Intel documentation if that helps for HT too. No clue.

Christian

General: Re: Will HT processor work?
tmangan
2:58 10 Dec '03  
You'll have to check the Intel doc on that (posted elsewhere against this article), but without looking I would guess that it would not work well. I am guessing that the counter is part of the logical processor core, which means that you are counting cycles at that level. If that is the case, the HT design is such that a logical processor steals cycles from its mate if the mate doesn't need them. If it is part of the global core, then you are OK on that, but now you are counting the sum of cycles used by each logical processor, which is another issue.

Certainly, for both an HT situation, or any Multi-CPU situation, you will have to use processor affinity set to a single CPU. This ensures that the first and second measurements are run on the same (logical) processor, so that you can do the math.

Also note that this method will only work with measuring small portions of code (at best). And you need to run such a test many times -- picking the lowest result as the time used by your code. This is because the OS is multitasking and may well bump your process/thread to do some work on another thread (or interrupt). If the code being measured is small, does nothing to cause a wait (including disk, socket, or screen I/O), the system isn't very busy, and you run it enough times you will find that magic moment when the system dedicated the CPU to your thread the entire time. Applying processor affinity to other tasks to not use this CPU will also help your cause (although you cannot do that to certain system tasks).

So, while maybe not the world's best timer, it is a different tool for the bag.

tim
Founder, TMurgent Technologies
www.tmurgent.com
tmangan@tmurgent.com
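The repeat-and-keep-the-lowest approach described in the post above might look something like this. It is a sketch under assumptions: MinCyclesOver and ReadCycleCounter are hypothetical names, the counter read is written for GCC/Clang on x86, and other targets fall back to steady_clock ticks.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical counter read (GCC/Clang on x86); other targets fall back to
// steady_clock ticks, which are not machine cycles.
inline std::uint64_t ReadCycleCounter()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: runs `work` the given number of times and returns
// the smallest tick count observed, on the theory that the fastest run is
// the one that was least disturbed by other threads and interrupts.
template <typename Fn>
std::uint64_t MinCyclesOver(Fn&& work, int runs)
{
    assert(runs > 0);
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < runs; ++i) {
        const std::uint64_t start = ReadCycleCounter();
        work();
        const std::uint64_t elapsed = ReadCycleCounter() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Called as MinCyclesOver([&] { RoutineUnderTest(); }, 1000), where RoutineUnderTest stands in for the code being measured, it returns the best of 1000 runs. The result still includes the cost of the two counter reads themselves.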
General: Somewhat disappointed
ReorX
10:51 4 Dec '03  
Actually, I'm a bit disappointed by the article. I understand CodeProject to be a place where I can learn something, for example, how rdtsc works.
The only thing I learn here is that I should look somewhere else to figure that out.
I think it would only take a few lines to explain the code.

BTW: For those who really want to count nanoseconds: PII, PIII and PIV processors can (under certain conditions) reorder instructions, so code like
| rdtsc
| something_else1
| something_else2
| rdtsc
can be executed as
| rdtsc
| something_else1
| rdtsc
| something_else2
So using rdtsc definitely requires further reading, and a warning about this would have been nice.

A word on other processor vendors would have been nice, too: the AMD K6 or Cyrix M1 do not support it.

Inline Assembler is coded in gcc with the
__asm__ __volatile__
semantics, so it could look like this:
#define read_rdtsc(ulli64) __asm__ __volatile__ ( "rdtsc" : "=A" (ulli64))


The Saviour of the World is a Penguin and Linus Torvalds is his Prophet.
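A common response to the reordering problem described in the post above is to execute a serializing instruction such as CPUID immediately before RDTSC. The sketch below assumes GCC or Clang on x86; ReadCycleCounterSerialized is a hypothetical name, and other targets fall back to steady_clock ticks.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: executes CPUID (a serializing instruction) and then
// RDTSC, so that earlier instructions have completed before the counter is
// read. Other targets fall back to steady_clock ticks (not machine cycles).
inline std::uint64_t ReadCycleCounterSerialized()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    std::uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
    // CPUID leaf 0: the outputs are ignored, only the serializing effect
    // matters. Listing all four registers as outputs tells the compiler
    // that CPUID overwrites them.
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;

    std::uint32_t lo, hi;
    // volatile asm statements are not reordered relative to each other, so
    // this RDTSC stays after the CPUID above.
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

CPUID is comparatively slow, so the added cost shows up in every measurement; timing an empty region first and subtracting that baseline is one way to account for it. The sketch only keeps earlier work from drifting past the read and does nothing about later instructions starting early.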
General: Re: Somewhat disappointed
noshbar
0:45 9 Dec '03  
Ok, so if it only takes a few lines to explain, why not be the saving grace for allll CodeProject members and explain it? (I would, but I'm one of the stupid few who don't neeeed to understand why the transistor flops over before I'm prepared to use an instruction)

Secondly, how would you recommend doing nanosecond timing if this method is still so unpredictable and unstable?

AND, dude, seriously, calm down. If people didn't know about rdtsc before, they do now, so they could go look up on it.

AND I don't claim to understand how rdtsc works, but in _my_ tests, on Linux (which should excite you no end), on an AMD K6, it worked just fine. So I dunno how to explain that seeing as it's impossible, ahem, but -hey-, it works nicely.



Crayons don't kill people, death does.
General: Code to query CPU frequency
Zoltan Csizmadia
9:54 4 Dec '03  
Here is source code to get CPU frequency in MHz for Win32:

double GetCPUFrequencyMHz()
{
    HKEY hKey = NULL;
    DWORD status;
    DWORD dwSize;
    DWORD dwFrequency;
    double dMHz = 0.0;

    // Open registry for CPU information
    status = RegOpenKeyEx(
        HKEY_LOCAL_MACHINE,
        _T("Hardware\\Description\\System\\CentralProcessor\\0"),
        0,
        KEY_QUERY_VALUE,
        &hKey );
    if ( status != ERROR_SUCCESS )
        goto cleanup;

    // Query ~MHz information from registry
    dwSize = sizeof(dwFrequency);
    status = RegQueryValueEx (
        hKey,
        _T("~MHz"),
        NULL,
        NULL,
        (LPBYTE)&dwFrequency,
        &dwSize );
    if ( status != ERROR_SUCCESS )
        goto cleanup;

    dMHz = dwFrequency;

cleanup:

    // Couldn't get CPU freq. from registry,
    // let's try to calibrate it
    if ( status != ERROR_SUCCESS )
    {
        __int64 start;
        start = GetMachineCycleCount();
        Sleep(1000);
        dMHz = (GetMachineCycleCount() - start)/1000000.0;
    }

    if ( hKey != NULL )
        RegCloseKey (hKey);

    return dMHz;
}


Zolee

General: Re: Nicer format
WREY
10:56 4 Dec '03  
Your "Nicer format" could still be nicer. Get rid of the "goto" statements and re-structure the "if" statements.


William

Fortes in fide et opere!


Last Updated 3 Dec 2003 | Copyright © CodeProject, 1999-2010