QueryPerformanceCounter may actually use *rdtsc* internally (which source it uses depends on the hardware and the Windows version), and as you mentioned in your article, QueryPerformanceFrequency returns the frequency of that counter (so using both of them you can convert the outcome into normal time units).
One thing you probably don't know, and it is quite important, is that *rdtsc* reads the current cycle count from a register inside the processor that executes it. This means that in a multi-processor environment (which today is almost every computer) you can get **different** results depending on which processor executed rdtsc, so unless you pin the thread with processor affinity, you are almost sure to get wrong results.
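A minimal sketch of the two points in this post: pinning the measuring thread to one logical CPU, and turning a tick delta into seconds once the counter frequency is known. The names `PinCurrentThreadToCpu` and `TicksToSeconds` are made up for illustration and are not from the article; the Windows branch uses `SetThreadAffinityMask`, and the Linux branch (an assumption that you might also build there) uses `sched_setaffinity`.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: pin the calling thread to one logical CPU so that
// both counter reads are taken on the same processor.
// Returns false if pinning fails or is not supported on this platform.
bool PinCurrentThreadToCpu(unsigned cpu)
{
#if defined(_WIN32)
    if (cpu >= sizeof(DWORD_PTR) * 8) return false;
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << cpu;
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#elif defined(__linux__)
    if (cpu >= static_cast<unsigned>(CPU_SETSIZE)) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
    (void)cpu;
    return false;
#endif
}

// Hypothetical helper: convert a tick delta into seconds, given how many
// ticks the counter advances per second (for example the value reported
// by QueryPerformanceFrequency).
double TicksToSeconds(std::uint64_t start, std::uint64_t end,
                      std::uint64_t ticksPerSecond)
{
    assert(ticksPerSecond != 0);
    return static_cast<double>(end - start) /
           static_cast<double>(ticksPerSecond);
}
```

Pinning before the first read and leaving the thread pinned until after the second read is what keeps the two values comparable; the conversion itself is plain division.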
Dear all,
Has any of you tested this code under Vista? It works perfectly under all other Windows versions (incl. XP) with acceptable consistency (less than 0.2% error) at resolutions up to 0.0001 seconds.
But when attempted on Vista, it does not provide that sort of accuracy, even after disabling all of Vista's eye-candy graphics options.
Any suggestions on this, or help with code for a consistent high-resolution timer on Vista, would be greatly appreciated.
Can someone post a compiled version of this please? I need to use it from other .NET languages. Or post the equivalent code in C# or VB.NET.
Thanks!
-toddmo
C# (screig, 6:28 13 Jul '06)
Hi,
Well ... I tried to port it to gcc. Since my understanding of inline assembler is limited, I would appreciate it if someone could verify it.
It seems to work:
    unsigned int GetMachineCycleCount()
    {
        unsigned int cycles;
        __asm__(
            "rdtsc\n\t"                 /* counter -> edx:eax */
            "lea %0,%%ebx\n\t"          /* ebx = &cycles */
            "mov %%eax,(%%ebx)\n\t"     /* low 32 bits -> cycles */
            "mov %%edx,0x4(%%ebx)"      /* high 32 bits -> 4 bytes past cycles */
            :"=m"(cycles)
        );
        return cycles;
    }
regards marbac
Mistake in the previous version!
Modified Version which takes about 30 cycles:
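    unsigned long long GetMachineCycleCount()
    {
        union {
            unsigned int a[2];       /* a[0] = low 32 bits, a[1] = high 32 bits */
            unsigned long long b;    /* 64 bits wide even on 32-bit targets */
        } tmp;

        /* rdtsc leaves the counter in edx:eax; the "=a" and "=d" constraints
           bind the outputs to exactly those registers, so no extra movs are
           needed and the compiler knows both registers are written */
        __asm__ __volatile__ ( "rdtsc" : "=a"(tmp.a[0]), "=d"(tmp.a[1]) );
        return tmp.b;
    }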
regards marbac
Hi, I like the article, and I am wondering whether it is possible to make a timer out of this program? The timer should work in console-based apps WITHOUT a message loop. Thanks and keep up the good work. nikola
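One possible shape for such a timer, as a rough sketch only: a loop that sleeps until the next deadline and then calls a callback, which needs no message loop at all. It uses `std::chrono::steady_clock` and `std::this_thread::sleep_until` from the standard library rather than the article's cycle counter, and `RunPeriodic` is a made-up name.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: call onTick(i) 'ticks' times, once per 'interval'.
// Deadlines are computed from the previous deadline, not from "now", so a
// slow callback does not make the schedule drift.
template <typename Fn>
void RunPeriodic(std::chrono::milliseconds interval, int ticks, Fn&& onTick)
{
    assert(ticks >= 0);
    auto next = std::chrono::steady_clock::now();
    for (int i = 0; i < ticks; ++i) {
        next += interval;
        std::this_thread::sleep_until(next);  // blocks; no message loop needed
        onTick(i);
    }
}
```

Called from `main`, this blocks the calling thread; running it on a separate `std::thread` would keep the console responsive. The wake-up precision is limited by the OS scheduler, so this suits periodic work, not cycle-level measurement.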
What's the point? Windows is not a REAL time system. Since all threads are pre-empted and the time slice is 10 ms under Windows NT, 2000 and XP, you can only measure time differences with an accuracy of about 10 ms. Under Windows 95 and 98 it's even worse...
OK, I second this question. Can someone reply with a reasonable answer? I'm a newbie (relatively), but I do need to know the answer to this. Thanks!
~NastyImp
Usually when using a high-resolution counter you want to time small pieces of code. In this respect, 10 ms can be a huge amount of time. There you have it. The point, that is.
On all the Pentium I through 4 systems I've used QueryPerformanceFrequency on, it returns the CPU frequency. I don't know if it always does...
No, unfortunately.
The Microsoft docs say this is a "hardware" timer, but not all manufacturers use the processor clock. Two older machines I worked on (about 4 years ago) reported about 3 million and 7 million as the counter frequency (barely sub-microsecond timing). And I'm reasonably sure those were Pentium-level machines.
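To make the "barely sub-microsecond" remark concrete, here is a small sketch that turns a counter frequency into the length of one tick. `TickPeriodNs` is a made-up name; the input is whatever ticks-per-second value the counter reports.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: nanoseconds per tick for a counter that advances
// ticksPerSecond times each second.
double TickPeriodNs(std::uint64_t ticksPerSecond)
{
    assert(ticksPerSecond != 0);
    return 1e9 / static_cast<double>(ticksPerSecond);
}
```

With the figures above, a counter running at 3,000,000 ticks per second resolves about 333 ns per tick and one at 7,000,000 about 143 ns, whereas a counter tied to a 1 GHz processor clock would resolve 1 ns.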
I think the trick is that VC6 (I guess he uses this) inserts that code when the inline assembler is used. The author's mistake is that other compilers may not. The two really unnecessary things in the code (if I see this correctly) are the lea instruction (it doesn't make sense to me, because the first mov overwrites it anyway) and the use of the stack, which is redundant, as someone mentioned in another thread.
Regards. Sascha.
PS: Anyway, the article pushed me to write down my own thoughts about RDTSC in general in a .cpp of mine.
Yes, I realized later that not clobbering ebx is a compiler specific thing.
As for lea and mov, I don't see a problem with it. Both mov's are akin to dereferencing a pointer, so there's no overwriting going on there.
Considering that eax and edx are the registers typically used to return __int64 types, the cycles variable isn't really needed at all (As Zoltan points out).
    extern __int64 RDTSC(void);
    #pragma aux RDTSC = 0x0f 0x31 value [edx eax];
This is the definition for the Watcom C/C++ compiler, in case anybody needs it. Then call it like this:
    __int64 nStart = RDTSC();
    <do something>
    __int64 nDiff = RDTSC() - nStart;
Christian
Good suggestions. I do image processing development. I believe this can help me get better timing.
So will this work on HyperThreading processors? Are there any constraints on using it? I'm asking in case I have a very short routine: will it be able to time it? My hypothesis is that one ALU might still be executing the start-of-timing read while another has already executed the end-of-timing read.
Crystal Silver Codes vleong@first.net.my
CPUTicker (mentioned somewhere else here) uses CPUID to serialize the processor. I guess that forces the earlier instructions to complete before the counter is read (i.e. it defeats out-of-order execution around the measurement).
Check the Intel documentation to see whether that helps for HT too. No clue.
Christian
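A rough sketch of the CPUID-then-RDTSC idea, assuming GCC or Clang on x86-64; it is not CPUTicker's actual code. `ReadSerializedTicks` is a made-up name, and on any other compiler or architecture the sketch falls back to `std::chrono::steady_clock`, which is a different clock and only there so the code still builds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: read the time-stamp counter after a serializing
// CPUID, so earlier instructions have completed before the read.
std::uint64_t ReadSerializedTicks()
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    unsigned int lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"   // select CPUID leaf 0
        "cpuid\n\t"               // serializing instruction
        "rdtsc"                   // counter -> edx:eax
        : "=a"(lo), "=d"(hi)
        :
        : "rbx", "rcx", "memory"); // CPUID also overwrites ebx and ecx
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    // Assumption: no inline rdtsc available here, use a portable clock.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

CPUID itself takes a noticeable number of cycles, so its cost ends up inside the measured interval; timing an empty region first and subtracting that overhead is a common way to account for it.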
You'll have to check the Intel doc on that (posted elsewhere against this article), but without looking I would guess that it would not work well. I am guessing that the counter is part of the logical processor core, which means that you are counting cycles at that level. If that is the case, the HT design is such that a logical processor steals cycles from its mate if the mate doesn't need them. If it is part of the global core, then you are OK on that, but now you are counting the sum of cycles used by each logical processor, which is another issue.
Certainly, for both an HT situation and any multi-CPU situation, you will have to use processor affinity set to a single CPU. This ensures that the first and second measurements are run on the same (logical) processor, so that you can do the math.
Also note that this method will only work with measuring small portions of code (at best). And you need to run such a test many times -- picking the lowest result as the time used by your code. This is because the OS is multitasking and may well bump your process/thread to do some work on another thread (or interrupt). If the code being measured is small, does nothing to cause a wait (including disk, socket, or screen I/O), the system isn't very busy, and you run it enough times you will find that magic moment when the system dedicated the CPU to your thread the entire time. Applying processor affinity to other tasks to not use this CPU will also help your cause (although you cannot do that to certain system tasks).
So, while maybe not the world's best timer, it is a different tool for the bag.
tim Founder, TMurgent Technologies www.tmurgent.com tmangan@tmurgent.com
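The "run it many times and keep the lowest result" advice can be sketched as follows. This is only an illustration: it uses `std::chrono::steady_clock` in place of rdtsc so that it runs anywhere, and `MinElapsed` is a made-up name.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: run fn 'runs' times and return the shortest elapsed
// time seen. The minimum is the run least disturbed by preemption and
// interrupts, so it is the best estimate of the code's own cost.
template <typename Fn>
std::chrono::nanoseconds MinElapsed(Fn&& fn, int runs)
{
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        fn();
        const auto t1 = clock::now();
        const auto dt =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
        if (dt < best) best = dt;
    }
    return best;
}
```

For example, `MinElapsed([] { ProcessImage(); }, 1000)` would time a hypothetical `ProcessImage` routine a thousand times. Combining this with processor affinity, as described above, keeps all the runs on one logical CPU.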
Actually, I'm a bit disappointed by the article. I understand CodeProject to be a place where I can learn something, for example how rdtsc works. The only thing I learn here is that I should look somewhere else to figure that out. I think it would only take a few lines to explain the code.
BTW: For those who really want to count nanoseconds: PII, PIII and PIV processors can (under certain conditions) reorder instructions, so code like
    rdtsc
    something_else1
    something_else2
    rdtsc
can be executed as
    rdtsc
    something_else1
    rdtsc
    something_else2
So using rdtsc definitely requires further reading, and a warning about this would have been nice.
A word on other processor vendors would have been nice, too: AMD K6 or Cyrix M1 do not support it.
Inline assembler is written in gcc with the __asm__ __volatile__ syntax, so it could look like this (note that the "=A" constraint maps to the edx:eax pair only on 32-bit x86):
    #define read_rdtsc(ulli64) __asm__ __volatile__ ( "rdtsc" : "=A" (ulli64))
The Saviour of the World is a Penguin and Linus Torvalds is his Prophet.
Ok, so if it only takes a few lines to explain, why not be the saving grace for allll CodeProject members and explain it? (I would, but I'm one of the stupid few who don't neeeed to understand why the transistor flops over before I'm prepared to use an instruction)
Secondly, how would you recommend doing nanosecond timing if this method is still so unpredictable and unstable?
AND, dude, seriously, calm down. If people didn't know about rdtsc before, they do now, so they could go look up on it.
AND I don't claim to understand how rdtsc works, but in _my_ tests, on Linux (which should excite you no end), on an AMD K6, it worked just fine. So I dunno how to explain that seeing as it's impossible, ahem, but -hey-, it works nicely.
Crayons don't kill people, death does.
Here is source code to get CPU frequency in MHz for Win32:
    double GetCPUFrequencyMHz()
    {
        HKEY hKey = NULL;
        DWORD status;
        DWORD dwSize;
        DWORD dwFrequency;
        double dMHz = 0.0;

        // Open registry for CPU information
        status = RegOpenKeyEx(
            HKEY_LOCAL_MACHINE,
            _T("Hardware\\Description\\System\\CentralProcessor\\0"),
            0,
            KEY_QUERY_VALUE,
            &hKey );
        if ( status != ERROR_SUCCESS )
            goto cleanup;

        // Query ~MHz information from registry
        dwSize = sizeof(dwFrequency);
        status = RegQueryValueEx( hKey, _T("~MHz"), NULL, NULL, (LPBYTE)&dwFrequency, &dwSize );
        if ( status != ERROR_SUCCESS )
            goto cleanup;
        dMHz = dwFrequency;

    cleanup:
        // Couldn't get CPU freq. from registry,
        // let's try to calibrate it
        if ( status != ERROR_SUCCESS )
        {
            __int64 start;
            start = GetMachineCycleCount();
            Sleep(1000);
            dMHz = (GetMachineCycleCount() - start)/1000000.0;
        }

        if ( hKey != NULL )
            RegCloseKey (hKey);

        return dMHz;
    }
Zolee
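The calibration fallback above assumes that Sleep(1000) lasts exactly one second. A slightly more careful variant, sketched here under my own assumptions, divides the tick delta by the time that actually passed. `EstimateTicksPerSecond` is a made-up name, and `readTicks` stands for any callable that returns an increasing tick count, for example a wrapper around GetMachineCycleCount.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: estimate how many ticks per second a counter
// advances by comparing it against std::chrono::steady_clock.
template <typename ReadTicks>
double EstimateTicksPerSecond(ReadTicks&& readTicks,
                              std::chrono::milliseconds window)
{
    assert(window.count() > 0);
    using clock = std::chrono::steady_clock;

    const auto t0 = clock::now();
    const std::uint64_t c0 = readTicks();
    std::this_thread::sleep_for(window);  // may oversleep; that is fine
    const std::uint64_t c1 = readTicks();
    const auto t1 = clock::now();

    // Divide by the measured duration, not by the requested window.
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / seconds;
}
```

Dividing the result by 1,000,000 gives MHz, matching the return value of GetCPUFrequencyMHz. A shorter window makes the call return sooner at the cost of a noisier estimate.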
Your "Nicer format" could still be nicer. Get rid of the "goto" statements and re-structure the "if" statements.
William
Fortes in fide et opere!