I have a loop in my Intel Vector assembly code. In the loop, the loop counter is used to read from and write to 4 consecutive memory locations. For example,

vmovdqu [r9 + rdx + 64], y0
vmovdqu [r9 + rdx + 96], y1


where rdx is my loop counter. During profiling, I noticed that using "r10d" instead of "rdx" increases the cycle count. The instruction that initialises "r10d" is 1 byte longer than the one that initialises "rdx". What could be the reason for the cycle increase?
Comments
Krunal Rohit 3-Mar-14 10:35am    
rdx is your loop counter ?

-KR
Member 9964804 3-Mar-14 10:55am    
Yes ... typo, sorry!
Sergey Alexandrovich Kryukov 3-Mar-14 10:44am    
This is the internal business of the CPU. Its documentation contains all cycle counts for all cases.
—SA
Member 9964804 3-Mar-14 10:57am    
Nope .. it doesn't differentiate between GPRs.
Sergey Alexandrovich Kryukov 3-Mar-14 11:01am    
Hm...
—SA

Well, you may find that information in the instruction manual provided by the processor's manufacturer. Every processor manufacturer publishes this documentation, and Intel's manuals are free to download. :)

-KR
 
Hi KR, I have the instruction manual, but it does not give me any pointers.
The closest I came to a solution is here: http://stackoverflow.com/questions/17896714/why-would-introducing-useless-mov-instructions-speed-up-a-tight-loop-in-x86-64-a
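
One hypothesis worth testing, in the spirit of that link, is code alignment: a register from r8 to r15 needs an extra prefix byte in many encodings, so every instruction after the initialisation moves by one byte and the loop body can land differently relative to fetch and decode boundaries. The sketch below is only an illustration under assumptions, not the original loop. It assumes NASM syntax and the x86-64 System V calling convention, and the names store_loop and end_offset, the step of 128 and the use of xor for the initialisation are all hypothetical. It pins the loop head to a 32-byte boundary so that a one-byte change in the setup code cannot move the loop body.

```nasm
; Sketch only. Assumed C prototype (hypothetical):
;   void store_loop(uint8_t *dst, uint64_t end_offset, __m256i a, __m256i b);
; System V x86-64: dst in rdi, end_offset in rsi, a in ymm0, b in ymm1.
; The loop always runs at least once and writes dst[offset+64 .. offset+127].

section .text align=32
global store_loop

store_loop:
        mov     r9, rdi                 ; base pointer, as in the question
        xor     r10d, r10d              ; 45 31 D2 (3 bytes); xor edx, edx is 31 D2 (2 bytes)
        align   32                      ; pad with NOPs up to the next 32-byte boundary
.loop:
        vmovdqu [r9 + r10 + 64], ymm0   ; writing r10d above zero-extended into r10
        vmovdqu [r9 + r10 + 96], ymm1
        add     r10, 128                ; hypothetical step per iteration
        cmp     r10, rsi
        jb      .loop
        vzeroupper
        ret
```

If the cycle counts for "rdx" and "r10" converge once the loop head is aligned, the one-byte shift is the likely cause. If they still differ, alignment is probably not the explanation and the cause lies elsewhere.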
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


