
Response to: Performance was not as expected

Revision 4
Since the compiler does far less optimization in debug mode, you should really compare the release versions.

By the way, is n a signed or an unsigned number? At first glance, I think that n is signed and that this is the reason for the extra instructions around the division (sign extension and subtraction). If n had been unsigned, then my guess is that the generated code would have been identical.

By the way, in both cases the SAR (Shift Arithmetic Right) instruction is used. This confirms that n is signed. I think the extra instructions are there to round odd negative numbers toward zero (by adding 1 before the shift), as the sketch below illustrates.
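To illustrate the rounding rule only (this is my own sketch, not the generated code): for signed n, n / 2 truncates toward zero, while an arithmetic shift rounds toward negative infinity, so the compiler adds 1 to negative values before shifting. Note that right-shifting a negative value is implementation-defined in C++, but x86 compilers use an arithmetic shift.

C++
#include <cassert>

int main()
{
    // Signed division truncates toward zero; an arithmetic shift rounds
    // toward negative infinity.  Adding 1 to negative values before the
    // shift makes the two agree, which is what the extra instructions do.
    for (int n = -8; n <= 8; ++n)
        assert(n / 2 == ((n < 0 ? n + 1 : n) >> 1));
    return 0;
}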

[ --- Improved answer --- ]

If n can be negative, division might actually be faster on average simply because the loop will run fewer times in some cases.


For example, using division and starting with -31, we need 5 steps before reaching 0:
1: -31 / 2 = -15
2: -15 / 2 = -7
3: -7 / 2 = -3
4: -3 / 2 = -1
5: -1 / 2 = 0


If instead we use >>, we get one extra step, and with an arithmetic shift the value actually gets stuck at -1 instead of reaching 0 (see the sketch after this list):
1: -31 >> 1 = -16
2: -16 >> 1 = -8
3: -8 >> 1 = -4
4: -4 >> 1 = -2
5: -2 >> 1 = -1
6: -1 >> 1 = -1 (and so on, since the sign bit is duplicated)
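Here is a small sketch (mine, not the original code) that prints both sequences side by side; keep in mind that >> on negative values is implementation-defined, but common x86 compilers use an arithmetic shift (SAR):

C++
#include <iostream>

int main()
{
    // Halve -31 six times, once with signed division (truncates toward
    // zero) and once with an arithmetic shift (rounds toward -infinity).
    int d = -31, s = -31;
    for (int step = 1; step <= 6; ++step)
    {
        d = d / 2;   // -15, -7, -3, -1, 0, 0
        s = s >> 1;  // -16, -8, -4, -2, -1, -1 (never reaches 0)
        std::cout << step << ": " << d << " vs " << s << '\n';
    }
    return 0;
}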


For positive or even numbers, we would expect the shift to be a bit faster when n is signed (and exactly the same code if n were unsigned).

If this is not the case, then it might be due to some obscure reason related to caching, branch prediction, or pipelining.

By the way, if n were made unsigned, there might be a gain, and if n is never negative the result would be the same. If n is always a power of 2, then n should really be unsigned.

Finally, if you really want to optimize that function and you expect n to be greater than 255 (for example), then you should split the number into parts, testing several bits at a time.

Something like:
C++
unsigned int n = /* some power of 2 */;
int exp = 0;
if (n > 0xFFFF) { exp = 16;  n = n / 65536; }
if (n > 0xFF)   { exp += 8;  n = n / 256; }
if (n > 0xF)    { exp += 4;  n = n / 16; }
if (n > 0x3)    { exp += 2;  n = n / 4; }
if (n > 0x1)    { exp += 1;  n = n / 2; }


Except for very small n (say, less than 32), that code should be faster than halving in a loop.

If n is not a power of 2, then the code gives the maximum exponent such that 2^exp is less than or equal to n.
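For example, wrapped into a function (the name log2_floor is mine, just for illustration), with a quick check of that property:

C++
#include <cassert>

// Returns the largest exp such that (1u << exp) <= n; n must be non-zero.
int log2_floor(unsigned int n)
{
    int exp = 0;
    if (n > 0xFFFF) { exp = 16;  n = n / 65536; }
    if (n > 0xFF)   { exp += 8;  n = n / 256; }
    if (n > 0xF)    { exp += 4;  n = n / 16; }
    if (n > 0x3)    { exp += 2;  n = n / 4; }
    if (n > 0x1)    { exp += 1; }
    return exp;
}

int main()
{
    assert(log2_floor(1024) == 10);  // exact power of 2
    assert(log2_floor(100)  == 6);   // 2^6 = 64 <= 100 < 128
    return 0;
}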

And still faster:

If you compile in 32 bits under Visual C++, you can use the BSR assembler instruction via inline assembly.

Alternatively, you can use an intrinsic function (_BitScanReverse under Visual C++ 2005 and above).
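A sketch of how the intrinsic could be used (the wrapper name is mine): _BitScanReverse stores the index of the highest set bit and returns zero only when the argument is zero.

C++
#include <intrin.h>

// Visual C++ only: _BitScanReverse writes the index of the highest set
// bit of n into 'index' and returns 0 only when n is 0.
int log2_floor_msvc(unsigned long n)
{
    unsigned long index;
    if (_BitScanReverse(&index, n))
        return static_cast<int>(index);
    return -1;  // n was 0: no bit set
}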

For other compilers, you might have similar options.
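For example, GCC and Clang have __builtin_clz (count leading zeros, undefined for 0); assuming a 32-bit unsigned int, a sketch could look like this (again, the function name is mine):

C++
// GCC/Clang: __builtin_clz(n) counts the leading zero bits of a non-zero
// unsigned int, so for a 32-bit int floor(log2(n)) == 31 - __builtin_clz(n).
int log2_floor_gcc(unsigned int n)
{
    return (n != 0) ? 31 - __builtin_clz(n) : -1;  // -1 for n == 0
}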

Conclusion

In any case, you should use an unsigned number if n is always positive, as it will lead to more efficient code. An intrinsic function is the best option if you really need performance.

If you want portable code that should still be fast, then my unrolled code above would be the best option, particularly if n is expected to be larger than 1 byte.
Posted 26-Jan-13 8:29am by Philippe Mori.