Introduction

Ever needed to scale 64 bit integers? I did when I was building a time tracking Kalman Filter. I decided to keep track of time as a 64 bit integer just as the .NET framework does. The big advantage of using the 64 bit integer is that you can keep time up to 100 nanosec. accurate and keep that accuracy after 2012. That is approximately the date after which you can no longer store seconds since 1980 in a double with microsecond accuracy. Another advantage is the easy exchange of times with .NET.

So I coded all the time keeping using __int64. This all went pretty well since the VS C++ compiler supports most arithmetic operations with __int64. You can add, subtract, multiply and divide, but all with the limitation that the result has to fit in a __int64. The problem arose when I wanted to scale a time value. The general idea of scaling is to express the scale factor as the quotient of a numerator and a denominator, then first multiply with the numerator and then divide by the denominator. You can’t use a separate multiplication and division for that since the intermediate result of the multiplication can be too large to fit a __int64. This problem used to exist in the 32 bit world when 64 bit integers were not yet supported by compilers. That’s why Windows provides a MulDiv function that does the multiplication and the division in one call, keeping track of the 64 bit intermediate result. The 32 bit processors for the PC support extended 32 bit divisions. Unfortunately the OS doesn’t provide us with a function to do the same thing with 64 bit integers. This is probably because the div instruction on today’s processors doesn’t support extended 64 divisions, yet.

The main problem lies with the 128 bit intermediate result that is needed to do a full 64 bit MulDiv(). After spending many hours on the web to find a pre-cooked solution I found a few references to the GNU MulDiv64 which scales a 64 bit integer with two 32 bit values and the book “The Art of Assembly Programming” by Randall Hyde. The 64*32/32 MulDiv would do the trick but didn't really satisfy my search for a full 64*64/64 MulDiv that would allow me to use only __int64. Thus I turned to Randall's book. This contained a section on Extended Precision integer arithmetic which was exactly what I needed. I implemented the methods described there in a small C++ library called MulDiv64. This way you don't have to go through all the nitty gritty of puzzling the pieces of the book together and the inline assembly programming.

The MulDiv64 library provides two functions:

__int64 _stdcall MulDiv64(__int64 operant, __int64 multiplier, __int64 divider)
__int64 _stdcall MulShr64(__int64 operant, __int64 multiplier, unsigned char rshift)

In situations where the divider can be kept constant at a power of 2, the division can be implemented using a right shift. This results in a much faster scaling because the full 128 bit division is somewhat slow due to the lack of hardware support.

Using the code

The library is implemented as a C++ static library. This way you avoid DLL or COM overhead in your calculations. Most functionality is written using inline assembly, except for the ABS() operations which the compiler can do efficiently itself. I tried to catch special conditions where processor support can be used to speed up the calculation. For example, when the divider is small enough to fit in a DWORD, the division can be done in four chunks instead of bit by bit. The result is always a 128 bit value but both functions clip the result to 64 bit because that is usually what you’re after.

Include the library MulDiv64.lib in the Additional Dependencies for your linker input. This can be found in the project properties dialog. Don't forget to include the header file as well.

An example for using the library:

#include "MulDiv64.h"

int _tmain(int argc, _TCHAR* argv[])
{
    __int64 r = 0;
    __int64 a = 0xaaaaaaaaaaaaaaaa;
    __int64 b = 0x5555555555555555;
    __int64 c = 0x1000000000000000;    // = 1 shl 60
    char s = 60;
    
    // This will return an incorrect result
    r = a * b / c;
    
    // This will return the correct result
    r = MulDiv64(a, b, c);
    
    // Because dividing by c can be expressed as a right shift
    // we can obtain the correct scaling faster with:
    r = MulShr64(a, b, s);
    
    return 0;
}

Of course, the code could be imported in many other ways. For instance, it might be convenient to compile the library to a DLL or COM object. But usage this way will lead to significant performance loss.

You will see that the included test program is a little more verbose so that it can be run from the command line.