##### UInt128 Series

## Introduction

Next up in the UInt128 series is multiplication. While I admit that the Division algorithm I'll be presenting next time is much more complicated, this is the one I worked the hardest to get right. In most algorithms I've seen, the Multiplication operator does most of the heavy lifting, and most of my major increases in performance came when I made improvements to it.

I originally had written a multiplication algorithm last year which held up pretty well and I was quite pleased with. However about a month ago, I stumbled across a similar function from Hacker's Delight (an excellent book which I highly recommend to all programmers) that multiplied two 32 bit integers to their full 64 bit precision, and stored the product in two more 32 bit integers, one holding the High bits, the other holding the Low bits. Sound familiar?

### Special Cases

The functions I present below are general multiplication algorithms which can be used on any number. They do not check for special cases, such as one of the numbers equaling 0, 1, or a power of 2. In my own code I use, I have these checks implemented. However my goal here is to show you the regular function, and adding these checks would only clutter the code and distract you from the important parts.

## Multiplication

I've divided the multiplication up into 3 parts. First up is the code I derived from the 32x32=64 function in Hacker's Delight.

#### 64x64=128

This function multiplies two 64 bit integers together to get a 128 bit product.

void mult64to128(uint64 u, uint64 v, uint64& h, uint64 &l)
{
uint64 u1 = (u & 0xffffffff);
uint64 v1 = (v & 0xffffffff);
uint64 t = (u1 * v1);
uint64 w3 = (t & 0xffffffff);
uint64 k = (t >> 32);
u >>= 32;
t = (u * v1) + k;
k = (t & 0xffffffff);
uint64 w1 = (t >> 32);
v >>= 32;
t = (u1 * v) + k;
k = (t >> 32);
h = (u * v) + w1 + k;
l = (t << 32) + w3;
}

#### 128x128=128

This one works just like when you multiply two 64 bits together and get a 64 bit product. The problem is that it can overflow, and it won't tell you, again, just like regular multiplication.

void mult128(uint128 N, uint128 M, uint128& Ans)
{
mult64to128(N.Lo, M.Lo, Ans.Hi, Ans.Lo);
Ans.Hi += (N.Hi * M.Lo) + (N.Lo * M.Hi);
}

As you can see, it uses the first multiplication function, which makes it a little easier on the eyes.

#### 128x128=256

And last but not least, multiplying two 128 bit numbers and getting the 256 bit product.

void mult128to256(uint128 N, uint128 M, uint128& H, uint128& L)
{
mult64to128(N.Hi, M.Hi, H.Hi, H.Lo);
mult64to128(N.Lo, M.Lo, L.Hi, L.Lo);
uint128 T;
mult64to128(N.Hi, M.Lo, T.Hi, T.Lo);
L.Hi += T.Lo;
if(L.Hi < T.Lo)
{
Increment(H);
}
H.Lo += T.Hi;
if(H.Lo < T.Hi)
{
++Ht.Hi;
}
mult64to128(N.Lo, M.Hi, T.Hi, T.Lo);
L.Hi += T.Lo;
if(L.Hi < T.Lo)
{
Increment(H);
}
H.Lo += T.Hi;
if(H.Lo < T.Hi)
{
++Ht.Hi;
}
}

This function calculates the parts one at a time. First it gets the upper 128 bits, then the lower 128 bits. Then it has to perform two more multiplications to get the middles bits, add them to the final product, and make sure any overflows are taken care of.

Speaking of overflow, you might have noticed that here, I used `if `

statements to handle the overflows, despite passing them over my Add/Sub article. I decided that, like the special cases, while they do improve performance, they do tend to clutter the code. I figured that doing it like this gets the idea across, looks nicer, and if you want to leave that way or change it, more power to you.

## Squaring

The last section I want to cover is squaring a number, i.e., S = R * R. Because we're multiplying a number by itself, there are a few optimizations that we can make. Like Multiplication, Squaring is divided up into 3 parts.

#### 64^2=128

This is based on the Mult 64x64=128 function. The only real optimization I made was using one less multiplication. I get the feeling I could make a few more improvements, but I was unable to get anymore working.

void sqr64to128(uint64 r, uint64& h, uint64 &l)
{
uint64 r1 = (r & 0xffffffff);
uint64 t = (r1 * r1);
uint64 w3 = (t & 0xffffffff);
uint64 k = (t >> 32);
r >>= 32;
uint64 m = (r * r1);
t = m + k;
uint64 w2 = (t & 0xffffffff);
uint64 w1 = (t >> 32);
t = m + w2;
k = (t >> 32);
h = (r * r) + w1 + k;
l = (t << 32) + w3;
}

#### 128^2=128

Again, based on the Multiplication version, I was able to replace a multiplication and addition with a left shift.

void sqr128(uint128 R, uint128& Ans)
{
sqr64to128(R.Lo, Ans.Hi, Ans.Lo);
Ans.Hi += (R.Hi * R.Lo) << 1;
}

#### 128^2=256

We're able to make an optimization similar to the last one, only it's a 64 to 128 bit multiplication, so it's a pretty good improvement.

void sqr128to256(uint128 R, uint128& H, uint128& L)
{
sqr64to128(R.Hi, H.Hi, H.Lo);
sqr64to128(R.Lo, L.Hi, L.Lo);
uint128 T;
mult64to128(R.Hi, R.Lo, T.Hi, T.Lo);
H.Hi += (T.Hi >> 63);
T.Hi = (T.Hi << 1) | (T.Lo >> 63);
T.Lo <<= 1;
L.Hi += T.Lo;
if(L.Hi < T.Lo)
{
Increment(H);
}
H.Lo += T.Hi;
if(H.Lo < T.Hi)
{
++H.Hi;
}
}

## Conclusion

Like I said at the start, I think Multiplication is the most important math operation, so fixing these up and optimizing them has been a labor of love for me. If you find any bugs or have any suggestions, please let me know.