Unfortunately I found a bug in the division function. I have updated the source.
Arbitrary precision is a lot of fun. In my spare time, I create deep zoom fractal movies. I have developed my own class for calculations with many decimals, and it turned out to be many times faster than similar classes you can find on the Internet, linked from sites like Wikipedia and various developer forums. That is why I wanted to share my findings. This article only describes my class briefly; if you want further details, please download the code and examine it.
My class should be seen as a starting point for further development: it still has some drawbacks, and with some further hard work the speed could be increased even more.
I previously tried to use decNumber to render deeply zoomed fractals. However, since its precision cannot be adjusted at run-time, rendering a fractal takes about as much time when it is only slightly zoomed as when it is deeply zoomed. So I had to develop my own class for calculations with many decimals, where I can increase the precision as the zoom gets deeper.
I have only implemented the four standard arithmetic operations. Basically, I pack several decimal digits into each element of an integer array and use the same methods I learned in primary school, treating each integer the same way one treats a single digit when calculating by hand.
Each 64-bit integer in the array holds 8 decimal digits, i.e., values between 0 and 99'999'999. Despite the low number of digits per integer, there is no need to handle overflow during the actual multiplications, additions, or subtractions. After the calculation loop, the array is traversed once more and the overflow is resolved. This second loop is negligible compared to the nested loop of a multiplication.
In this way, a multiplication, which to me is the most interesting of the operations, is implemented as a for-loop with an inner for-loop. When I tested the performance, I realized that the less code there is inside the inner for-loop, the faster the calculation runs. Even a simple comparison is costly and must be avoided. So in the end, the inner loop of my implementation contains one single line of code!
The least significant value is stored first in the array. I have gone back and forth on whether this was a good decision. However, it makes it easy to extend a value by appending integers at the end of the array, whereas pushing new integers at the beginning would require offsetting the whole array and be more time-consuming.
There might be a drawback to using base 10, since I use integer division and modulus to handle overflow; a base-2 class could use bitwise AND and bitwise shift operations instead, which are known to be much faster.
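To make the trade-off concrete, here is a small sketch (my own illustration, not CKRFloat code) of the carry-extraction step in each representation, using 2^32 as an example power-of-two base:

```cpp
#include <cstdint>

// In a decimal base, splitting an accumulated value into digit + carry
// needs an integer division and a modulus:
uint64_t carry_decimal(uint64_t v, uint64_t& digit) {
    digit = v % 100000000ULL;   // low 8 decimal digits
    return v / 100000000ULL;    // carry
}

// With a power-of-two base (here 2^32), the same step is a mask and a
// shift, which compile to single instructions:
uint64_t carry_binary(uint64_t v, uint64_t& digit) {
    digit = v & 0xFFFFFFFFULL;  // low 32 bits
    return v >> 32;             // carry
}
```

In practice, compilers replace division by a constant with a multiply-and-shift sequence, so the decimal version is slower but not catastrophically so; the decimal base in turn makes conversion to and from decimal strings trivial.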
Now with 64-bit Windows, __int64 is a native type, which speeds up the calculations significantly. I can easily create zoom movies that go deeper than a hundred decimals, i.e., to magnifications of 1e100. But I want to zoom much deeper, so I looked for other arbitrary precision modules on the Internet and found three of them. The first one, apfloat, is not a fair comparison with my class: it is designed to handle millions or even billions of decimals, and therefore uses heap memory or even file storage. The other two are decNumber, now updated to support 64-bit, and the 64-bit library ttmath. I found that my class is more than 3 times faster than decNumber and more than 7 times faster than ttmath when multiplying two 300-digit values. Addition and subtraction are also performed almost twice as fast as in the others.
Division is not a big concern of mine, and it was a pain to implement. I needed it to calculate the offset between pixels when rendering the fractals. However, division is only needed a couple of times per fractal, compared to the other arithmetic operations, which are needed millions or sometimes billions of times.
The times below are in seconds for multiplying 10'000 times, adding and subtracting 100'000 times, and dividing 1'000 times, using two randomly generated 200-digit values. Similar differences were measured for other value lengths. Assignment of the variables was not included in the measurements.
The test was performed many times on a laptop with 64-bit Windows. Since Windows has a life of its own, with background processing and swapping going on, the lowest value for each measurement was selected; after a few runs, you arrive at a value that is not undercut again.
| Library | Multiplication | Addition | Subtraction | Division |
| --- | --- | --- | --- | --- |
| CKRFloat | 0.014 | 0.026 | 0.026 | 0.032 |
| decNumber | 0.040 (286%) | 0.048 (185%) | 0.048 (185%) | 0.026 (81%) |
| ttmath | 0.084 (600%) | 0.046 (177%) | 0.054 (208%) | 0.028 (88%) |
In addition, there is an optimization that can be done when multiplying a value by itself, i.e., squaring, which reduces the time by almost 20%. This is applicable when rendering fractals, so it is implemented in CKRFloat.
Using the Code
CKRFloat can be used in an algebraic way:
CKRFloat a=23.3, b="18.12321", c=a*b;
- First revision
- Added the source
- Corrected some typos
- Changed type from Tip/Trick to Article and, on request, added info about the measurement.
- The division measurement was not correct; a fair comparison should produce results with a similar number of digits, so the value was updated after a maximum precision was set in CKRFloat.
- Bug fix in division (2013-04-19)