Add your own alternative version
Stats
75.7K views 2K downloads 22 bookmarked
Posted
11 May 2009

Comments and Discussions



i think something is wrong with this claim. i did not check the source code yet, but if you can do division so fast, then you can do multiplication in o((n+m)log(n+m)/2), that is even much faster than the fft. just consider the next algorithm: to multiply a*b, where a has n digits and b has m, pick x=2^(n+m) and divide it to b. If the result is smaller then a, add x+=x/2, if is bigger subtract x/2, then repeat, divide by b, add/subtract x/4, and so on (like quicksort, every time set one bit right). at the end you get the right x=a*b and made (log(n+m)/2)*(m+n) digitoperations. something is wrong here. if you do paperdivision, then the order of that one is not m+n, but m*n.





Good argument, my original analysis was obviously naive, thanks for pointing it out. I'll put out a new version with correction shortly.





Writing a really fast BigInt implementation is a lot more complicated than it seems. For the past 18 years, the open source community has been developing a very fast library called GMP. You can find the .NET Wrapper for it at:
http://www.emilstefanov.net/Projects/GnuMpDotNet/





Excellent! However, I have a couple concerns: 1) this BigInt is implemented as a class rather than a struct (simply changing its definition to struct only results in two compilation errors concerning destructors and parameterless constructors) and 2) being "unsafe" introduces issues with .NET security policy. Also, it would be consistent with other .NET integral types to make conversion from float, double, and decimal explicit.
modified on Wednesday, September 16, 2009 7:40 PM





You should not change it to a struct because then you would have to do copying or reference counting of the underlying GMP resource. Unlike Int32's, BigInt's can be large (thousands of bits long) and you would not want to be making copies every time you pass them to a function or do an assignment.
I agree that unsafe operations can be a problem, especially if you want to use BigInt's on a platform like Silverlight. But if you're running it outside of the browser and you just want to get the fastest possible operations, unsafe code and interop is the way to go. If you run the speed comparison, you will see that interoping with GMP is about 10 times faster than a heavily optimized purely C# version.
As for type conversion, I just followed the .NET type conversions for Int32, which allow you to implicitly cast from Int32 to double, but require explicit casting for double to Int32.





This may sound a bit weird, but:
BigInt.Divide() for 18,446,744,071,562,067,968 / 65,535 gives the right quotient (281,479,271,710,720), but wrong remainder (3,276 instead of 32,768).





Yikes! I'll debug that tonight and get a new version out right away. Thanks for the catch!





I fixed the bug and the source and demo downloads have been updated, and it also drew my attention to some further enhancements I can and will be making to the division/remainder algorithm (cleaning up some unnecessary memory copying). I appreciate the second pair of eyes!






Thanks for the reference, I'll look into it





Just wanted to let you know that the upcoming C# 4.0 will have big int support built in. Thanks for the stopgap, though.





Yes, but I held my breath with the last version and they yanked it out at the last moment. I came to a point in trying to use c# for mathematical investigation and being let down by available big int implementations that I became determined to try my hand at it. And, of course, it's a very interesting subject and has been a rewarding exercise.





Sorry, but your implementation of BigInt is very inefficient, both in memory usage and performance:
 This implementation encodes digits using base 10 (storing each digit decimal (09) in a byte (0255)). This is a waist of memory, and makes all calculations MUCH slower.
 Memory is NOT minimized by storing digits in a LinkedList. On the other side, doing so requires creating 2 objects per digit. In your case, storing 1000000 would require creating 15 objects, each of which requires allocating memory, reference count, object information, etc. This is MUCH more than standard implementations, which only requires creating two objects (BigInt and byte[] which only requires 4 bytes).
Using base 2^32 instead of 10, and using an array instead of a LinkedInst would make the library at least 1000 times faster, and would require much less memory.





The goal of this implementation is not to optimize performance or memory usage (I backed off on the absoluteness of that claim and put it in relative context to Chew Keong TAN’s implementation). However, if you download the source / demo and try it out, I believe you’ll find it plenty fast and memory friendly for all but the most extreme applications (in which case you’d probably want to use an unmanaged framework in the first place).





Other BigNumber implementations that I've seen are very slow. You said "Memory consumption is minimized without sacrificing performance", so can you provide some details as to who you got past this problem? Comparison to other BigXXX libraries on CodeProject would be appreciated.
Thanks.





I apologize for not providing a fuller context in this first version of my article. C# BigInteger Class[^] is a very fast big integer implementation, but is bounded by a static constant length, wasting memory terribly. The point I was making was that a LinkedList is well suited for storing digits since memory is allocated ondemand, but only headtotail and tailtohead digit traversal is required for all of our algorithms so we are not compromising ourselves horribly by not using an Array. While not blazing fast, the standard addition, multiplication, subtraction, and addition are all reasonably fast for most applications, and I've been working on bringing other algorithms like Pow and Gcd / Lcd up to par too.





I mean a = d_{0} + d_{1}*1024 + d{2}*1024^{2} + ... 0 <= d_{i} < 1024
That way you will have much less operations total (because all big int arithmetics has complexity O(N) (or O(N^2)), where N is number of digits). Also 1024 allows to replace division and modulo by 1024 by bitshift and bitwise AND.
The only drawbacks are that ToString\ParseString routines will became a little more complicated and that the digits now require short or int type to be stored.
Also, I think using LinkedList<...> is not a good idea from both performance and memory consumption points of view:
1. It is much faster to iterate over cache friendly array, rather than linked list. Also greedy behavior of vector's memory allocator reduces allocations operations count from O(N) to O(log N) which is good.
2. Simplest linked list has memory overhead of ~4 bytes (x86) per one node list (to hold pointer (reference, whatever) to the next node). So LinkedList<byte> with 100 nodes consumes ~ 500 bytes.





It's even worse, the default LinkedListNode has 3 extra pointers, one for the list itself, one for the next node and one for the previous node.





Thanks for your valuable insight.
I have been planning on looking into switching to a larger base, but starting with base10 allowed me to get off the ground more quickly and also made debugging and algorithm visualization easier during initial design and development!
Addressing your other points:
1) I chose to store BigInt digits in a LinkedList rather than an Array for a number reasons, one being that it's more natural to use in many cases (e.g. we can't always reasonably figure out the exact number of digits we require upfront for the result of a computation like addition). Also, having run a few quick tests, on my computer (32bit Intel Core 2 Duo CPU @ 2.00 GHz x 2, 2.00 GB RAM) it takes about 15 ticks to allocate and assign an Array of length 1000000 and about 200 ticks to allocate and assign a LinkedList of length 1000000. The Array is clearly superior, but the LinkedList isn’t bad either! That being said, your point is well taken and I have contemplated gutting my implementation to use an Array.
2) It’s certainly undesirable but not a show stopper in my estimation. To put my claim to memory conservation in context, the only other publicly available c#/.net unbounded integer implementation I've found (C# BigInteger Class[^]) is specialized for cryptography and uses a static constant length Array, leading to definite memory exhaustion when pushed high for doing something like computing 20000! with a memorization based factorial implementation.





I'm not sure why you used LinkedList rather than List, though. List is based on an array and can be extended so that takes care of your argument about not knowing the length ahead of time. LinkedList allows you to make insertions/deletions into the middle of the list in constant time and that's it's primary advantage. On the other hand, finding the n'th element in a linked list is super slow compared to arrays. I haven't looked at your code but I'm guessing that you're referencing digits and inserting them at one end more than you're trying to stuff them into the middle of arrays.





I actually insert nodes from tail to head for addition and subtraction and from head to tail for multiplication and division, so a List wouldn’t take care of that particular problem. Also, adding elements onebyone to a List for a final Count of 1000000 ends up internally with an Array of Length 1048576. I believe the highest performing implementation would use an Array allocated with a least upper bound Length (should be quickly obtainable for most operations) and a tracked ActualLength (similar to a List, but we wouldn’t ever need Insert, Add, etc.). Certainly part of my reason for using a LinkedList was the novelty itself, and it has allowed me to approach problems in a more flexible way than I’d be fixed into with some kind of Array structure.





Also realize that each reference object has 8 bytes overhead. So for each linked list node you have at least 8 bytes overhead, 1 byte data, 4 bytes back reference, 4 bytes next reference, a total 17 bytes for each node. (Of course on a 64 bit system those reference would be 8 bytes instead of 4.) That leaves you for a list of 1,000,000 nodes needing 17,000,000 bytes. You'd be much better off using a List even if it allocates 1,048,576 bytes when adding 1 item at a time for 1 million item. Though you can optimize that somewhat by presetting the size.
But fun is fun, not everything in this world is about performance, or memory usage. Hey memory is cheap! And the if the algorithm is fast enough for your purpose why make it faster? Unless that's fun too...
Brian
Tooele Utah Real Estate





You're right, of course, on all points!
Thanks,
Stephen





I hope everyone enjoys this article and finds BigInt worthy and useful. Feedback is much appreciated and I'm already furiously at work implementing suggestions I've received from messages posted earlier but which have mysteriously disappeared. Thanks to Deeksha Shenoy for editing.





Stephen Swensen wrote: which have mysteriously disappeared
When the article gets approved  the previous comments are discarded as they were purely for discussion and advice before it became publicly available.
I haven't voted on this yet but I have bookmarked it as I'm interested to see how you develop this further.
Keep at it and thanks for sharing
DaveBTW, in software, hope and pray is not a viable strategy. (Luc Pattyn) Visual Basic is not used by normal people so we're not covering it here. (Uncyclopedia) Why are you using VB6? Do you hate yourself? (Christian Graus)







General News Suggestion Question Bug Answer Joke Praise Rant Admin Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

