Add your own alternative version
Stats
23.7K views 899 downloads 27 bookmarked
Posted
3 Nov 2013

Comments and Discussions



in: "LargeInteger::operator long double() const"
this:
for (unsigned int i = m_integer_length  1; (int)(i) >= 0; ++i) { ... }
never actually gets below the initial value (supposed to have been 'i').
Awesome class, I use it for combinatorial calculation scope predictions, works great otherwise. Much appreciated!





please just test like this:
LargeInteger x;
x.SetValue("0", 10);
std::cout << std::dec << x << endl;
the output is
you can solve this problem throw these:
find this function:
void LargeInteger::GetBase10NumberString(std::string & number_string) const
{
..........
}
then append some codes to the end like this :
void LargeInteger::GetBase10NumberString(std::string & number_string) const
{
..........
if (digits_string.length() == 1 && digits_string[0] == 0)
first_nonzero_position = 0;
number_string.erase();
for (unsigned int j = first_nonzero_position;
j < digits_string.length();
++j)
{
number_string += digits_string[j] + '0';
}
return;
}





please just test like this:
LargeInteger x;
x.SetValue("1234567890", 10);
std::cout << std::dec << x << endl;
the output is 5302428712241725440





Thank you. I will look into this. I notice the result is exactly off by 2 to the power of 32, so hopefully it is something simple.





At the bottom of the function ConvertDecimalStringToOctalDigits there is a block of code labeled with the comment "remove all leading zeros". Removing that code seems to get things working right.





As in the subject. Examples:
1. 2*2 = 4
2. 16/4 = 





Thank you for the bug report. I do see the sign being dropped, but I don't get the same result as you for the second case.
I got 4 for both answers.
Without the code you wrote for those two cases, there is no way to know why your second answer differs. It might be the code to input the number, the operation, or perhaps even something else.
Here's the code I used. I modified the supplied test program with this code.
xMultiplier0 = 2;
xMultiplier1 = 2;
xProduct = xMultiplier0 * xMultiplier1;
std::cout << "Product = " << std::hex << xProduct << std::endl;
LargeInteger xDividend = 16;
LargeInteger xDivisor = 4;
LargeInteger xQuotient = xDividend / xDivisor;
std::cout << "Quotient = " << std::hex << xQuotient << std::endl;
I'll fix this soon, although I don't know what "soon" means right now. I need more information about the case where you just had a negative sign output. I don't get that.
Thank you again.





The sign issues for multiplication and division have been fixed and the new code has been uploaded.
I also added two more cases to the test routine. More test cases should be added, but I do not have time to do that now. Hopefully no more bugs will be found.
I fixed another bug in the division routine.
Thank you very much for reporting the issues.
modified 6Sep14 21:22pm.





Thank you for this post.
Testing can be done automatically.
You take a formula X * X  Y * Y = (X + Y) * (X  Y)
Define a function that generate random X and random Y.
Calculate both sides.
Use a compare function, to check that they are equal.
Show a message when they are not.
Repeat thousends of times with different sizes of X and Y.
Repeat for distinct formulas.
You can adapt the formula to the test you want to perform:
only add/sub, only division, powers, modulo, .....
That is how I did it.
Greetings.





Thank you Frans, that was what I meant by using mathematical relationships in the thread below, although I did not test using that simple relationship. I like that and I will use that idea to test.
Greetings to you also!





I just took a quick look at the code. Nice and clean work.
A few things come to my mind:
1. C++ Exception handling. I think you class would be better off by leaving out C++ Exception handling The reason behind this is performence. It will also make the class lean. I'm not totally against C++ Exceptions here. However, I think you or the user can derive such an exception class easily. In fact, I believe you can just use C++ as "C with goodies" approach here to your core large integer class: only use class, ctor, dtor and most importantly overloaded operators.
2. 64 bit. You know that today 64 bit is almost a norm. If you just add a conditional x64 compliation support, then you can use int64 as your arrary. You will see much better performence immeadiately.
3. Intrinsics support. I was suprised to find 2 popcnt functions there. Plese check out popcnt32 and popcnt64 Intrinsics. They only takes less than 3 cpu cycles. The downside is that only a few platforms have them. But conditional compliation will just be fine.
I wrote a simple large int class a few days back. It only has 2 operators: + and ==
Yes, it is a cheap one so that I can add numbers up to 1024 bits easily and quickly.
modified 6Nov13 0:17am.





Thank you for the nice comments.
I did consider using a unsigned 64bit integer array, however, that limits the platforms the code can be used on, so the first implementation uses 32bit integers. I might add that at some point, but right now algorithmic issues are more serious. A multiply routine that does convolution by using forward and inverse FFTs, with a postprocessing step, will be much faster multiplication than even going to 64bit (although of course, doing the convolution with 64bit integers will be faster still), and right now multiplication is the bottleneck. Division is slow too, but division is not used nearly as often in the algorithms I am interested int.
Still, you are not wrong. A 64bit array would be faster. If I add that, it will be by using typedef for the main array integer type and some internal types, and, if necessary, some #ifdefs to compile for the two integer sizes. I do think it worthwhile adding that to the code at some point, and of course it will speed up all arithmetic and logical operations, not just the multiply routine.
For modern Intel x86 and x64 machines, I am also considering using SSE assembly instructions, which can do more than one multiplication in parallel. I've written SSE code before to speed up an image processing function. That two would be conditionally compiled based on the platform. However, all that is best done once the fastest algorithm is used. The algorithm is more important than the other optimizations.
As to exception handling, I do not agree. I might agree that the exceptions should be done differently, but they are necessary. If someone writes code such as:
x = y / z
And z equals zero, then it's necessary for an exception to be thrown, otherwise the error would occur silently. In this case, an exception would be thrown anyway by the C++ runtime code, only it would be an integer divide by zero. Since regular integer code can be mixed with large integer code, it's best if this is a different exception.
The same is true for failure to allocate memory. Note, some platforms can compile C++ so that when operator new fails to allocate memory, an exception is thrown anyway. In those cases, my memory checks are superfluous. Note, Visual Studio can do this with the right compile switches, but this is not universal for all compilers and all platforms.
I did not know about popcnt32 and popcnt64. That's good to know. The code could certainly have #ifdefs to use those for the platforms that support them. Thanks for the information.
I probably won't be updating this code for some time. I just started a new job, and my weekends are filled in the foreseeable future, but I do intend to update this code eventually. The next change will be the multiplication routine, but after that, I will look at these other changes. I agree, the intrinsic functions would be better, those are usually optimized for the platform.
Thanks again!
modified 18Dec13 20:24pm.





This type of code (numerical library) is fun to write but before actually using it for anything serious you need to test, test, test and that's where the problem is





Is this bug free? I don't know. You're right, something like this can't be tested enough! Even the parts of the code that I'm pretty confident about, such as the multiply routine, might have some hidden bug that only surfaces in some circumstance that I failed to test.
A little of the testing is in the program I provided. I did more testing than that though. I did manual calculations for some of the results I obtained. I used some mathematical relations to test some parts of the code. However, with all I did, the code is definitely undertested. Still, I have some confidence that code is working. I'm less secure about the subtraction routine than I am anything else, but I think that's working. I tested that the least.
I will update this if any bugs surface.





Even if I want something different than your main purpose and even if I think there's an error, I consider this to be good code. So, I am giving you a 4 for the code.






You say that "The theoretical longest integer length is 2 power of 31  1 dwords, where a dword is 4 bytes. This limitation is because, in some places in the code, the unsigned integer index into an array of 32bit dwords is cast to be a signed integer. This sets the maximum number to be 2 power of 33  4 bytes long, or approximately 8 billion bytes long!"
I used the power of because the formatting was wrong. But the problem is, where did the power of 33 appeared? Aren't you trying to say power of 31?
If you move from a power of 32 (unsigned  4 billion) to a power of something signed (31 and 2 billions) I can't see where there's a power of 33 and 8 billion long.
Yet, it is an impressive number.
But then I should ask you: Do you know the BigInt type that exists in .NET?
In fact, I would love to see an article on how a large/big int work. Seeing only its methods only tells me that I should stay using the .NET one and I want something to "learn how to do" instead of something to learn "how to use".





Paulo Zemek wrote: Seeing only its methods only tells me that I should stay using the .NET one
If you use .NET, of course you are going to use the one from .NET framework, but this is intended for native code. That said, there are other native C++ bigint implementations out there and it would be interesting to see how they compare to this class.





Message Removed
modified 4Nov13 18:55pm.





You answered a wrong message





The maximum index value is (2 to the 31st power  1), however, the index is not indexing bytes, it's indexing 4byte dwords. That is, (2 to the 2nd power) bytes per dword. So, the resulting product is "2 power of 33  4 bytes". (Note the switch from units of dwords to bytes)).
I hope that makes it clear.





OK. So it is 2 power of 31 (words, which gives 2 power of 33 bytes.
It is more clear now.
So, I can only ask for you to explain how to create such a class... it looks really interesting, in my opinion.





Thanks you for your comment.
I'm not sure I understand your question about how to create such a class. I took some of input algorithms from "Seminumerical Algorithms" by Knuth. The algorithm for converting base 10 to base 8 makes it easy to convert decimal integer strings to binary numbers. Other code, such as the multiplication code, I just wrote. I've been doing Digital Signal Processing my entire life, so I am very familiar with integer arithmetic on computers. Note, that code is simplistic, and only good for short integers, perhaps a few hundred digits. For really long integers, it's better to use certain fast convolution algorithms and then process the result to get the final product. I might add that to the code at some point.
I learned to overload operators after reading several C++ books.





In fact, only the explanation of your code (the Big Integer) will be good.





Paulo, each routine is very different.
Ignoring the input and output routines for numbers, the four main mathematical routines are operator *=, operator /=, operator +=, and operator =.
These allow the following types of calculations respectively.
x *= y;
x /= y;
x += y;
x = y;
All other operations, such as the operator + that takes two LargeIntegers for arguments, or one of the operator + methods that take a LargeInteger and an buildint type, such as 'int', or 'unsigned int' are implemented in terms of one of those four methods. Note, there are two methods for each builtin type, one with the builtin type the first argument, and one the builtin type the second argument. These each are converted to a LargeInteger using one of the LargeInteger constructors, and then call the operator + that takes two LargeIntegers for arguments. That then calls the operator += method.
Note, there is a special unary operator , that is just to make a number negative, for example:
x = y
The logical operations, such as operator , and others, are implemented the same way. For example, all operator  methods eventually call operator =.
There are shift operators, such as operator << and operator >>, again, based on operator <<= and operator >>= respectively. These allow expressions such as:
x <<= 5;
x = y << z;
And, of course, using >> to shift to the right.
Finally, there are six comparison operators, operator ==, operator !=, operator <, operator <= operator >, and operator >=. Although all of these could be implemented just calling operator <, they use both operator < and operator == for efficiency reasons.
To explain that last sentence, the main part of the body of operator == can be implemented in terms of operator < using the following code:
bool is_equal = !(x < y) && !(y < x);
However, I did not use this relation. I implemented operator == in the more straightforward way. With operator < and operator ==, all other comparison operators can be generated.
See the code for how the comparison operators are implemented. (By the way, I wonder if they couldn't be more efficient by rearranging these. While it's necessary to scan every dword to check for equality, for testing for inequality, one could exit as soon as any byte mismatches, so instead of implementing operator != in terms of operator =, perhaps I should have done that the opposite way. I am considering changing this, but right now, making the multiplication routine faster is a higher priority for me).
Of course, the comparison operators allow expression such as:
bool is_less_than = x < y;
if (x < y)
{
a = a + 1;
}
As far as the mathematics goes, that would take a really long article to explain. Check the Knuth book.
The multiply routine uses base 4294967296 numbers. 4294967296 is 2 to the 32 power. Each digit is a 32bit number from 0 to 4294967295 to form the base 4294967296 number. Two 32bit products are multiplied to form a 64bit product, and multiplication is done just like you would do it on paper, by multiplying each digit in the first number by every digit of the other number to produce a partial product, and then moving to the next digit in the first number and calculating the next partial product, and repeating this until all digits in the first number are done.
Note the two nested loops in the multiply routine, the outer one for the first number and the inner one for the second number. Finally, adding the resulting sums produces the answer. However, the sums are calculated for each partial product before calculating the next partial product. Other than that, it's like multiplying by hand, only using a very large base.
The divide routine less complicated. I believe I just used shift and subtract. I'm sure that could be done more efficiently too.
Check out the reference I gave for the input conversions between various bases. It's "Seminumerical Algorithms" by Knuth.
I hope that helps.
modified 18Dec13 20:23pm.







General News Suggestion Question Bug Answer Joke Praise Rant Admin Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

