Hi,

A simple question first.

I have this code:

C++
double d = 0.123456789; 
float f;

f = (float)d;


Assume the answer is 0.123456791f.

Does it give the same float value (in this case 0.123456791f)
regardless of language, CPU (x86) and platform (Windows, for example)?

Are there any references that state this as a fact?
(I just need proof for documentation.)

Also, is there a function or rule that defines how the rounding happens?
For example: if this value comes in, you get this value out, and so on.

Now the big question.

I have this inequality check.

C++
int myFunc(float f);

double L = someVal1;
double H = someVal1 + 1;
double X = myVal;
int answer;

//we have to pass myVal to a function that accepts only float
//if X >= L && X < H then we are fine.
//otherwise, we need to do this.
if (X < L || X >= H)
{
    X = (L + H)/2;
}

//however, since in that function the double value
//gets cast to float, we have to do it this way.
//otherwise we might get a wrong value because of loss
//of precision.
//for example, if X == L, then we won't change X to the mid value,
//but if after the float conversion it becomes less than L,
//we will not get the value we anticipated.
//so the code becomes:

if ((float)X < L || (float)X >= H)
{
    X = (L + H)/2;
}

answer = myFunc(X); //I will simply get a conversion warning when compiling


So am I doing something wrong?
For example, can the value that results from (float)X
in that if condition be different from the value seen
within myFunc()?

Not sure if this is applicable or true, so please let me know if not:
(float)X < L happens in the registers, while the variables inside myFunc()
are held in memory, so there can be a difference?

thanks.

First of all, the representation of float and double numbers is independent of platform, language and instruction-set architecture, because it is governed by the IEEE 754 standard. Please see:
http://en.wikipedia.org/wiki/IEEE_754[^].

In the general case, a cast never means true rounding; it is closer to truncation. Rounding is a pretty complex problem. Please see:
http://en.wikipedia.org/wiki/Rounding_error[^],
http://en.wikipedia.org/wiki/Truncation[^],
http://en.wikipedia.org/wiki/Rounding[^],
http://en.wikipedia.org/wiki/Precision_%28arithmetic%29[^].

Using approximate calculations is a big problem, more or less familiar to people working on numeric methods or numeric simulation; it is very hard to cover in a short answer.

As to your "big question", I cannot fully analyze it, because as posted the code won't compile: the last line is missing the assignment (answer myFunc(X);). You may need to sort it out and explain the problem better, starting with your goal. I would only advise one general thing: do not mix variables of different precision. If you have to work with double and float, make sure all intermediate operations are done in double; cast the result to float only at the final stage of the calculation. By the way, you should understand that the precision of float is insufficient in almost all cases. It is usually used for the presentation of final results, such as screen output, storage of final results, or graphics. Also remember: rounding error can accumulate.

—SA
 
 
Comments
Jochen Arndt 24-Apr-12 3:33am    
+5. Very good explanation.
Sergey Alexandrovich Kryukov 24-Apr-12 7:54am    
Thank you, Jochen.
--SA
How the conversion is performed is compiler and CPU specific. With x86 CPUs with integrated FPU, the FPU is used to perform the conversion. To check this, you can generate an assembler output file when compiling:
ASM
; 45   : 	double d = 1.23;
	fld	QWORD PTR __real@3ff3ae147ae147ae
	fstp	QWORD PTR _d$[ebp]

; 46   : 	float f = (float)d;
	fld	QWORD PTR _d$[ebp]
	fstp	DWORD PTR _f$[ebp]

fld loads a value and pushes it onto the FPU stack. fstp pops the value from the stack and stores it in memory. The size of the memory operand specifies the floating-point type, and the FPU performs the necessary conversion (internally, the FPU uses an 80-bit extended format).

So the conversion from double to float will always give the same result here for a specific double value. It can be assumed that all x86 C/C++ compilers use these FPU instructions to perform the conversion.

Regarding your function:

If the input value is of type float, you should do all calculations with float values if possible (e.g. no high-order polynomials). This avoids problems with rounded values when comparing.
 
 
Comments
Sergey Alexandrovich Kryukov 24-Apr-12 8:02am    
Good points, my 5.
Even though I understand that the conversion is formally CPU-specific, do you think different CPUs can give slightly different results? Even though IEEE 754 defines rounding rules, it is not clear what the expected result of a conversion between different precisions is. However, a difference between CPUs for such an operation would be weird...
--SA
Jochen Arndt 24-Apr-12 8:50am    
Thank you. I just want to show how the conversion is done.

I don't know if there are differences between x86 CPUs across their evolution and second-source implementations, but I don't think so for the rounding at conversions. The FLD instruction will just set the additional mantissa bits of the internal extended precision to 0. The FSTP instruction will round the value according to the rounding-control bits in the FPU control register (round to nearest or even is the default; this must especially be watched when converting to integer values, where truncation may be expected).

I don't know anything about CPU/FPU hardware other than x86. If other IEEE 754 FPUs do not use the internal extended format (strict single- or double-precision units), the results of operations will differ. To avoid this, the x86 FPU can be configured to always round values after each operation. But this is not necessary for this particular conversion, where no arithmetic operation is performed.

Finally, the data sheets of the different FPUs must be compared for IEEE 754 rounding compliance.
Sergey Alexandrovich Kryukov 24-Apr-12 11:30am    
Thank you very much for these notes.
--SA
JackDingler 25-Apr-12 15:39pm    
We do have this example...

http://en.wikipedia.org/wiki/Pentium_FDIV_bug
Sergey Alexandrovich Kryukov 25-Apr-12 16:27pm    
I know. I hope this is obsolete information, back in 1994. I remember when it happened...
--SA
Answer to the simple question first... The answer's simple - it's no - but here are some reasons:

- C++98 mandates neither a format for floating point numbers nor how values are converted between them. Other languages may or may not specify their FP operations in the same way (Java and Python specify IEEE 754) or do something completely different - you'll have to check the standard. C++11 might have said that FP has to be IEEE 754 or similar, but I can't check for sure as I haven't got a copy of the standard on me.

- CPUs don't necessarily implement floating point the same way. In my experience most try to be IEEE 754 compliant, but there's no reason they have to (aside from pissing off the compiler writers trying to be portable). Even if they all use the same standard, I've seen quite a few processor FP bugs in my time - Intel have had to recall chips three times because of FP errors that affected code I've written.

- Operating systems are usually pretty agnostic about FP, although (for example) there was code in NT 4.0 and Windows 2000 to cope with some Intel faux pas.

If you're really interested in using floating point in C++, have a look at the numeric_limits[^] class in a standard C++ compiler. And follow SA's advice of not mixing precisions.

As for the big question - Jochen's generally right that with a particular mix of compiler and processor you'd expect a conversion from double to float to come out with the same value wherever it's used. I'd be wary of assuming that if you use libraries compiled with two different compilers, but provided you build all the source with the same compiler AND the same build settings, you'll be all right. Probably.

Cheers,

Ash
 
 
Comments
Sergey Alexandrovich Kryukov 24-Apr-12 8:09am    
Interesting points I did not really know, my 5.

I would note that even if the C++ standard does not specify IEEE 754, that does not mean there is a reason to do it a different way. The C++ philosophy is to follow the CPU in the simplest possible way and avoid "unnatural intellect", in striking contrast to FORTRAN, which really tries to "fix" CPU results.

Do you think conversion between single and double precision can give different results on different CPU instruction-set architectures? It would be weird, at least...
--SA
Calculations using floating point numbers are a tricky area. Searching for advice, you can easily dive deep into literature and documentation of academic-level complexity. Thus, I would recommend taking a look at these two chapters from a very good C++ tutorial -
2.5 — Floating point numbers [^]
and
3.5 — Relational operators (comparisons)[^]
Shortly (from this chapter):
"Keep in mind that comparing floating point values using any of these operators is dangerous. This is because small rounding errors in the floating point operands may cause an unexpected result. See the section on floating point numbers for more details." This is followed by advice on how to compare floating point numbers (including D. Knuth's suggested method).
 
 
Comments
Sergey Alexandrovich Kryukov 24-Apr-12 11:17am    
Good and useful reference, my 5.
--SA
Sergey Chepurin 24-Apr-12 11:33am    
Thank you. But really, this is a surprisingly good C++ tutorial, written by a working programmer between 2007 and 2012. And as they say - "We are still adding content, so check back regularly!" The author combines both talents necessary for a good programming tutorial: tutor + programmer.
Sergey Alexandrovich Kryukov 25-Apr-12 16:34pm    
I have already bookmarked it and recommended it in one of my answers, and people were thankful. I like its focus on C++11. In fact, I pretty much dislike C++ for being so archaic, and C++11 is much better...
Thank you very much.
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


