Nice work, but I'd like to ask you about conversion operations with floating point numbers in C++.
My problem is below:
#include <stdio.h>
#include <conio.h>
int main()
{
float a;
a=1.35f;
double b;
b=0.0;
b=b+a;
printf("\n%.15f\n",b);
getch();
return 0;
}
In theory the result should be 1.35,
but in practice it is something like 1.3500000238418579
Could you give me some advice?
I use Visual Studio 2008 Team System SP1
|
|
|
|
 |
|
 |
Because of the inaccuracies of storing numbers in floats, you should use the SigFig() function above so that it removes the 238... bit at the end. If you're after better accuracy, use doubles everywhere.
However, for speed I'd always use float, but be aware that numbers can't always be stored exactly.
Some numbers (e.g., 1/3 and 0.1) cannot be represented exactly in binary floating-point no matter what the precision. Software packages that perform rational arithmetic represent numbers as fractions with integral numerator and denominator, and can therefore represent any rational number exactly. Such packages generally need to use "bignum" arithmetic for the individual integers.
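To see where the extra digits in the question come from, here is a minimal sketch. The helper names `widen` and `showConversion` are made up for illustration and are not from the article. It shows that widening a float to a double keeps the float's value exactly, so the double ends up holding the float nearest to 1.35 rather than the double nearest to 1.35.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: widen a float to double, which is what
// `b = b + a` does implicitly in the question's code.
double widen(float a) {
    return static_cast<double>(a);
}

// Print the widened float next to a double literal for comparison.
void showConversion() {
    float a = 1.35f;          // nearest float to 1.35
    double viaFloat = widen(a);
    double direct = 1.35;     // nearest double to 1.35
    assert(viaFloat != direct);
    std::printf("%.15f\n%.15f\n", viaFloat, direct);
}
```

Calling `showConversion()` prints the two values side by side. Declaring `a` as a double in the first place avoids the difference.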
Regards,
Simon Hughes
|
|
|
|
 |
|
 |
BTW, what was the criterion that made you choose 1.0e-20 (is it because it is exponentially half-way through the float's range)?
The standard float equality test == is difficult for the newbie to grasp, but introducing an arbitrary constant may, IMHO, be misleading for them (I think the constant should be problem-dependent, i.e. application-dependent).
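One way to keep the tolerance application-dependent, as suggested here, is to make the caller pass it in rather than hard-coding a constant. The following is only a sketch; `nearlyEqual`, `relTol` and `absTol` are hypothetical names, not part of the article's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b agree to within relTol of the
// larger magnitude, or to within absTol (useful when both are near zero).
bool nearlyEqual(double a, double b, double relTol, double absTol) {
    assert(relTol >= 0.0 && absTol >= 0.0);
    double diff = std::fabs(a - b);
    double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(absTol, relTol * scale);
}
```

For example, `nearlyEqual(0.1 + 0.2, 0.3, 1e-12, 0.0)` is true even though `0.1 + 0.2 == 0.3` is false. Which tolerances make sense depends on the magnitudes and error budget of the application.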
BTW Happy new year.
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.
[my articles]
|
|
|
|
 |
|
 |
Will this simple Round function be faster?
double Round(double val, int dp)
{
int modifier = 1;
for (int i=0; i<dp; ++i)
modifier *= 10;
return (floor(val*modifier+0.5)/modifier);
}
|
|
|
|
 |
|
 |
Congratulations. It's accurate (for positive numbers only) and it's 20 times faster in both debug and release modes.
However, you need to fix it to do negative numbers correctly.
RoundNew(2.15, 1) // gives 2.2 (correct)
RoundNew(2.149, 1) // gives 2.1 (correct)
RoundNew(-1.475, 2) // gives -1.47 (incorrect, should be -1.48)
However with a little fix:
double RoundNew(double val, int dp)
{
int modifier = 1;
for (int i = 0; i < dp; ++i)
modifier *= 10;
if(val < 0.0)
return (floor(val * modifier - 0.5) / modifier);
return (floor(val * modifier + 0.5) / modifier);
}
It all works now as it should.
20x faster, Excellent.
Regards,
Simon Hughes
|
|
|
|
 |
|
 |
Thanks for the fix. But I think it needs another small modification; otherwise some negative numbers may not work. You just need to use ceil instead of floor when the number is negative.
double RoundNew(double val, int dp)
{
int modifier = 1;
for (int i = 0; i < dp; ++i)
modifier *= 10;
if(val < 0.0)
return (ceil(val * modifier - 0.5) / modifier);
return (floor(val * modifier + 0.5) / modifier);
}
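To illustrate the difference, here is a small sketch that isolates the two negative branches. The function names `modifierFor`, `roundNegFloor` and `roundNegCeil` are made up for this comparison. With -1.471 and two decimal places, the floor variant overshoots to -1.48, while the ceil variant gives -1.47.

```cpp
#include <cassert>
#include <cmath>

// Power of ten as an integer multiplier, built the same way as in the posts.
static int modifierFor(int dp) {
    assert(dp >= 0 && dp <= 9);   // 10^9 still fits in a 32-bit int
    int modifier = 1;
    for (int i = 0; i < dp; ++i)
        modifier *= 10;
    return modifier;
}

// Negative branch using floor (the earlier fix).
double roundNegFloor(double val, int dp) {
    int m = modifierFor(dp);
    return std::floor(val * m - 0.5) / m;
}

// Negative branch using ceil (the modification proposed in this post).
double roundNegCeil(double val, int dp) {
    int m = modifierFor(dp);
    return std::ceil(val * m - 0.5) / m;
}
```

The floor variant only happens to work for values such as -1.475, where `val * modifier - 0.5` already lands on a whole number.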
|
|
|
|
 |
|
 |
Yes, you're right. Thanks for the update.
Regards,
Simon Hughes
|
|
|
|
 |
|
 |
According to my benchmarking, this is still 3x slower than this:
double Round(double dIn, int iPlaces)
{
if (dIn < 0)
return -((long)(((-dIn*pow(10.0,iPlaces))+0.5))/pow(10.0,iPlaces));
else
return ((long)(((dIn*pow(10.0,iPlaces))+0.5))/pow(10.0,iPlaces));
}
|
|
|
|
 |
|
|
 |
|
 |
I just love this!!!!
Tnx!
93/93
|
|
|
|
 |
|
 |
I need a function that works like this:
quant(X,Q) takes two inputs: X, a matrix, vector or scalar, and Q, the minimum value. It returns the values in X rounded to the nearest multiple of Q.
Please help me with how to do this. Thanks in advance.
Ravi M.R
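A possible starting point, offered as a sketch rather than a definitive answer: divide by Q, round to the nearest integer, and multiply back. The signatures below are assumptions, since the request does not specify types. `std::round` requires C++11 and rounds halfway cases away from zero.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Round one value to the nearest multiple of q (q must be positive).
double quant(double x, double q) {
    assert(q > 0.0);
    return std::round(x / q) * q;
}

// Element-wise version for a vector; a matrix stored as a flat
// vector of doubles can be handled the same way.
std::vector<double> quant(const std::vector<double>& xs, double q) {
    std::vector<double> out;
    out.reserve(xs.size());
    for (double x : xs)
        out.push_back(quant(x, q));
    return out;
}
```

For instance, `quant(7.3, 0.5)` yields 7.5. Results are still subject to the usual floating-point representation limits when Q is not exactly representable (0.1, for example).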
|
|
|
|
 |
|
 |
Hi,
I am trying to convert a double precision number to a float and I have run into a problem. Please help me resolve it.
double fdblValue = 11574.24;
float fFloatValue = ( float ) fdblValue;
I am getting 11574.2 instead of 11574.24. What is the issue? Please let me know.
Regards,
Sarma
|
|
|
|
 |
|
 |
I tried this...
-------------------------
double fdblValue = 11574.24;
float fFloatValue = (float) fdblValue;
cout<< fdblValue << endl<< fFloatValue;
-------------------------
Saw your problem.
.............
Then I tried this...
--------------------------------------
double fdblValue = 11574.24;
float fFloatValue = (float) fdblValue;
cout.precision (8);
cout<< fdblValue << endl<< fFloatValue;
---------------------------------------
It worked!
But it has a flaw: try setting "cout.precision (15)" instead of "cout.precision (8)" and see.
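The effect can be reproduced without a console by formatting into a string stream. This is a sketch; `formatFloat` is a hypothetical helper. The default stream precision is 6 significant digits, which is why 11574.24 first showed as 11574.2. At precision 15 the digits beyond what a float can hold become visible.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: format a float the way cout would at the
// given precision (number of significant digits).
std::string formatFloat(float value, int precision) {
    assert(precision > 0);
    std::ostringstream out;
    out.precision(precision);
    out << value;
    return out.str();
}
```

With `11574.24f`, precision 6 gives "11574.2", precision 8 gives "11574.24", and precision 15 gives "11574.240234375", which is the value the float actually stores.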
Always smile!
And if I am not for myself,
Who will be for me?
And if I am not for others, what am I?
And if not now, when?
|
|
|
|
 |
|
 |
Hello,
I am just beginning with Visual C++, building a simple dialog program to run some
calculations, and I have run into a bit of a wall. I will be inputting a couple of floating point
values and then hitting Calculate; the program will perform some calculations and output
the data to another control box. My problem is:
1) I get the input, no problem
2) I convert it into a floating point value
3) I do the calculation
//the problem is here!!
4) Once the data is calculated I want to output it.
However, the problem lies in converting the newly calculated float back to a string.
In Borland you have FloatToStrF, and it does it with no problem.
However, here in Visual C++ I have not found a routine or function that does this to my
requirements.
My question is: how do I take the float value and return it to the edit control box as a
string? I might just not be doing it correctly to begin with, thus the problem or
confusion, but here is a simple breakdown.
I rewrote this a little using distance, speed and time; I figured this would be better than
giving you my program with all the variables (it's for a robot's arm movement).
//on hitting calculate inputs distance and speed
// outputs the time it will take
void CExoSpinDlg::OnCalc()
{
// these two just to store the value of edit control boxes
CString someText;
CString someText2;
//edit control box 1 is distance
m_distance.GetWindowText(someText);
// edit control box 2 is speed; I am using a bunch of spin controls for my data
// as well, so I will put one in here, but it should not make a difference, right?
m_SPINVALUE.GetWindowText(someText2);
// now we have the two values stored as CStrings
// from string to float (I am aware of the possible loss of exactness here,
// but until I figure out VC++ a little more I am stuck)
float distance = atof(LPCSTR(someText));
float speed = atof(LPCSTR(someText2));
float time = distance/speed;
//MY PROBLEM IS HERE TAKING THIS FLOAT AND PUTTING BACK INTO
// THE DIALOG BOXES
//back to string is where i am struggling with
// i know the ftoa is NOT correct but I can not seem to find any other
// way of doing it
char buffer[256];
ftoa(time,buffer,10);
MessageBox(buffer);
}
any thoughts or suggestions on how i might do this would be
appreciated
Thank You
Ed Storey
|
|
|
|
 |
|
 |
Well, there's always good ol' sprintf(buffer,"%.2f", time);...
- Carlos
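A slightly more defensive variant of the same idea, as a sketch: `floatToString` is a hypothetical helper, and `std::snprintf` assumes a C++11 library (on older compilers such as Visual Studio 2008, `sprintf` or `_snprintf` would take its place).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format value with a fixed number of decimals.
// Returns an empty string if formatting fails or would not fit.
std::string floatToString(float value, int decimals) {
    assert(decimals >= 0);
    char buffer[64];
    int n = std::snprintf(buffer, sizeof buffer, "%.*f", decimals, value);
    if (n < 0 || n >= static_cast<int>(sizeof buffer))
        return std::string();
    return std::string(buffer);
}
```

In the dialog code from the question, the resulting text could then be handed to the edit control, for example via `SetWindowText` with `c_str()` in a non-Unicode build.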
|
|
|
|
 |
|
 |
Thanks Carlos! It's the little things that matter.
|
|
|
|
 |
|
 |
Suppose you have some code that calculates a bunch of rational functions: a rational function is a function constructed using +, -, * and /. Then you can replace the code with something that uses only one /. It's only useful in certain situations, depending on the relative speed of / and the other operators. For example, the following triangle rasteriser setup type code 'na = 1.0/a; nb = 1.0/b; nc = 1.0/c' can be replaced by 't = 1/(a*b*c); ct = c*t; na = b*ct; nb = a*ct; nc = a*b*t;'. 7 multiplies and one divide should be faster than 3 divides. Damn - I've given away my secret.
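Written out as a function, the trick looks roughly like this. It is a sketch; the struct and function names are invented for illustration. Reusing the product `a*b` brings the count down to six multiplies.

```cpp
#include <cassert>

// Hypothetical result type holding 1/a, 1/b and 1/c.
struct Reciprocals {
    float na, nb, nc;
};

// Compute three reciprocals with a single division.
Reciprocals reciprocals3(float a, float b, float c) {
    assert(a != 0.0f && b != 0.0f && c != 0.0f);
    float ab = a * b;
    float t = 1.0f / (ab * c);   // the only divide: 1/(a*b*c)
    float ct = c * t;            // 1/(a*b)
    Reciprocals r;
    r.na = b * ct;               // b/(a*b) = 1/a
    r.nb = a * ct;               // a/(a*b) = 1/b
    r.nc = ab * t;               // (a*b)/(a*b*c) = 1/c
    return r;
}
```

Two caveats apply: the product `a*b*c` can overflow or underflow where the individual reciprocals would not, and the results may differ from `1.0f/a` in the last bits because of the extra roundings.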
--
SIGFPE
|
|
|
|
 |
|
 |
The significant figures specified in the original SigFig routine get 'lost' when the value is converted to floating point. For example, if the result is 1.2 and we have specified 4 significant figures, we need to go to extra work to correctly display this as 1.200. The FloatToText routine converts it to 1.2; %f in the format will display something like 1.2000000. Yes, we can specify the precision modifier in the printf format code, but this requires that we compute the order of magnitude of the number so that we can determine how many decimal places we need to achieve the right number of significant figures. The following modified version of SigFig correctly produces the string version of the number. It is not pretty code, but it works!
CString SigFigStr(float X, int SigFigs)
{
CString str;
if(SigFigs < 1)
{
ASSERT(FALSE);
return str;
}
int Sign;
if(X < 0.0f)
Sign = -1;
else
Sign = 1;
X = fabsf(X);
float Powers = powf(10.0f, floorf(log10f(X)) + 1.0f);
float val = Sign * Round(X / Powers, SigFigs) * Powers;
str.Format("%f", val);
str.TrimLeft();
str.TrimRight();
int end = SigFigs;
if(Sign < 0)
end++;
if(str.Find('.') != -1)
end++;
str = str.Left(end);
// Remove decimal point if nothing after it. "1234." becomes "1234"
if(str.Right(1) == ".")
str = str.Left(str.GetLength() - 1);
return str;
}
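For comparison, here is a sketch of the other route mentioned in this post: compute the order of magnitude, derive the number of decimal places from it, and pass that to the printf precision field. `sigFigString` is a hypothetical name, it uses standard C++11 only (no CString), and it is not a drop-in replacement for SigFigStr.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper: format x with the requested number of significant figures.
std::string sigFigString(double x, int sigFigs) {
    assert(sigFigs >= 1);
    int decimals = sigFigs - 1;   // used as-is when x is zero
    if (x != 0.0) {
        // Order of magnitude: 0 for 1.2, 3 for 1234, -2 for 0.012
        int magnitude = static_cast<int>(std::floor(std::log10(std::fabs(x))));
        decimals = sigFigs - 1 - magnitude;
    }
    if (decimals < 0) {
        // More integer digits than significant figures: round to tens, hundreds, ...
        double scale = std::pow(10.0, -decimals);
        x = std::round(x / scale) * scale;
        decimals = 0;
    }
    char buffer[400];
    std::snprintf(buffer, sizeof buffer, "%.*f", decimals, x);
    return buffer;
}
```

With this sketch, `sigFigString(1.2, 4)` gives "1.200". A known gap: values that round up into the next power of ten (9.999 at two figures becomes "10.0") come out with one digit too many, so a second pass would be needed to handle those.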
|
|
|
|
 |
|
 |
// I may post all this code as an update to the main topic.
// This is 3.4 times faster than using sqrtf(...)
#define FP_BITS(fp) (*(DWORD *)&(fp))
#define FP_ABS_BITS(fp) (FP_BITS(fp)&0x7FFFFFFF)
#define FP_SIGN_BIT(fp) (FP_BITS(fp)&0x80000000)
#define FP_ONE_BITS 0x3F800000
static unsigned int fast_sqrt_table[0x10000]; // declare table of square roots
typedef union FastSqrtUnion
{
float f;
unsigned int i;
} FastSqrtUnion;
void build_sqrt_table()
{
unsigned int i;
FastSqrtUnion s;
for (i = 0; i <= 0x7FFF; i++)
{
// Build a float with the bit pattern i as mantissa
// and an exponent of 0, stored as 127
s.i = (i << 8) | (0x7F << 23);
s.f = sqrtf(s.f);
// Take the square root then strip the first 7 bits of
// the mantissa into the table
fast_sqrt_table[i + 0x8000] = (s.i & 0x7FFFFF);
// Repeat the process, this time with an exponent of 1,
// stored as 128
s.i = (i << 8) | (0x80 << 23);
s.f = sqrtf(s.f);
fast_sqrt_table[i] = (s.i & 0x7FFFFF);
}
}
inline float fastsqrt(float n)
{
if(FP_BITS(n) == 0)
return 0.0f; // check for square root of 0
FP_BITS(n) = fast_sqrt_table[(FP_BITS(n) >> 8) & 0xFFFF] | ((((FP_BITS(n) - FP_ONE_BITS) >> 1) + FP_ONE_BITS) & 0x7F800000);
return n;
}
int main(void)
{
build_sqrt_table();
float a = fastsqrt(1.234f);
return 0;
}
|
|
|
|
 |
|
 |
Another Square Root Algorithm:
/*******************************************************
** square_root - single precision square root
********************************************************
** input: value to take the square root of
** output: nothing
** calls: frexp(), ldexp()
** returns: 0.0 if input value <= 0.0,
** otherwise square root of input value
********************************************************
*/
float square_root(float xx)
{
float f, x, y;
int e;
f = xx;
if (f <= 0.0)
{
return 0.0;
}
/* split mantissa and exponent */
x = frexp(f, &e); /* f = x * 2**e, 0.5 <= x < 1.0 */
/* Q - is power of 2 odd ? */
if (e & 1)
{
/* yes - double mantissa and decrement the power of 2 (exponent) */
x = x + x;
e -= 1;
}
/* compute exponent power of 2 of the square root */
e >>= 1;
/* Q - is the mantissa between sqrt(2) and 2 ? */
if (x > 1.41421356237)
{
/* yes - offset mantissa, compute series */
x = x - 2.0;
y =
((((( -9.8843065718E-4 * x
+ 7.9479950957E-4) * x
- 3.5890535377E-3) * x
+ 1.1028809744E-2) * x
- 4.4195203560E-2) * x
+ 3.5355338194E-1) * x
+ 1.41421356237E0;
}
/* no - Q - is the mantissa between sqrt(2)/2 and sqrt(2) ? */
else if (x > 0.707106781187)
{
/* yes - offset mantissa, compute series */
x = x - 1.0;
y =
((((( 1.35199291026E-2 * x
- 2.26657767832E-2) * x
+ 2.78720776889E-2) * x
- 3.89582788321E-2) * x
+ 6.24811144548E-2) * x
- 1.25001503933E-1) * x * x
+ 0.5 * x
+ 1.0;
}
else
{
/* no - mantissa is between 0.5 and sqrt(2)/2 */
x = x - 0.5;
y =
((((( -3.9495006054E-1 * x
+ 5.1743034569E-1) * x
- 4.3214437330E-1) * x
+ 3.5310730460E-1) * x
- 3.5354581892E-1) * x
+ 7.0710676017E-1) * x
+ 7.07106781187E-1;
}
/* calculate y = y * 2**e */
y = ldexp(y, e);
return y;
}
|
|
|
|
 |
|
 |
Your square_root() function is accurate, but it is slower than sqrtf() itself (about 3.5 times slower).
|
|
|
|
 |
|
 |
Thanks for your response.
I'm using the square_root() function in an embedded x86 system written with the MSVC v1.52c compiler. The runtime didn't have a single precision sqrt() so this algorithm was faster for me than the runtime double precision version.
I like the table driven approach, and will investigate placing the table into ROM.
Steven J. Ackerman, Consultant
ACS, Sarasota, FL
http://www.acscontrol.com
sja@gte.net
|
|
|
|
 |
|
 |
How accurate is this? About how many bits off is it from the real answer?
I also wonder in real applications how much performance penalty one pays for having to load a 32k integer table (128k bytes?) into cache before this function can be called. Perhaps in very tight loops it's worth it.
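One way to put a number on the accuracy is to measure it. The sketch below restates the table lookup with `memcpy` instead of pointer casts (my substitution, so that the bit access is well defined) and scans a range of inputs for the worst relative error. All names are hypothetical. Since the lookup ignores the low 8 mantissa bits, a back-of-envelope estimate is a relative error of at most about 2^-16, but treat that as an estimate rather than a guarantee. As posted, the table has 0x10000 entries, which is 256 KB at 4 bytes per entry.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

static std::uint32_t bitsOf(float f) { std::uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float floatOf(std::uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// Same table layout as build_sqrt_table() in the post above.
static std::vector<std::uint32_t> buildSqrtTable() {
    std::vector<std::uint32_t> table(0x10000);
    for (std::uint32_t i = 0; i <= 0x7FFF; ++i) {
        // Exponent 0 (stored as 127, low exponent bit set): upper half
        table[i + 0x8000] = bitsOf(std::sqrt(floatOf((i << 8) | (0x7Fu << 23)))) & 0x7FFFFFu;
        // Exponent 1 (stored as 128, low exponent bit clear): lower half
        table[i] = bitsOf(std::sqrt(floatOf((i << 8) | (0x80u << 23)))) & 0x7FFFFFu;
    }
    return table;
}

// Same lookup as fastsqrt() in the post above, for non-negative input.
static float fastSqrt(const std::vector<std::uint32_t>& table, float n) {
    assert(n >= 0.0f);
    std::uint32_t b = bitsOf(n);
    if (b == 0)
        return 0.0f;
    std::uint32_t mantissa = table[(b >> 8) & 0xFFFF];
    std::uint32_t exponent = (((b - 0x3F800000u) >> 1) + 0x3F800000u) & 0x7F800000u;
    return floatOf(mantissa | exponent);
}

// Worst relative error against double-precision sqrt over [0.001, 1000).
static double maxRelativeError(const std::vector<std::uint32_t>& table) {
    double worst = 0.0;
    for (float x = 0.001f; x < 1000.0f; x *= 1.0007f) {
        double exact = std::sqrt(static_cast<double>(x));
        double err = std::fabs(fastSqrt(table, x) - exact) / exact;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Printing `maxRelativeError(buildSqrtTable())` gives the measured figure for the scanned range. The cache cost raised here would need a separate benchmark inside the real workload.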
Thanks for the bit-hacks. I'm always interested.
|
|
|
|
 |
|
 |
// This is about 2.12 times faster than using 1.0f / n
// r = 1/p
#define FP_INV(r,p) \
{ \
int _i = 2 * 0x3F800000 - *(int *)&(p); \
r = *(float *)&_i; \
r = r * (2.0f - (p) * r); \
}
|
|
|
|
 |
|
 |
Simon Hughes wrote:
int _i = 2 * 0x3F800000 - *(int *)&(p); \
uhhh ?? what kind of magic is this ?
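As far as I can tell, the magic is this: for a positive float the bit pattern behaves roughly like a scaled logarithm of the value, so subtracting it from twice the pattern of 1.0f (0x3F800000) approximately negates the exponent and gives a crude first guess at 1/p. The last line of the macro is then one Newton-Raphson step, r = r * (2 - p * r), which squares the relative error of that guess. The sketch below restates the macro as a function using `memcpy` (my substitution for the pointer casts); `fastInverse` is a hypothetical name, and it assumes p is positive and normal.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable restatement of FP_INV for positive, normal p.
float fastInverse(float p) {
    assert(p > 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    // Crude guess: subtracting the bits from 2 * bits(1.0f) roughly negates the exponent.
    std::uint32_t guessBits = 2u * 0x3F800000u - bits;
    float r;
    std::memcpy(&r, &guessBits, sizeof r);
    // One Newton-Raphson refinement step for 1/p.
    return r * (2.0f - p * r);
}
```

By my arithmetic, the first guess can be off by up to 12.5% (for p = 3 it is 0.375), so after one step the result is within about 1.6% of the true reciprocal (0.328125 for p = 3). That is acceptable for some graphics work but far from full float precision; repeating the refinement step would tighten it.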