
Transform between IEEE, IBM or VAX floating point number formats and bytes expressions

By John Jiyang Hou, 21 Nov 2005
 

Introduction

This program converts a floating point number to its bytes expression and converts a bytes expression back to a floating point number.

Background

Have you ever tried to write a program that reads a DLIS (Digital Log Interchange Standard) data file? I found that the sample log data was recorded in VAX single float format, so I had to read a 4-byte stream from the binary file and then recover the real number. I succeeded in recovering all the frame data for all the channels. I compared my results with the output of Schlumberger's free tool, Toolbox, and they were identical. I also ran some tests with a free Java package, Cynosurex, and it gave the same results. This reminded me of some unfinished work I did five years ago on floating point numbers and byte order analysis, and it inspired me to take it up again.

Bits expression of floating point number

IEEE single precision floating point:

SEF :    S        EEEEEEEE        FFFFFFF        FFFFFFFF        FFFFFFFF
bits :   1        2      9        10                                    32
bytes :  byte1           byte2                   byte3           byte4

IEEE double precision floating point:

SEF:   S     EEEEEEE EEEE  FFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
bits:  1     2          12 13                                                       64
bytes: byte1         byte2      byte3    byte4    byte5    byte6    byte7    byte8
frctn.:                    L1                     L2

IBM single precision floating point:

SEF :       S        EEEEEEE        FFFFFFFF        FFFFFFFF        FFFFFFFF
bits :      1        2     8        9                                      32
bytes :     byte1                   byte2           byte3           byte4

IBM double precision floating point:

SEF:   S     EEEEEEE  FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
bits:  1     2     8  9                                                            64
bytes: byte1          byte2    byte3    byte4    byte5    byte6    byte7    byte8
frctn.:               L1                         L2

VAX single precision floating point:

SEF :       S        EEEEEEEE        FFFFFFF        FFFFFFFF        FFFFFFFF
bits :      1        2      9        10                                    32
bytes :     byte2           byte1                   byte4           byte3

General encoding formula of the floating point

V = (-1)^S * M * A^(E - B)
M = C + F

V is the value, S is the sign, M is the mantissa, A is the base, E is the exponent, B is the exponent bias, C is the mantissa constant, and F is the fraction. A, B and C are constants that differ between floating point architectures. Here are some of them:

IEEE single float :     A = 2   B = 127   C = 1
IEEE double float :     A = 2   B = 1023  C = 1
IBM  single float :     A = 16  B = 64    C = 0
IBM  double float :     A = 16  B = 64    C = 0
VAX  single float :     A = 2   B = 128   C = 0.5

Maximum value of the fraction

As mentioned above, F is the fraction. The minimum value of the IEEE and VAX fraction is 0, and the minimum value of the IBM fraction is 1/16. F is zero when all fraction bits (the F bits in the bits expressions above) are 0. The maximum value is reached when all fraction bits are 1. To work it out we need a little high school mathematics. Do you remember this formula?

1/2 + 1/4 + 1/8 + ... + 1/2^n = 1 - 1/2^n

The one easily overlooked detail concerns the VAX single precision floating point. Apart from its weird byte order, its fraction bits start at 1/4, not at 1/2 as in IEEE or IBM. This is another example of complexity that comes from idiosyncrasy.

G is the maximum value of the fraction F

IEEE single float :     G = 1 - 1/2^23
IEEE double float :     G = 1 - 1/2^52
IBM  single float :     G = 1 - 1/2^24
IBM  double float :     G = 1 - 1/2^56
VAX  single float :     G = 1 - 1/2^24 - 1/2
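These values can be checked numerically by adding up the weight of every fraction bit. The sketch below uses a hypothetical helper, max_fraction, whose first argument is the power of the first fraction bit (1 for IEEE and IBM, 2 for VAX). Note that the IBM double case (56 bits) cannot be checked exactly this way, because a C double carries only 53 significant bits.

```c
#include <math.h>

/* Sum the weights of nbits fraction bits; the first bit is worth 2^-first. */
double max_fraction(int first, int nbits)
{
    double g = 0.0;
    for (int k = first; k < first + nbits; ++k)
        g += ldexp(1.0, -k);            /* ldexp(1.0, -k) is 2^-k */
    return g;
}
```

With this helper, max_fraction(1, 23) equals 1 - 1/2^23 for IEEE single, and max_fraction(2, 23) equals 1/2 - 1/2^24 for VAX single, which is the same number as 1 - 1/2^24 - 1/2.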

Mantissa range

It is easy to figure out the mantissa range based on the above values of C and G. The IBM float mantissa minimum value will be explained below.

IEEE single float :     1 <= M <= 2 - 1/2^23
IEEE double float :     1 <= M <= 2 - 1/2^52
IBM  single float :     1/16 <= M <= 1 - 1/2^24
IBM  double float :     1/16 <= M <= 1 - 1/2^56
VAX  single float :     1/2 <= M <= 1 - 1/2^24

Bytes order

I use a simple union and the two-byte unsigned short integer 258 to detect the byte order in which your memory stores a number. On a Little Endian architecture, such as Intel, the function returns 2; on a Big Endian architecture, such as SPARC, it returns 1.
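The idea can be sketched as follows. The function name memory_byte_order is illustrative, and the sketch assumes that unsigned short is two bytes wide, as it is on common platforms.

```c
/* Returns 2 on a little-endian host and 1 on a big-endian host. */
int memory_byte_order(void)
{
    union {
        unsigned char  cc[2];
        unsigned short si;
    } probe;

    probe.si = 258;        /* 258 = 0x0102: high byte is 1, low byte is 2 */
    return probe.cc[0];    /* the byte stored at the lowest address */
}
```

The number 258 is chosen because its two bytes are 1 and 2, so the byte found at the lowest address identifies the byte order directly.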

Transform bytes to floating point

There are two steps for the transformation. The first step is to transform bytes to SEF, which means Sign, Exponent, and Fraction. The second step is to transform SEF to a floating point number.

1. Bytes to SEF:

First, reorder the incoming bytes to match the bits expressions above; the SEF values can then be extracted with bit operations based on those expressions. For double precision floating point, I split the fraction into two unsigned long integers, L1 and L2.

2. SEF to floating point:

It is easy to recover the floating point number from SEF based on the above general encoding formula and the three constants A, B and C.
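Both steps can be sketched for the single precision IEEE and VAX formats, which share the same bit fields and differ only in byte order and constants. The struct Sef32 and all function names below are hypothetical, the input bytes are assumed to be in file order, and only normalized numbers are handled (zero, denormals, infinities and reserved operands are ignored).

```c
#include <math.h>

typedef struct {
    unsigned int  S;  /* sign bit */
    unsigned int  E;  /* stored (biased) exponent, 8 bits */
    unsigned long F;  /* 23 fraction bits as an integer */
} Sef32;

/* Step 1: b[0..3] are byte1..byte4 of the IEEE single bits expression. */
Sef32 ieee_single_bytes_to_sef(const unsigned char b[4])
{
    Sef32 r;
    r.S = (b[0] & 0x80u) >> 7;
    r.E = ((b[0] & 0x7Fu) << 1) | ((b[1] & 0x80u) >> 7);
    r.F = ((unsigned long)(b[1] & 0x7Fu) << 16)
        | ((unsigned long)b[2] << 8)
        | (unsigned long)b[3];
    return r;
}

/* VAX single: same fields, but the bytes arrive as byte2, byte1, byte4, byte3. */
Sef32 vax_single_bytes_to_sef(const unsigned char b[4])
{
    unsigned char swapped[4];
    swapped[0] = b[1];
    swapped[1] = b[0];
    swapped[2] = b[3];
    swapped[3] = b[2];
    return ieee_single_bytes_to_sef(swapped);
}

/* Step 2, IEEE single: A = 2, B = 127, C = 1, fraction starts at 1/2. */
double ieee_single_sef_to_value(Sef32 s)
{
    double M = 1.0 + (double)s.F / 8388608.0;      /* F / 2^23 */
    return (s.S ? -1.0 : 1.0) * ldexp(M, (int)s.E - 127);
}

/* Step 2, VAX single: A = 2, B = 128, C = 0.5, fraction starts at 1/4. */
double vax_single_sef_to_value(Sef32 s)
{
    double M = 0.5 + (double)s.F / 16777216.0;     /* F / 2^24 */
    return (s.S ? -1.0 : 1.0) * ldexp(M, (int)s.E - 128);
}
```

As a check, the IEEE bytes 3F 80 00 00 and the VAX bytes 80 40 00 00 both decode to 1.0.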

Transform floating point to bytes

As with the method above, the transformation takes two steps. The first step is to transform the floating point number to SEF. The second step is to transform SEF to bytes.

1. Floating point to SEF:

This is the most important part of the whole program. I developed two methods to calculate E and F from the floating point number.

The first method is the more natural one. It works on the same principle as converting an integer to its binary expression, where each bit is obtained by repeatedly dividing by the base 2. In our case, we repeatedly divide or multiply the number by the base A until the result falls within the mantissa range given above. Whether to divide or multiply depends on whether E-B is positive or negative, and the sign of E-B cannot be known before E is known. In practice, we can decide by comparing the floating point number with the bounds of the mantissa range. The number of loop iterations determines E, and the value that remains after the loop determines F.
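A minimal sketch of this loop is shown below. It is not the library's Float2SEF code; the function name and parameter list are assumptions made for illustration. m_min is the lower bound of the mantissa range (1 for IEEE, 1/16 for IBM, 1/2 for VAX), and the upper bound is taken to be m_min * A. The input must be positive, and exponent overflow or underflow is not checked.

```c
/* Find E and F for V > 0 by repeated division or multiplication by the base.
   Returns 0 on success and -1 if V is not positive. */
int float_to_ef_by_loop(double V, double A, int B, double C, double m_min,
                        int *E, double *F)
{
    double M = V;   /* ends up inside [m_min, m_min * A) */
    int n = 0;      /* unbiased exponent, E - B */

    if (!(V > 0.0))
        return -1;

    while (M >= m_min * A) {   /* value too large: divide by the base */
        M /= A;
        ++n;
    }
    while (M < m_min) {        /* value too small: multiply by the base */
        M *= A;
        --n;
    }

    *E = n + B;     /* the loop count gives the exponent */
    *F = M - C;     /* what remains of the value gives the fraction */
    return 0;
}
```

For V = 123.45 with the VAX constants (A = 2, B = 128, C = 0.5, m_min = 0.5), the loop divides seven times and yields E = 135 and F of about 0.4645.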

The second method is a little more complex. It inverts the encoding formula, using the logarithm to obtain E; F then follows easily from the encoding formula. This gives the following formulae for E and F:

V is the floating point number
D = log 2 (log is the natural logarithm, base e)

IEEE single float :     E = (int) ( logV / D + B )
IEEE double float :     E = (int) ( logV / D + B )
IBM  single float :     E = (int) ( ( logV / D ) / 4 + 1 + B )
IBM  double float :     E = (int) ( ( logV / D ) / 4 + 1 + B )
VAX  single float :     E = (int) ( ( logV / D ) + 1 + B )

F = V / A^(E-B) - C

I will give a brief proof of these formulae. The zero value and the sign S can safely be ignored, so the float value is assumed to be positive: V > 0.

  1. IEEE float:

    The mantissa range of the IEEE float is: 1 <= M < 2 . So: 0 <= logM / log2 < 1. Notice: E is a non-negative integer, so:

          (int) ( logV / log2 + B ) 
        = (int) ( logM / log2 + ( E-B ) + B ) 
        = (int) ( logM / log2 + E ) 
        = E
  2. IBM float:

    The key point here is stated in the RP66 reference document: Bits 1 - 4 of byte 2 may not all be zero except for true zero. In other words, the first hexadecimal digit of the mantissa must be non-zero, except for true zero. This means the minimum value of the IBM float mantissa is 1/16, so 1/16 <= M < 1 and therefore 0 <= logM / log16 + 1 < 1. Notice: E is a non-negative integer.

          (int) ( ( logV / log2 ) / 4 + 1 + B ) 
        = (int) ( logM / log16 + ( E-B ) + 1 + B ) 
        = (int) ( logM / log16 + 1 + E ) 
        = E
  3. VAX float:

    The minimum value of the VAX float mantissa is 1/2, so 1/2 <= M < 1 and therefore 0 <= logM / log2 + 1 < 1. Notice: E is a non-negative integer.

          (int) ( logV / log2 + 1 + B ) 
        = (int) ( logM / log2 + ( E-B ) + 1 + B ) 
        = (int) ( logM / log2 + 1 + E ) 
        = E

2. SEF to bytes

This part is not difficult. It only needs byte order adjustment and some bit operations.
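For the VAX single format, this step might look like the sketch below. The function names are hypothetical. The fraction is scaled by 2^24 because the VAX fraction bits start at 1/4, and it is truncated rather than rounded, which is a simplification of this sketch.

```c
/* Convert a real-valued VAX fraction F (0 <= F < 0.5) to its 23 fraction bits. */
unsigned long vax_fraction_to_bits(double F)
{
    return (unsigned long)(F * 16777216.0);    /* F * 2^24, truncated */
}

/* Pack S, E and the integer fraction into the SEF bits expression, then
   store the bytes in VAX order: byte2, byte1, byte4, byte3. */
void vax_single_sef_to_bytes(unsigned int S, unsigned int E, unsigned long F,
                             unsigned char out[4])
{
    unsigned char b1 = (unsigned char)(((S & 1u) << 7) | ((E >> 1) & 0x7Fu));
    unsigned char b2 = (unsigned char)(((E & 1u) << 7) | ((F >> 16) & 0x7Fu));
    unsigned char b3 = (unsigned char)((F >> 8) & 0xFFu);
    unsigned char b4 = (unsigned char)(F & 0xFFu);

    out[0] = b2;
    out[1] = b1;
    out[2] = b4;
    out[3] = b3;
}
```

For V = 123.45 the earlier formulae give E = 135 and F = 0.464453125, so the fraction bits are 7792230 (hexadecimal 76E666) and the stored bytes are F6 43 66 E6.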

The program also includes a conventional union method to convert between an IEEE float and its bytes expression.
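A union-based conversion could look like the following sketch. It assumes that the host float is a 4-byte IEEE 754 single, which holds on common platforms but is not guaranteed by the C standard, and the function names are illustrative rather than the library's own.

```c
/* 258 = 0x0102, so a little-endian host stores the byte 2 first. */
static int host_is_little_endian(void)
{
    const unsigned short probe = 258;
    return *(const unsigned char *)&probe == 2;
}

/* Write the bytes of v to out[], sign and exponent byte first. */
void ieee_float_to_bytes(float v, unsigned char out[4])
{
    union { float f; unsigned char c[4]; } u;
    int i;

    u.f = v;
    for (i = 0; i < 4; ++i)
        out[i] = host_is_little_endian() ? u.c[3 - i] : u.c[i];
}

/* Rebuild a float from bytes given sign and exponent byte first. */
float bytes_to_ieee_float(const unsigned char in[4])
{
    union { float f; unsigned char c[4]; } u;
    int i;

    for (i = 0; i < 4; ++i)
        u.c[host_is_little_endian() ? 3 - i : i] = in[i];
    return u.f;
}
```

No arithmetic is needed here because the host already stores the value in IEEE format; only the byte order has to be normalized.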

Compile

This program was compiled with MinGW on Windows XP. Before compiling, you have to set the PATH environment variable or run setp.bat.

set PATH=%PATH%;C:\MinGW\bin

Then run clib.bat to create the library, or manually compile the program as follows:

del libNumber.lib
del *.o
g++ -c ByteOrder.c
g++ -c Float2SEF.c
g++ -c SEF2Byte.c
g++ -c Byte2SEF.c
g++ -c SEF2Float.c
g++ -c IeeeFloat.c
g++ -c IbmFloat.c
g++ -c VaxFloat.c
g++ -c TestlibNumber.c
ar m libNumber.lib
ar r libNumber.lib *.o
ar t libNumber.lib
del *.o

Run cpsam.bat to compile the two test programs as follows:

cpsam test1
cpsam test2

After that, you can run test1.exe in a DOS window. You can also redirect the output to a text file as follows:

test1 > test.txt

test2.exe is another test program, which tests this library with any float number.

Reference

I have wrapped all reference web pages into my source code Zip package.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

John Jiyang Hou
Canada


Comments and Discussions

Question: Return Values and Float2SEF Method (CuppM, 14 Jan '13 - 5:54)
Hi,
First thanks for this! It's very much appreciated to have this assembled.
 
Question 1
Looking at your code, you have functions that return non-zero values in certain cases, but you don't do anything with them in the calling functions. I'm trying to incorporate your conversion routines into a C# SAS transport format writer (need to convert IEEE doubles to IBM format) and want to make sure the writer is as robust as possible.
 
Here are a couple places where this occurs in the current version of the code (as of 2013-01-14 - it appears the latest version is your original from 2005-11-22):
  1. SEF2Float.c
    1. line 121 - should this be an exception? No output value is set.
    2. line 122 - should this return a NaN value?
  2. Float2SEF.c
    1. line 195 - should this be an exception? No output value is set.
    2. line 261 - should this be an exception? No output value is set.
 
Question 2
For the Float2SEF part, which of the 2 methods is better to use? Is one more robust or faster? Or are they equally good?
 
Question 3
Your proof of the IBM float formula mentions issues with the mantissa requiring a minimum value? Is this checked for and handled in code? Or is this something my calling code should be aware of?
 
Thanks again!
Matt
General: Calculating VAX fraction in binary (Jennifer Poole, 10 Sep '10 - 1:48)
Hi, Firstly thanks for your article! Took a lot of careful reading but was very helpful!
 
However, I am having some issues.
 
I need to translate IEEE binary of a float into VAX binary....
 
I tried to use the float and calculate the binary using your formulae however, I get stuck when working out the fraction part.... Using the calculation, I end up with a decimal value so then this becomes 0 when I convert it to binary (obviously) which is of course wrong.
 
Can you provide more detail (with an example) of obtaining the binary in VAX of a float?
 
Below is what I worked out....
 
V = 123.45
E = (int) ((log(123.45)/log2) + 1 + 128) = 135
F = (123.45/2^7) -0.5 = 0.464453125
 
So the binary would end up being (lsb on the right)
0000000000000000 0 10000111 0000000
 
But when I then convert this back, I obviously don't get the right answer, I get 64.... I am soooo confused!!??
Question: Mr Hou: How does this function work? (jumbin, 1 Dec '09 - 16:36)
#define LittleEndian 2
#define BigEndian 1
......
if( MemoryByteOrder() == BigEndian )
{
    b1=bytes[0];
    b2=bytes[1];
    b3=bytes[2];
    b4=bytes[3];
}
else
{
    b1=bytes[3];
    b2=bytes[2];
    b3=bytes[1];
    b4=bytes[0];
}
......
int MemoryByteOrder()
{
    union {
        unsigned char cc[2];
        unsigned short int si;
    } aa;

    aa.si=258;

    return aa.cc[0];
}
 
MemoryByteOrder only defines a union, but it can return the value aa.cc[0]. Why? Does the union act as a function?
General: C# float VaxSingleFromBytes(byte[] bytes) (Karl Tarbet, 10 Oct '08 - 3:40)
Thanks for the great article and excellent code. I combined a couple of your routines and converted to c# for reading some binary files created under OpenVMS (Alpha).
 
float VaxSingleFromBytes(byte[] bytes)
{
    uint S;
    uint E;
    ulong F;
    uint b1 = bytes[1];
    uint b2 = bytes[0];
    uint b3 = bytes[3];
    uint b4 = bytes[2];

    S = (b1 & 0x80) >> 7;
    E = ((b1 & 0x7f) << 1) + ((b2 & 0x80) >> 7);
    F = ((b2 & 0x7f) << 16) + (b3 << 8) + b4;

    float rval = 0;
    double M, F1, A, B, C, e24;
    A = 2.0;
    B = 128.0;
    C = 0.5;
    e24 = 16777216.0; // 2^24

    M = (double)F / e24;

    if (S == 0) F1 = 1.0;
    else F1 = -1.0;

    if (0 < E) rval = (float)(F1 * (C + M) * Math.Pow(A, E - B));
    else if (E == 0 && S == 0)
        rval = 0;
    else if (E == 0 && S == 1)
        throw new ArgumentOutOfRangeException();
    //return -1; // reserved

    return rval;
}
General: Re: C# float VaxSingleFromBytes(byte[] bytes) (John Jiyang Hou, 10 Oct '08 - 16:52)
Nice conversion.
I didn't finish VAX double float conversion. Hope someone can try it.
General: GNU license (rreeeddbb, 1 Aug '08 - 1:08)
Hi,
I would like to use this program in an application I am writing.
 
Currently I have been referencing the .DLL of your code via <DllImport> (the project is in .NET). I haven't made any changes to your code, but I'm having trouble understanding the GNU license. If I included your code this way, would I be forced to provide the source for all the rest of my application?
 
Any clarification would be very helpful,
Thanks!
General: Re: GNU license (rreeeddbb, 1 Aug '08 - 8:57)
Never mind, saw this link.
 
http://www.codeproject.com/info/Licenses.aspx[^]
 
Oh well ;<
General: Re: GNU license (John Jiyang Hou, 1 Aug '08 - 15:24)
The code I wrote does not include any other licensed code.
You can use my code for free.
You are encouraged to include a link to this original article when you use it.
Question: Transforming IEEE floating point numbers, to IBM bytes (dcamacho80, 21 Jul '08 - 14:55)
Hello,
 
I'm trying to use your code (in Java) to convert IEEE floating point numbers to bytes in IBM format. I'm using another code to test if the transformation is applied correctly. This other code comes from a tested and trusted source (for security reasons, I can't provide information/code on this).
 
Your code works fine with positive numbers, but I can't get to match the tests when using negative numbers. Here are some examples:
 
----------------TEST 1----------------------------------
 
floating point value: -1.0 (single float)
 
With your method to transform singleFloat2SEF using IBM type, I get the following SEF:
Sign: 1
Exponent: 65
Fraction: 1048576
 
Using your method to transform bytes2SEF, I get the following bytes (Big-Endian):
b1 = 66
b2 = 16
b3 = 0
b4 = 0
 
Using the code from "Trusted-Company" to transform bytes to floating point numbers, I get:
floating point value: 16.0 (which should be -1.0)
 
----------------TEST 2----------------------------------
 
floating point value: -811.04 (single float)
 
With your method to transform singleFloat2SEF using IBM type, I get the following SEF:
Sign: 1
Exponent: 67
Fraction: 3322019
 
Using your method to transform bytes2SEF, I get the following bytes (Big-Endian):
b1 = 68
b2 = 50
b3 = -80
b4 = -93
 
for this example I have bytes to compare against:
expected b1 = -61 (incorrect: 68)
expected b2 = 50 (correct)
expected b3 = -80 (correct)
expected b4 = -97 (incorrect: -93)
 
Using the code from "Trusted-Company" to transform bytes to floating point numbers, I get:
floating point value: 12976.637 (which should be -811.04)
 
--------------------------------------------------
 

There seems to be a factor in all the tests I've performed: if I divide the input value by 16 before passing it as argument, I get the correct input number when testing, but with opposite sign.
 
Can you help me out with this?
 
Thanks in advance and Best Regards,
 
Daniel
 
PS: Congratulations on this code. It is the only source I've been able to find which provides valuable information regarding these floating point standards transformations.
Answer: Re: Transforming IEEE floating point numbers, to IBM bytes (John Jiyang Hou, 21 Jul '08 - 17:34)
Hi,
I wrote a small test program as follows:
 
//////////// IEEE to IBM /////////////////
#include <stdio.h>
#include <stdlib.h>
#include "libNumber.h"

int main(int argc, char **argv)
{
    if(argc!=2){
        printf("Usage:%s IEEE754-Single-Float\n",argv[0]);
        return -1;
    }

    unsigned char bytes[4];
    float IeeeSingleFloat=atof(argv[1]);
    IeeeSingleFloat2ByteL(IeeeSingleFloat, bytes);

    float IbmSingleFloat;
    Byte2IbmSingleFloat(bytes, &IbmSingleFloat);
    printf("IBM Single Float: %f\n",IbmSingleFloat);
    return 0;
}
////////////////////////////////////////
 
The test result(Windows/PC):
 
---TEST 1---
 
IEEE single float value: -1.0
 
Sign: 1
Exponent: 63
Fraction: 8388608
 
bytes (Big-Endian):
b1 = 191
b2 = 128
b3 = 0
b4 = 0
 
IBM single float value: -0.031250
 
---TEST 2---
 
IEEE single float value: -811.04
 
Sign: 1
Exponent: 68
Fraction: 4899471
 
bytes (Big-Endian):
b1 = 196
b2 = 74
b3 = 194
b4 = 143
 
IBM single float value: -19138.558594
 
=================================


Last Updated 22 Nov 2005
Article Copyright 2005 by John Jiyang Hou