Hi guys,
How do I get the 16-bit binary representation of an unsigned float
with a 14-bit mantissa and a 2-bit exponent?

I am aware of IEEE 754 half precision, but that uses a 10-bit mantissa, a 5-bit exponent, and a 1-bit sign.

[EDIT]
After the many comments in all the solutions' threads, I have come to the conclusion that this is not a solution to the question.
I will nonetheless leave it standing, since it shows the approach of building your own class that converts back and forth for calculations and only *stores* the value in that particular format.
[/EDIT]

You might consider making a value type with all the bells and whistles, such as NaN and Inf handling, but doing all calculations in doubles. E.g.:
C#
// encoding: no sign, exp = 2 bits, significand = 14 bits
// exp = 0:    0.significand * 2^0
// exp = 1..3: 1.significand * 2^(exp-1) --> bias = 1
// special values: 0x0000u = zero
//                 0x0001u = NaN
//                 0x0002u = Inf
// there are no negative exponents --> max number = 0xFFFFu --> (2-2^-14)*2^2 = 8-2^-12 = 7.99975586
//                                 --> min number = 0x0003u --> (3*2^-14)*2^0 = 3*2^-14 = 0.00018311
struct F142
{
    private UInt16 _raw;

    private const UInt16 _zero = (UInt16)0x0000u;
    private const UInt16 _naN = _zero + 1;
    private const UInt16 _inf = _zero + 2;
    private const UInt16 _min = _zero + 3;
    private const UInt16 _max = (UInt16)0xFFFFu;

    private const UInt16 _significandMask = (UInt16)0x3FFFu;
    private const UInt16 _unit = _significandMask + 1;

    private const double _dMin = 3.0 / _unit;
    private const double _dMax = 8.0 - 4.0 / _unit; // (2 - 2^-14) * 2^2 = 8 - 2^-12

    private UInt16 Exp { get { return (UInt16)((_raw >> 14) & 0x3); } }

    private void SetFromDouble(double d)
    {
        if (double.IsNaN(d)) _raw = _naN;
        else if (double.IsPositiveInfinity(d)) _raw = _inf;
        else if (double.IsNegativeInfinity(d)) _raw = _naN;
        else if (d < 0.0) _raw = _naN;
        else if (d > _dMax) _raw = _inf;
        else if (d < _dMin) _raw = _zero;
        else
        {
            _raw = _unit;
            while (d >= 2.0 && _raw < 3*_unit)
            {
                _raw += _unit;
                d /= 2.0;
            }
            if (d < 1.0) _raw = _zero;
            else d -= 1.0;
            _raw |= (UInt16)(d*_unit);
        }
    }
    private F142(UInt16 raw) { _raw = raw; }

    public F142(double d) { _raw = 0; SetFromDouble(d); }
    public static readonly F142 Min = new F142(_min);
    public static readonly F142 Max = new F142(_max);
    public bool IsNaN { get { return _raw == _naN; } }
    public bool IsInf { get { return _raw == _inf; } }
    public bool IsZero { get { return _raw == _zero; } }
    public static F142 FromDouble(double d) { return new F142(d); }
    public double ToDouble()
    {
        if (IsNaN) return double.NaN;
        if (IsInf) return double.PositiveInfinity;
        if (IsZero) return 0.0;
        double d = (_raw & _significandMask);
        d /= _unit;
        if (Exp > 0) d += 1.0;
        for (UInt16 i = 1; i < Exp; ++i)
        {
            d *= 2.0;
        }
        return d;
    }
}

Use it like this:
C#
F142 res = new F142(3.0/2.0);
F142 nan1 = new F142(0.0 / 0.0);
F142 nan2 = new F142(-5.0);
F142 inf1 = new F142(17.0);
F142 inf2 = new F142(17.0/0.0);
Cheers
Andi
 
Comments
Haileab Gebrezgiabher 27-Feb-15 4:20am    
Dear Andi, thanks.

You have lost me here. I see the filters for NaN, Inf, and zero, and I will definitely take your advice there, but apart from that I could not follow you. Maybe I will try to run it and see. Should I expect a binary value out or a decimal value?
Andreas Gieriet 27-Feb-15 5:22am    
I interpreted your question as follows:
- you work in C#
- you have a requirement for a 16-bit floating point value
- 2-bit exponent, 14-bit significand, no sign bit
I assumed you have an encoding similar to IEEE 754, but with
- a different number of bits per part (see above: 0, 2, 14)
- a different bias (1)
My suggestion is to only store the numbers in that special format and do the calculations in the double type. Otherwise, you would have to create a whole bunch of operators and operations (i.e. operator overloads or functions for multiply, divide, add, subtract, etc., plus a math library for trigonometric and other functions).
So, the simple solution:
- define a value type that stores these 16 bits
- define two conversions (from/to double) for this value type
- for convenience, add a constructor of this type taking a double value
- handle all the Inf/NaN, etc. special situations
- handle all out-of-range situations --> NaN, Inf
I did not know your interpretation of the exponent, the significand, and the special values. So I assumed something and documented it in the type definition.

No, I have no IEC62055-41 spec at hand. But maybe you can provide the relevant text snippet and tell more about the use case (e.g. only transport and print, or also calculate) and what the expected error concept is (out-of-range, not-a-number, etc.).
Cheers
Andi
Haileab Gebrezgiabher 27-Feb-15 6:18am    
Please don't get me wrong, I am just looking for clarity. I am grateful that you are spending your time to help out.

As to the document, I don't mind sharing the relevant portion for discussion purposes (copyright issues), but I don't see an attachment option or other tool I can use on this forum.

In any case, this is what is required.

Say I want a single binary string representation of 25.6 in 16 bits, with a 2-bit exponent and a 14-bit mantissa.
This is supposed to give me 0000 0001 0000 0000.
Another example: 1638.3 == 0011 1111 1111 1111


Andreas Gieriet 27-Feb-15 8:19am    
You need to read the specs carefully. This is a protocol definition that splits the whole "floating point" number into an s,e4,e3,e2 part plus an e1,e0,m13...m0 part. Combined, you have a floating point number of 20 bits: a sign bit, 5 exponent bits, and 14 mantissa bits. The meaning of that number is also defined in that spec.
I.e. whatever you calculate in some program, you have to convert it into that bit pattern to transfer it.
Cheers
Andi
[EDIT]
Modified such that it scales to units.
[/EDIT]

The following should work for an IEC62055-41 unsigned 2-bit-exponent, 14-bit-mantissa fixed point value:
C#
public enum As
{
    Raw,      // raw number     - unit = 1.0
    Energy,   // kWh            - unit = 0.1
    Power,    // W              - unit = 1.0
    Water,    // m3             - unit = 0.1
    Gas,      // m3             - unit = 1.0
    Time,     // min            - unit = 1.0
    Currency, // local currency - unit = 0.00001
}

public struct FixU_2_14
{
    private UInt16 _raw;

    public UInt16 Bits { get { return _raw; } }

    public double this[As unit]
    {
        get
        {
            double mantissa = _raw & 0x3FFF;
            double n = 1.0;
            double m = 0.0;
            for (int i = (_raw >> 14) & 0x3; i > 0; --i)
            {
                n *= 10.0;
                m *= 10;
                m++;
            }
            double value = n * mantissa + m * 16384;
            switch (unit)
            {
                case As.Energy:
                case As.Water: return value / 10.0;
                case As.Currency: return value / 10000.0;
                default: return value;
            }
        }
    }
    public FixU_2_14(double value, As unit)
    {
        if (double.IsInfinity(value) || double.IsNaN(value))
        {
            throw new ArgumentOutOfRangeException("value", value.ToString());
        }
        switch (unit)
        {
            case As.Energy:
            case As.Water: value *= 10; break;
            case As.Currency: value *= 10000; break;
            default: break;
        }
        if (value < 0.0 || value > 1111.0 * 16384.0 - 1000.0)
        {
            throw new ArgumentOutOfRangeException("value", value.ToString());
        }
        _raw = 0;
        double n = 1.0;
        double m = 0.0;
        double max = 0.0;
        UInt16 exp = 0;
        while (true)
        {
            max = n * 16383 + m * 16384;
            if (max >= value) break;
            n *= 10;
            m *= 10;
            m++;
            exp++;
        }
        value -= m * 16384;
        value /= n;
        _raw = (UInt16)((exp << 14) | (int)value & ((1 << 14) - 1));
    }
}
Some usage examples:
C#
static void WriteFix(double d, As unit)
{
    string us = string.Format("{0}", "["+unit.ToString()+"]");
    FixU_2_14 fp = new FixU_2_14(d, unit);
    double dd = fp[unit];
    Console.WriteLine("{0,9} {4,-10} -> 0x{1:x4} = {2,9} {4,-10} (Delta = {3,3})", d, fp.Bits, dd, d - dd, us);
}
...
WriteFix(0.0, As.Raw);
WriteFix(1.0, As.Water);
WriteFix(2.0, As.Currency);
WriteFix(25.6, As.Energy);
WriteFix(1638.3, As.Energy);
WriteFix(16384, As.Water);
WriteFix(18201624, As.Gas);
for (int i = 1; i <= 18; i++) WriteFix(1000000 * i, As.Gas);
This results in
       0 [Raw]      -> 0x0000 =         0 [Raw]      (Delta =   0)
       1 [Water]    -> 0x000a =         1 [Water]    (Delta =   0)
       2 [Currency] -> 0x4169 =    1.9994 [Currency] (Delta = 0.000599999999999934)
    25.6 [Energy]   -> 0x0100 =      25.6 [Energy]   (Delta =   0)
  1638.3 [Energy]   -> 0x3fff =    1638.3 [Energy]   (Delta =   0)
   16384 [Water]    -> 0x7999 =   16383.4 [Water]    (Delta = 0.600000000000364)
18201624 [Gas]      -> 0xffff =  18201624 [Gas]      (Delta =   0)
 1000000 [Gas]      -> 0xa005 =    999924 [Gas]      (Delta =  76)
 2000000 [Gas]      -> 0xc0b5 =   1999624 [Gas]      (Delta = 376)
 3000000 [Gas]      -> 0xc49d =   2999624 [Gas]      (Delta = 376)
 4000000 [Gas]      -> 0xc885 =   3999624 [Gas]      (Delta = 376)
 5000000 [Gas]      -> 0xcc6d =   4999624 [Gas]      (Delta = 376)
 6000000 [Gas]      -> 0xd055 =   5999624 [Gas]      (Delta = 376)
 7000000 [Gas]      -> 0xd43d =   6999624 [Gas]      (Delta = 376)
 8000000 [Gas]      -> 0xd825 =   7999624 [Gas]      (Delta = 376)
 9000000 [Gas]      -> 0xdc0d =   8999624 [Gas]      (Delta = 376)
10000000 [Gas]      -> 0xdff5 =   9999624 [Gas]      (Delta = 376)
11000000 [Gas]      -> 0xe3dd =  10999624 [Gas]      (Delta = 376)
12000000 [Gas]      -> 0xe7c5 =  11999624 [Gas]      (Delta = 376)
13000000 [Gas]      -> 0xebad =  12999624 [Gas]      (Delta = 376)
14000000 [Gas]      -> 0xef95 =  13999624 [Gas]      (Delta = 376)
15000000 [Gas]      -> 0xf37d =  14999624 [Gas]      (Delta = 376)
16000000 [Gas]      -> 0xf765 =  15999624 [Gas]      (Delta = 376)
17000000 [Gas]      -> 0xfb4d =  16999624 [Gas]      (Delta = 376)
18000000 [Gas]      -> 0xff35 =  17999624 [Gas]      (Delta = 376)
Cheers
Andi
 
Comments
Haileab Gebrezgiabher 28-Feb-15 10:31am    
Did you just read my mind or what? I was busy with the 0x{1:x4} part, and when I got back to the page you had done the same thing! I think we have a solution.

I just want to integrate the fixed values you used.
Haileab Gebrezgiabher 28-Feb-15 10:41am    
This part was what was beating me up:

for (int i = (_raw >> 14) & 0x3; i > 0; --i)
{
n *= 10.0;
m *= 10;
m++;
}
return n * mantissa + m * 16384;

.....
Andreas Gieriet 28-Feb-15 10:55am    
I.e. it's solved for you, right? :-)
Cheers
Andi
Haileab Gebrezgiabher 28-Feb-15 11:12am    
almost;

0.1 should give 0x0001 === 0000000000000001
1 should give 0x0010 ==== 0100000000000001
25.6 should give 0x0100 0000000100000000

Nevertheless, if it is taking up your time, I think I can take it from here. Practically, it is solved: my main problem was constructing the masks correctly with the max and min values, and I can see how you did it.
Andreas Gieriet 28-Feb-15 12:01pm    
Amended the solution to respect units.
Your number's encoding depends on the unit: 25.6 is only allowed for units that represent a fraction of the base unit, e.g. 0.1 kW, 0.1 m3, or 0.00001 currency.
For the others, the fraction is cut off, e.g. 1 m3, 1 minute.
BTW: 25.6 kW becomes 0x0100, not 0x4010.
Cheers
Andi
Two bit exponent?
Are you sure?
That's not a lot of range: 10^0 to 10^3 only.

AFAIK there is no "standard" implementation of a float using that resolution: you will have to implement it yourself.
 
Comments
Haileab Gebrezgiabher 26-Feb-15 7:28am    
Thanks, man; that was a very fast response.

I know what you mean, but the task is to represent the exponent with 2 bits, with bit 15 being the most significant bit of the exponent and bit 13 that of the mantissa ...

I figured that much about the standards; I was wondering if someone could point me in the right direction.
Andreas Gieriet 27-Feb-15 4:20am    
Why 10^0...10^3? Shouldn't it be 2^0...2^3 (or even 0...2^0...<2^4)?
Andi
Haileab Gebrezgiabher 27-Feb-15 5:00am    
This, I think, is the exponent ... The exponent as a number is expressed in base ten. Basically, OriginalGriff is saying that the range of the exponent is from 0 to 3 inclusive.
Andreas Gieriet 27-Feb-15 5:28am    
Yes, understood: two bits only allow for four values 0..3.
In floating point binary encoding, as far as I know, the exponent is always to base two if the significand is also to base two. Other encodings may be applied, but then the significand and the exponent must be to the same base; otherwise you would have holes or overlaps in your number ranges.
So: either everything is to base 2 or to some other base (e.g. 10).
Maybe you could provide an unambiguous and complete spec of what this floating point encoding is supposed to be.
Cheers
Andi
Andreas Gieriet 27-Feb-15 17:34pm    
The format as given in the mentioned IEC62055-41 document is capable of storing values from 0 to 18201624 with a precision of 4 decimal digits (with some odd gaps between the exponents). See my comment in solution 2.
For currencies, that "floating point" representation is extended to one sign bit, five exponent bits (0 ... 10^31), and still 14 mantissa bits (4 digits of precision).
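To see where the 0 to 18201624 range comes from, the following sketch lists the smallest value, the largest value, and the step size for each exponent. It assumes the decoding rule quoted elsewhere in this thread (value = 10^e * m, plus an offset of 2^14 * 10^(n-1) for each n from 1 to e). `ExponentRanges` and `Describe` are names made up for this illustration and do not come from the spec.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ExponentRanges
{
    // For exponent e (0..3), reports the smallest value, largest value,
    // and step size that the 14-bit mantissa can express, in base units.
    public static void Describe(int e, out long min, out long max, out long step)
    {
        if (e < 0 || e > 3) throw new ArgumentOutOfRangeException("e");
        step = 1;   // becomes 10^e
        min = 0;    // becomes the sum of 2^14 * 10^(n-1) for n = 1..e
        for (int n = 1; n <= e; n++)
        {
            min += 16384 * step;
            step *= 10;
        }
        max = min + 16383 * step;   // mantissa at its maximum, 2^14 - 1
    }
}
```

Calling `Describe` for e = 0 to 3 gives 0..16383 (step 1), 16384..180214 (step 10), 180224..1818524 (step 100), and 1818624..18201624 (step 1000).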
Cheers
Andi
You are right, IEEE 754 is 1 sign bit, 5 bits for exponent, other bits for mantissa:
http://en.wikipedia.org/wiki/Half-precision_floating-point_format
http://en.wikipedia.org/wiki/Floating_point

A 14-bit mantissa with a 2-bit exponent could be your or someone else's fantasy and can hardly be an efficient representation for most general numeric chores, but yes, I can imagine its use in some special applications. But what's the problem? Handling such a type comes at a cost (first of all, you will have to check that all input values are in the valid range for this type, in all operations; bitwise operations are faster).

First, you create two bit masks and keep them: 3 << 14 is the mask for the exponent (values 0-3, shifted to the most significant end); its complement is the mantissa mask. To create your number, check the ranges of the mantissa and the exponent, and then OR them together: the result is (exponent << 14) | mantissa. To extract the mantissa and the exponent, AND the number with the mantissa and exponent masks. Then, for the exponent, also shift the result >> 14 to get the exponent value. Then use normal floating point arithmetic to get the resulting value, if you need it.
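A minimal sketch of this mask-and-shift approach in C# might look as follows. It assumes the exponent sits in bits 15..14 and the mantissa in bits 13..0. `PackedU16` and its members are hypothetical names used only for this example. Note that packing shifts the exponent left by 14, and extracting it shifts right by 14.

```csharp
using System;

// Hypothetical helper, not part of any library: packs a 2-bit exponent
// (bits 15..14) and a 14-bit mantissa (bits 13..0) into one UInt16.
public static class PackedU16
{
    public const int MantissaBits = 14;
    public const ushort ExponentMask = 3 << MantissaBits;        // 0xC000
    public const ushort MantissaMask = (1 << MantissaBits) - 1;  // 0x3FFF

    public static ushort Pack(int exponent, int mantissa)
    {
        // Range checks first, then OR the two fields together.
        if (exponent < 0 || exponent > 3)
            throw new ArgumentOutOfRangeException("exponent");
        if (mantissa < 0 || mantissa > MantissaMask)
            throw new ArgumentOutOfRangeException("mantissa");
        return (ushort)((exponent << MantissaBits) | mantissa);
    }

    public static int Exponent(ushort raw)
    {
        return (raw & ExponentMask) >> MantissaBits;
    }

    public static int Mantissa(ushort raw)
    {
        return raw & MantissaMask;
    }

    // 16-character binary string, most significant bit first.
    public static string ToBinary(ushort raw)
    {
        return Convert.ToString(raw, 2).PadLeft(16, '0');
    }
}
```

For example, `PackedU16.ToBinary(PackedU16.Pack(0, 256))` yields `0000000100000000`, the bit pattern quoted in this thread for 25.6 in units of 0.1.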

—SA
 
Comments
Haileab Gebrezgiabher 26-Feb-15 12:37pm    
Dear SA, thank you for your response. You are very right; my reaction when I saw the spec was not far from what you mentioned. However, the spec was very specific in that regard, and you are right again that it is a very specific application.

I saw your explanation and it makes perfect sense; I see some light at the end of the tunnel, thanks. Please allow me to play with the idea you gave and get back to you all.
Sergey Alexandrovich Kryukov 26-Feb-15 12:42pm    
As I already said, I can imagine that this spec is good for something. For example, one of the HDR (high dynamic range) image formats has a custom floating-point pixel format.

Will you accept my answer formally now? If you need some more detail, please feel free to ask; I'll answer in all cases.

—SA
Haileab Gebrezgiabher 26-Feb-15 13:33pm    
Sorry, I didn't mark it as such; I am actually trying it as we speak. Any detail will always help, and I can see you are proficient in the matter. I will mark it as answered now.

And thank you, Sergey!
Sergey Alexandrovich Kryukov 26-Feb-15 15:21pm    
Nothing to be sorry about; everything is going right. Thank you for accepting it, but if you have further issues, feel free to ask your questions, possibly as follow-up questions in this space. We did not consider the interpretation of the mantissa and exponent (signed or unsigned, which also determines whether you can have fractional numbers, and what the sign representation is if you need it; it all should be mentioned somehow in your specs and can be treated differently). I'll gladly help and foresee no big problems.
—SA
Haileab Gebrezgiabher 27-Feb-15 4:13am    
Hi, thanks.
The sign is always positive; any negative value is filtered out by the interface before it reaches the equation.


We are measuring the transferred energy value. Here is the mathematical equation,

where e is the exponent,
m is the mantissa,
t is the transferred energy:



t = 10^e * m                                      for e = 0

t = 10^e * m + sum_{n=1}^{e} (2^14 * 10^(n-1))    for e > 0
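As a rough check of the equation above, here is a small C# sketch that evaluates it directly for a raw 16-bit value. It assumes the exponent sits in the top two bits and the mantissa in the lower 14, as discussed in this thread. `TransferAmount` and `Decode` are made-up names, and the result is in base units (for energy, steps of 0.1 kWh according to the solution above).

```csharp
using System;

// Illustrative sketch of the formula; TransferAmount is a made-up name.
public static class TransferAmount
{
    public static long Decode(ushort raw)
    {
        int e = (raw >> 14) & 0x3;   // 2-bit exponent
        long m = raw & 0x3FFF;       // 14-bit mantissa

        long scale = 1;              // becomes 10^e
        long offset = 0;             // becomes sum over n = 1..e of 2^14 * 10^(n-1)
        for (int n = 1; n <= e; n++)
        {
            offset += 16384 * scale; // adds 2^14 * 10^(n-1)
            scale *= 10;
        }
        return scale * m + offset;
    }
}
```

With this, `Decode(0x0100)` gives 256 (25.6 in units of 0.1), `Decode(0x3FFF)` gives 16383 (1638.3), and `Decode(0xFFFF)` gives 18201624, which agrees with the sample output listed in the solution above.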

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


