writing negative integer type to binary files!

Question

0.00/5 (No votes)

See more:

Hi guys,

I am trying to figure out how to write negative numbers to a binary file, following this algorithm (sorry I don't know the name of the format, Please let me know if there is any name for this format):

C#

while ((value & 0xFFFFFF80) != 0)
           {
               stream.WriteByte((byte)((value & 0x7F) | 0x80));
               value >>= 7;
           }
           stream.WriteByte((byte)value);

which value is an integer. For ex if value is 258, it will write two bytes [130] and [2] in the file!

Now my question is how can I extend this to negative number!
And also is there any name for this format?

Thanks.

Posted 27-Mar-12 7:09am

aidin Tajadod

Updated 27-Mar-12 7:48am

wizardzz

v2

Add a Solution

Comments

Sergey Alexandrovich Kryukov 27-Mar-12 13:14pm

Why not just writing the binary value as is?
--SA

aidin Tajadod 27-Mar-12 13:33pm

Thank SA,
It seems shorter! I can save alot for smaller value! Also there are some protocols which are using this format! (ex SNMP)

Sergey Alexandrovich Kryukov 27-Mar-12 23:01pm

That I understand. Thank you for answering, anyway.
--SA

aidin Tajadod 29-Mar-12 1:03am

Thank you SA for your time.

3 solutions

Solution 3

The problem you have is that byte is unsigned. To get the negative of a number you do the one's complement of a number, which would be xFFFFFF ^ positiveInteger for a short integer (note one's complement is not the only option, but the common one). The way a negative number is known is that the first bit of the number is a 1. Obviously you are sending this on a serial port. Probably all leasing bits have to be 1's for a negative number: Need to make sure that for negative values all leading bits are 1's when converting back. Need some idea of the algoithm that converts back.

C#

long value1 = convert.ToInt64(value);
for (int i; i < 5; i++) //5 iterations (5 * 7 = 35
{
  stream.WriteByte((byte)((value & 0x7F) | 0x80));
  value >>= 7;
}
stream.WriteByte((byte)value);

Posted 27-Mar-12 8:29am

Clifford Nelson

Comments

aidin Tajadod 27-Mar-12 15:14pm

good suggestion, thank you.
altough I cound not understand why you cast the value to long and how your code works!

Clifford Nelson 27-Mar-12 17:08pm

The reason I cast to long is that I need to write the leading 1's whenever I write. If you really wanted to limit the bytes written you would stop writing once you have written either -1 "1111111" or 0 "0000000". You are writing 7 bits instead of 8, so when you shifted, and you had a negative number, at the end you were writting a 7 bit pattern that started with 0 bits, making it appear positive.

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Henning Dieterichs · Accepted Answer · 2012-03-27T08:17:00

OK, I'll get your code - interesting algorithm, but something I don't understand:
(a & b) | c = (a | c) & (b | c)
(value & 0x7F) | 0x80 = (value | 0x80) & 0xFF = value | 0x80 (considering only the last 8 bits)
(Distributive property)
So for what is the "& 0xFF"? I thought the "(byte)" truncates the leading bits anyway, isn't it?

In principle each byte starts with an 1 except the last which is always 0.
So the integer can be reassembled, otherwise the number of bytes to read cannot be determined).
If the value is small, only a few bytes are needed (best case 1 Byte), if the value is big, a lot more are used.
In the worst case (as you can only store 7 bits per byte, because the 8th bit is used for orientation) you will need 32 / 7 = 4.57 -> 5 bytes.

So if you know that it is more probable to get small values than bigger ones, this algorithm can be good. But I wouldn't use it for something else than micro-eletronics (where each byte counts) as we live in a world with unlimited storage...

You have IMHO nothing to consider when writing negative numbers, as the bits are only inverted (two's-complement) - it doesn't matter whether you use uint or int for "value".
The only problem I see is, that you need a lot more storage for small negative numbers as for small positive numbers, as -1 (from -127 to 127) is equivalent to 11111111.

OriginalGriff · Accepted Answer · 2012-03-27T08:24:00

Solution 2

If your value is 258, then it won't fit in a byte anyway - a byte is eight bits, and can hold numbers between 127 and -128, or 0 - 255 if it is unsigned.

If your value is a 32 bit integer as your code implies, then it will always require 4 bytes in your output file - otherwise you cannot store the full range of numbers that the variable value can hold.

Try writing it to the file as a array of bytes - it makes things a lot easier!

C#

int value = -12345678;
byte[] bytes = BitConverter.GetBytes(value);
int rebuilt = BitConverter.ToInt32(bytes, 0);

bytes will hold four bytes:

Decimal: 178 158 067 255 
Hex:      B2  9E  43  FF

Which is the binary equivalent of -12345678 (Little endian notation, which is normal for .NET and PC's generally)

Posted 27-Mar-12 8:24am

OriginalGriff

Comments

Henning Dieterichs 27-Mar-12 14:29pm

That is something else.

>> then it will always require 4 bytes
That is not exactly true as you can do some tricks, if you know that value is more often smaller than bigger you can actually save some bytes...
Read my solution.

aidin Tajadod 27-Mar-12 14:42pm

you are right, writing "as is" making it easier! but in my case, I need to write an integer that might be as small as 0 and as big as int.max! so this can save me lots of space when it is small! then I tought if I can find a way to write negative numbers too, then I can chage whole thing to use the above format instead of having two different format!

OriginalGriff 27-Mar-12 15:13pm

The trouble is that it doesn't save much space in reality.
It saves space (sorta - I'll come to that later) if that is the only data you are writing to the file - if it isn't, then you need to add data to say how many data bits (or bytes) you are storing. Otherwise, if you write [130][2][130][2] is that two integers, both 258, or one integer much bigger? Writing data is only of any use if you can clearly and consistently read it back! :laugh:
As for space saving, well, that is a bit of a myth. When you write a file containing eight characters, how much disk space does it take? 8 bytes? 8 bytes plus the file name? 8 bytes plus the file name plus a bit for the various dates and attributes? Nope, sorry - not even close!

Probably a minimum of 4K (plus space in various tables for the file names, attributes and so forth) since that is the smallest block size the disk storage system can cope with. So a file holding 1024 bytes takes the same disk space as one holding 10 bytes.

Trust me on this: most of the time, if you are saving space with clever ideas, you aren't really saving anything, and you are taking extra processing time to save and load data.

aidin Tajadod 27-Mar-12 16:55pm

In most of the case (the project I am working with), is saves the space specially because we are sending the generated bytes on the net. But I would agree with you if I just wanted to save them on the files.

Henning Dieterichs 27-Mar-12 15:59pm

It is saving space ;)
If you know the distribution, you can save space. Nothing else does zip or rar.
But if the distribution is uniformly (i.e. no entropy), in the sum you wont save space (so exe-files cannot be compressed really).
If the distribution is log(n) or something like that, it is more possible that smaller values occur than bigger.
So it will be more frequent that 1 byte (best case) will be written instead of 5 (worst case).
The clue is that the first bit says whether the data will be continued. If the first bit is zero, the value can be reconstructed.
So it is even possible to extend you charset (if you map chars to numbers),as you can represent any large numbers (you could append as much bytes as you want).
I think UTF-8 does something similiar.

If you want examples:
187: 00000000 00000000 0000000 10111011 -> 10000001 00111011 (saving 2 bytes)
597: 00000000 00000000 0000010 01010101 -> 10000010 01010101 (saving 2 bytes)
6: 00000000 00000000 0000000 00000110 -> 00000110 (saving 3 bytes)
- 1: 11111111 11111111 1111111 11111111 -> 11111111 11111111 11111111 11111111 01111111 (one byte lost)
And because of the leading continue-flag you can clearly read it back!
(1 at the beginning of each encoded byte means read it further, 0 means stop here)

aidin Tajadod 27-Mar-12 16:53pm

good point. Thanks.

writing negative integer type to binary files!

3 solutions

Solution 1

Solution 2

Solution 3

Add your solution here

Preview 0

Existing Members

...or Join us