Click here to Skip to main content
15,887,027 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Hi guys,

I am trying to figure out how to write negative numbers to a binary file, following this algorithm (sorry I don't know the name of the format, Please let me know if there is any name for this format):

C#
while ((value & 0xFFFFFF80) != 0)
           {
               stream.WriteByte((byte)((value & 0x7F) | 0x80));
               value >>= 7;
           }
           stream.WriteByte((byte)value);

which value is an integer. For ex if value is 258, it will write two bytes [130] and [2] in the file!

Now my question is how can I extend this to negative number!
And also is there any name for this format?

Thanks.
Posted
Updated 27-Mar-12 7:48am
v2
Comments
Sergey Alexandrovich Kryukov 27-Mar-12 13:14pm    
Why not just writing the binary value as is?
--SA
aidin Tajadod 27-Mar-12 13:33pm    
Thank SA,
It seems shorter! I can save alot for smaller value! Also there are some protocols which are using this format! (ex SNMP)
Sergey Alexandrovich Kryukov 27-Mar-12 23:01pm    
That I understand. Thank you for answering, anyway.
--SA
aidin Tajadod 29-Mar-12 1:03am    
Thank you SA for your time.

OK, I'll get your code - interesting algorithm, but something I don't understand:
(a & b) | c = (a | c) & (b | c)
(value & 0x7F) | 0x80 = (value | 0x80) & 0xFF = value | 0x80 (considering only the last 8 bits)
(Distributive property)
So for what is the "& 0xFF"? I thought the "(byte)" truncates the leading bits anyway, isn't it?

In principle each byte starts with an 1 except the last which is always 0.
So the integer can be reassembled, otherwise the number of bytes to read cannot be determined).
If the value is small, only a few bytes are needed (best case 1 Byte), if the value is big, a lot more are used.
In the worst case (as you can only store 7 bits per byte, because the 8th bit is used for orientation) you will need 32 / 7 = 4.57 -> 5 bytes.

So if you know that it is more probable to get small values than bigger ones, this algorithm can be good. But I wouldn't use it for something else than micro-eletronics (where each byte counts) as we live in a world with unlimited storage...

You have IMHO nothing to consider when writing negative numbers, as the bits are only inverted (two's-complement) - it doesn't matter whether you use uint or int for "value".
The only problem I see is, that you need a lot more storage for small negative numbers as for small positive numbers, as -1 (from -127 to 127) is equivalent to 11111111.
 
Share this answer
 
v3
Comments
aidin Tajadod 27-Mar-12 15:11pm    
Interesting! (value & 0x7F) | 0x80 = (value | 0x80) & 0xFF ! I'd never thought about it! and you are right. but to do so, I need to change this code which probably makes it incompatible with the positive integer values! So I can not use it for other things!
any way you have my 5.
Thank you.
Henning Dieterichs 27-Mar-12 15:47pm    
Hi!
>> makes it incompatible

It is totally equal, that is what I claim.
The "and" will remove all bits but the last 7. The "or" will set the first (i.e. the 8th from right) to '1'. The "(byte)" will remove all bits but the last 8.
So you can leave the "and", as the "(byte)" does already the job and the or sets the first bit anyway to '1'.
aidin Tajadod 27-Mar-12 16:59pm    
yes, the first bit shows to continue or not. So if i want to consider signing, then I need one more bit from the first byte. so I have to reconside the algorithem!
Henning Dieterichs 27-Mar-12 17:17pm    
No you haven't!
Signing is nothing more than a range where the number is defined to be negative.
The internal representation for -1 is 11111111 11111111 11111111 11111111. If you had used an ulong this would be twice as 01111111 111[...]111 (very big). If you had used an long this would be exactly as big but negative.
So it does not matter whether you have signs or not (at least for the algorithm itself).
And you don't have to reconsider your algorithm!
If your value is 258, then it won't fit in a byte anyway - a byte is eight bits, and can hold numbers between 127 and -128, or 0 - 255 if it is unsigned.

If your value is a 32 bit integer as your code implies, then it will always require 4 bytes in your output file - otherwise you cannot store the full range of numbers that the variable value can hold.

Try writing it to the file as a array of bytes - it makes things a lot easier!
C#
int value = -12345678;
byte[] bytes = BitConverter.GetBytes(value);
int rebuilt = BitConverter.ToInt32(bytes, 0);


bytes will hold four bytes:
Decimal: 178 158 067 255 
Hex:      B2  9E  43  FF
Which is the binary equivalent of -12345678 (Little endian notation, which is normal for .NET and PC's generally)
 
Share this answer
 
Comments
Henning Dieterichs 27-Mar-12 14:29pm    
That is something else.

>> then it will always require 4 bytes
That is not exactly true as you can do some tricks, if you know that value is more often smaller than bigger you can actually save some bytes...
Read my solution.
aidin Tajadod 27-Mar-12 14:42pm    
you are right, writing "as is" making it easier! but in my case, I need to write an integer that might be as small as 0 and as big as int.max! so this can save me lots of space when it is small! then I tought if I can find a way to write negative numbers too, then I can chage whole thing to use the above format instead of having two different format!
OriginalGriff 27-Mar-12 15:13pm    
The trouble is that it doesn't save much space in reality.
It saves space (sorta - I'll come to that later) if that is the only data you are writing to the file - if it isn't, then you need to add data to say how many data bits (or bytes) you are storing. Otherwise, if you write [130][2][130][2] is that two integers, both 258, or one integer much bigger? Writing data is only of any use if you can clearly and consistently read it back! :laugh:
As for space saving, well, that is a bit of a myth. When you write a file containing eight characters, how much disk space does it take? 8 bytes? 8 bytes plus the file name? 8 bytes plus the file name plus a bit for the various dates and attributes? Nope, sorry - not even close!

Probably a minimum of 4K (plus space in various tables for the file names, attributes and so forth) since that is the smallest block size the disk storage system can cope with. So a file holding 1024 bytes takes the same disk space as one holding 10 bytes.

Trust me on this: most of the time, if you are saving space with clever ideas, you aren't really saving anything, and you are taking extra processing time to save and load data.
aidin Tajadod 27-Mar-12 16:55pm    
In most of the case (the project I am working with), is saves the space specially because we are sending the generated bytes on the net. But I would agree with you if I just wanted to save them on the files.
Henning Dieterichs 27-Mar-12 15:59pm    
It is saving space ;)
If you know the distribution, you can save space. Nothing else does zip or rar.
But if the distribution is uniformly (i.e. no entropy), in the sum you wont save space (so exe-files cannot be compressed really).
If the distribution is log(n) or something like that, it is more possible that smaller values occur than bigger.
So it will be more frequent that 1 byte (best case) will be written instead of 5 (worst case).
The clue is that the first bit says whether the data will be continued. If the first bit is zero, the value can be reconstructed.
So it is even possible to extend you charset (if you map chars to numbers),as you can represent any large numbers (you could append as much bytes as you want).
I think UTF-8 does something similiar.

If you want examples:
187: 00000000 00000000 0000000 10111011 -> 10000001 00111011 (saving 2 bytes)
597: 00000000 00000000 0000010 01010101 -> 10000010 01010101 (saving 2 bytes)
6: 00000000 00000000 0000000 00000110 -> 00000110 (saving 3 bytes)
- 1: 11111111 11111111 1111111 11111111 -> 11111111 11111111 11111111 11111111 01111111 (one byte lost)
And because of the leading continue-flag you can clearly read it back!
(1 at the beginning of each encoded byte means read it further, 0 means stop here)
The problem you have is that byte is unsigned. To get the negative of a number you do the one's complement of a number, which would be xFFFFFF ^ positiveInteger for a short integer (note one's complement is not the only option, but the common one). The way a negative number is known is that the first bit of the number is a 1. Obviously you are sending this on a serial port. Probably all leasing bits have to be 1's for a negative number: Need to make sure that for negative values all leading bits are 1's when converting back. Need some idea of the algoithm that converts back.


C#
long value1 = convert.ToInt64(value);
for (int i; i < 5; i++) //5 iterations (5 * 7 = 35
{
  stream.WriteByte((byte)((value & 0x7F) | 0x80));
  value >>= 7;
}
stream.WriteByte((byte)value);
 
Share this answer
 
Comments
aidin Tajadod 27-Mar-12 15:14pm    
good suggestion, thank you.
altough I cound not understand why you cast the value to long and how your code works!
Clifford Nelson 27-Mar-12 17:08pm    
The reason I cast to long is that I need to write the leading 1's whenever I write. If you really wanted to limit the bytes written you would stop writing once you have written either -1 "1111111" or 0 "0000000". You are writing 7 bits instead of 8, so when you shifted, and you had a negative number, at the end you were writting a 7 bit pattern that started with 0 bits, making it appear positive.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900