Click here to Skip to main content
15,889,861 members
Please Sign up or sign in to vote.
1.00/5 (3 votes)
See more:
I need to compress this output :

AE9B56FC5845AC8298FFC57E307145A4

and then I need to Decompress to return back.

What I have tried:

I need to compress this output :

AE9B56FC5845AC8298FFC57E307145A4

and the length should be less than 24 bit >>
Posted
Updated 24-Mar-18 23:58pm
Comments
enhzflep 26-Mar-18 0:28am    
zlib will 'compress' those 16 bytes to 19 bytes.
Simple arithmetic coding creates 18 bytes of output
Huffman with clever representation of the tree returns 11 bytes of output
*** Huffman without an included tree returns 3 bytes ***
winRar gives you back 91 bytes.


TLDR; Seems like you're trying to push 💩 up a hill with your nose, expecting it to be shiny and smelling of roses when you reach the top!



While you can hard-code the huffman tree and get back just 3 bytes - the tree will be different for different input data, so that idea is basically a non-starter.
zlib is able to use a pre-determined tree, but the output wont be as small, since it builds a full tree and then uses codes longer than optimum (for this use-case) on account of the fact it has 256 leaf nodes, rather than just 16.

Have a look for puff.c by Mark Adler if you want a relatively easy hufman source to look at. (it's in the zLib sources) Just don't expect miracles!

Generally speaking, you can't do that.
That look the hexadecimal representation of 16 bytes. If the bytes can arbitrary change then there is no way to reduce them to 3 ones.
On the other hand, if you can impose constraints on them, then you could possibly succeed (e.g. if this is a 'message' and you know, for instance, you have only 1000 of different 'messages' then you might compress them to 2 bytes, by enumeration).
Please note you say nothing about possible constraints.
 
Share this answer
 
Comments
Maciej Los 23-Mar-18 15:19pm    
5ed!
You can't, or most likely you can't - you cannot guarantee the results of any compression will be "less than n bits", and since you start with what looks like 32 hex digits, you need 128 bits to store it uncompressed, which means you are looking for around a 1:5 compression - and there isn't an algorithm on the planet that will guarantee that for "random data".

I thing you need to think about what that data is, why you are compressing it, and see if there is anything significant you can use to help - because you aren't going to find anything that will give you what you want for any input value.
 
Share this answer
 
Comments
Maciej Los 23-Mar-18 15:19pm    
5ed!
The string seems to be hex data. So you can convert it to a byte array of size 16.

But that can't be further compressed in your case because compression algorithms benefit mainly from repeated sequences and reduced ranges which are not present in your string when converted to a byte array. With your relative short input, the result might be even larger than the input for most common compressing methods.

Even with more compression friendly input you will very probably not get a final length of 24 bits (3 bytes) from a 16 byte binary input or a string with length 32.
 
Share this answer
 
Comments
Maciej Los 23-Mar-18 15:19pm    
5ed!
This is completely unrealistic.
Your sample data is hexadecimal, it mean that each char is 4 bits.
Thus "AE9B56FC5845AC8298FFC57E307145A4" is hexadecimal of 32*4=128 bits.
Without any knowledge of what is this data, there is no way to compress it to 24 bits, it is not even sure it can be compressed at all.
 
Share this answer
 
Compressing is an art, mostly you need some reliable decompression to. If your text is an hexadecimal representation you need 4 bits for such a character. So the first char gets the 4 first bits of a byte. Best you work with bit shift operators.
C++
// A => 11
// E => 15
buffer[0] = (11 << 4) + 15;
char high = buffer[0] >> 4;// high value
char low = buffer[0] & 15;// low value (flag low 4 bits) 
 
Share this answer
 
You can use a 'map' compression, which is essentially:

char value;
if (input == "AE9B56FC5845AC8298FFC57E307145A4") value = 1;

...

if (value == 1) output "AE9B56FC5845AC8298FFC57E307145A4"


But this is a bit unrealistic in the real world, although I've beaten your '24 bit' requirement by a long shot.
 
Share this answer
 
I'll give you a hint.
In your string "
AE9B56FC5845AC8298FFC57E307145A4
"
you have 3 occurences of letter A
2 of E
2 of 9
5 of 5
3 of F
3 of C
3 of 4

The compression is the removal of redundant characters and creating a lookup table for repeating occurences so the string can be reassembled to its original form
 
Share this answer
 
Comments
Dave Kreskowiak 25-Mar-18 10:35am    
Yeah, but it's still impossible to fulfill the limitation of "the length should be less than 24 bit".

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900