Hey all,

I have a problem with my graduation project. It is an Arabic-language compression/decompression tool.

I have implemented a compression algorithm I found here, which outputs a binary array for the compressed text. I have been wondering: can I apply binary LZW compression on top of it to shrink my output further?

(Arabic is not covered by ASCII.)

And how should I implement binary LZW?

The algorithm I used before is at this link:

[^]

I got a good output, but I would like it to be smaller. So, can I use binary LZW?

And do you have a good resource for implementing it in Java?

Thanks
Tomas Takac 28-Nov-14 12:48pm
What exactly is your problem? Arabic text is just data. Sure you can apply LZW compression on it. If you are looking for a Java implementation there is one here on CodeProject: LZW Compression Algorithm Implemented in Java[^]
Member 11199376 28-Nov-14 13:23pm
My problem is: is compressing 0's and 1's more efficient than compressing the Arabic text itself? And sorry, I saw that implementation but I couldn't get it to work!
Tomas Takac 28-Nov-14 15:34pm
I fail to see the difference. Isn't text also just a bunch of 0's and 1's? The algorithm you link to in your question is very naive; LZW should give you better results, if that's what you are asking.
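To illustrate the point that Arabic text is "just data": once a string is encoded (for example as UTF-8), the compressor only ever sees bytes, so it makes no difference which language the text is in. A minimal sketch, using the hypothetical word "سلام" purely as sample input:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    public static void main(String[] args) {
        // The Arabic word "سلام": four characters, all outside ASCII.
        String text = "سلام";

        // Encode to UTF-8 bytes; this is what a compressor actually consumes.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

        // Every letter in the Arabic Unicode block (U+0600–U+06FF)
        // takes 2 bytes in UTF-8, so 4 characters become 8 bytes.
        System.out.println(text.length() + " chars -> " + bytes.length + " bytes");
    }
}
```

Any byte-oriented compressor (LZW included) can then be run over `bytes` directly, with no special handling for Arabic.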

## Solution 1

This is not a correctly posed question; more exactly, there is no problem here.

Compression has nothing to do with languages. However, the compression ratio does depend on the content of the data. You can only compare blocks of data of the same size in bytes; when you compress a block, the size of the compressed data will differ. Isn't that completely natural?

Why do you think compression is possible at all? Because the compression algorithm finds redundancy in the data and tries to optimize the representation of that redundant data. Imagine that all your data consists of binary ones: the compressed data could then simply say "80 billion 1 bits". Now imagine the data is a random sequence of bits: even with a good algorithm and large data sets, the compression ratio will, on average, be slightly below 1, because only a minor amount of redundancy can appear by chance. Isn't that logical?
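To make the redundancy point concrete, here is a minimal sketch of classic LZW compression in Java (the compression side only; class and method names are my own choice, not from any particular library). The dictionary starts with all 256 single-byte values and grows as repeated substrings are seen, so highly redundant input collapses into far fewer codes than input characters:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LzwDemo {
    // Compress a string into a list of LZW dictionary codes.
    // Assumes input characters fit in 0..255; for Arabic text,
    // encode to UTF-8 bytes first and feed those byte values in.
    public static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(String.valueOf((char) i), i);   // initial single-char codes
        }
        int nextCode = 256;

        List<Integer> output = new ArrayList<>();
        String current = "";
        for (char c : input.toCharArray()) {
            String candidate = current + c;
            if (dict.containsKey(candidate)) {
                current = candidate;                 // keep extending the match
            } else {
                output.add(dict.get(current));       // emit code for longest match
                dict.put(candidate, nextCode++);     // learn the new sequence
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) {
            output.add(dict.get(current));           // flush the final match
        }
        return output;
    }

    public static void main(String[] args) {
        // Redundant input: 16 characters compress to only 7 codes,
        // because repeated "AB..." runs get dictionary entries.
        List<Integer> codes = compress("ABABABABABABABAB");
        System.out.println(codes.size() + " codes for 16 characters");
    }
}
```

On redundant input like the one above the output shrinks quickly; on random data the dictionary rarely finds repeats, which is exactly why random data does not compress.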

Perhaps you could understand this better if you read about data compression: http://en.wikipedia.org/wiki/Data_compression[^].

—SA