Go to top

# A Simple String Compression Algorithm

, 10 Jul 2011
 Rate this:
An efficient algorithm for encode 7-bit characters to 6 or 5 bits to reduce the size of a string

## Introduction

Data compression is always useful for encoding information using fewer bits than the original representation it would use. There are many applications where the size of information would be critical. In data communication, the size of data can affect the cost too.

This algorithm was originally implemented for use in an SMS application. Using this algorithm, it could send about 256 characters per message (typically 160 characters per message) through the same 7-bit GSM network. The idea is, this program reduces the standard 7-bit encoding to some application specific 5-bit encoding system and then pack into a byte array. This method will reduce the size of a `string `considerably when the `string `is lengthy and the compression ratio is not affected by the content of the `string`.

## The Algorithm

### Encoding

Let’s assume that we have a `string `with 8 characters (example: - “`abcdefgh`”). If we put this on a byte array, we get a byte array with the size of 8. The following diagram shows how these ASCII characters can store in an array.

`|97|98|99|100|101|102|103|104|`

A single character will need 8 bits if the characters are represented with ASCII. A set of 8 bits can represent 256 different characters. But if we consider the current application, a simple SMS might be included only around 26 different characters. Therefore it is enough to have 5-bit encoding which can give up to 32 different characters to represent. For convert to 5-bit, let’s assign new values to the above characters.

`| a = 1 | b=2 | c=3 | d=4 | e=5 | f=6 | g=7 | h=8 |`

If we look more closely at the new byte array, it will look like the following (the values of characters are in binary representation).

`00000001|00000010|00000011|00000100|00000101|00000110|00000111|00001000|`

But we still use 8 bytes for storing the 8 characters. In the next step, we will chop each byte from the position of 3rd bit from the left side and extract the 5 least significant bits. The result will be as follows:

`|00001|00010|00011|00100|00101|00110|00111|01000|`

We can rearrange these bits in an array of bytes as follows:

`|00001000|10000110|01000010|10011000|11101000|`

Now we have reduced 8 bytes to 5 bytes. The next section shows how these 5 bytes convert to the 8 bytes and get the original information.

### Decoding

When an array of bytes is given, each byte should be represented in to binary. Then all the 1’s and 0’s should be arranged as their index and then can be split to the sets of five bits. After splitting, it will be as follows:

`|00001  000|10  00011  0|0100  0010|1  00110  00|111  01000|`

These sets can be converted to decimals and these values represent the characters that we have encoded.

```|00001  = 1(a)

000|10  = 2 (b)

00011   = 3(c)

0|0100 = 4 (d)

0010|1 = 5 (e)

00110  = 6 (f)

00|111 = 7 (g)

01000| = 8 (h)  ```

Then the information can be decoded as “`abcdefgh`”.

## Using the Code

### Classes

`class SixBitEnDec`: The class that is responsible for encode and decode

### Variables

`final static public int SIX_BIT = 6;` A constant for flag the operation as 6-bit conversion

`final static public int FIVE_BIT = 5;` A constant for flag the operation as 5-bit conversion

### Functions

```byte[] encode(String txt, int bit){
int length = txt.length();
float tmpRet1=0,tmpRet2=0;
if(bit==6){
tmpRet1=3.0f;
tmpRet2=4.0f;
}else if(bit==5){
tmpRet1=5.0f;
tmpRet2=8.0f;
}
byte encoded[]=new byte[(int)(tmpRet1*Math.ceil(length/tmpRet2))];
char str[]=new char[length];
txt.getChars(0,length,str,0);
int chaVal = 0;
String temp;
String strBinary = new String("");
for (int i = 0;i<length; i++){
temp = Integer.toBinaryString(toValue(str[i]));
while(temp.length()%bit != 0){
temp="0"+temp;
}
strBinary=strBinary+temp;
}
while(strBinary.length()%8 != 0){
strBinary=strBinary+"0";
}
Integer tempInt =new Integer(0);
for(int i=0 ; i<strBinary.length();i=i+8){
tempInt = tempInt.valueOf(strBinary.substring(i,i+8),2);
encoded[i/8]=tempInt.byteValue();
}
return encoded;
}```

This function is responsible for the whole encoding operation. The parameter `txt `can be any text that contains characters from English alphabet and the `bit `is the number of bits that is to be encoded. The value of the bit can be either 5 or 6. This function simply gets the relevant value of each character from the function `toValue() `and then get binary representation of each value. Then the code makes all these binary numbers to the length of 5 or 6 (according to the value of `bit`) by chopping the most significant bits or adding zeros in front of the numbers. Finally, arrange these 1s and 0s in a `string `and split them to the sets of 8 bits in order to store in a byte array. This byte array will be returned by this function as the encoded `string`.

```String decode(byte[] encoded, int bit){
String strTemp = new String("");
String strBinary = new String("");
String strText = new String("");
Integer tempInt =new Integer(0);
int intTemp=0;
for(int i = 0;i<encoded.length;i++){
if(encoded[i]<0){
intTemp = (int)encoded[i]+256;
}else
intTemp = (int)encoded[i];
strTemp = Integer.toBinaryString(intTemp);
while(strTemp.length()%8 != 0){
strTemp="0"+strTemp;
}
strBinary = strBinary+strTemp;
}
for(int i=0 ; i<strBinary.length();i=i+bit){
tempInt = tempInt.valueOf(strBinary.substring(i,i+bit),2);
strText = strText + toChar(tempInt.intValue());
}
return strText;
}```

This function is responsible for the whole decoding operation. This function takes an array of bytes as the encoded data and the `bit `to switch the decoding to one of the 6-bit or 5- bit. In here also, the values in the array are converted to the binary representation and then converted into a `string`. After that, it will be split to the sets of 5 bits or 6 bits regarding the value of `bit`. Finally get the character that is relevant to the value from the function `toChar()` and append to a `string`. This `string `will be returned as the decoded `string `from the function.

```int toValue(char ch){
int chaVal = 0;
switch(ch){
case' ':chaVal=0;break; case'a':chaVal=1;break;
case'b':chaVal=2;break; case'c':chaVal=3;break;
case'd':chaVal=4;break; case'e':chaVal=5;break;
case'f':chaVal=6;break; case'g':chaVal=7;break;
case'h':chaVal=8;break; case'i':chaVal=9;break;
case'j':chaVal=10;break; case'k':chaVal=11;break;
case'l':chaVal=12;break; case'm':chaVal=13;break;
case'n':chaVal=14;break; case'o':chaVal=15;break;
case'p':chaVal=16;break; case'q':chaVal=17;break;
case'r':chaVal=18;break; case's':chaVal=19;break;
case't':chaVal=20;break; case'u':chaVal=21;break;
case'v':chaVal=22;break; case'w':chaVal=23;break;
case'x':chaVal=24;break; case'y':chaVal=25;break;
case'z':chaVal=26;break; case'.':chaVal=27;break;
case'*':chaVal=28;break; case',':chaVal=29;break;
case'\':chaVal=30;break; case'2':chaVal=31;break;
case'A':chaVal=32;break; case'B':chaVal=33;break;
case'C':chaVal=34;break; case'D':chaVal=35;break;
case'E':chaVal=36;break; case'F':chaVal=37;break;
case'G':chaVal=38;break; case'H':chaVal=39;break;
case'I':chaVal=40;break; case'J':chaVal=41;break;
case'K':chaVal=42;break; case'L':chaVal=43;break;
case'M':chaVal=44;break; case'N':chaVal=45;break;
case'O':chaVal=46;break; case'P':chaVal=47;break;
case'Q':chaVal=48;break; case'R':chaVal=49;break;
case'S':chaVal=50;break; case'T':chaVal=51;break;
case'U':chaVal=52;break; case'V':chaVal=53;break;
case'W':chaVal=54;break; case'0':chaVal=55;break;
case'1':chaVal=56;break; case'3':chaVal=57;break;
case'4':chaVal=58;break;case'5':chaVal=59;break;
case'6':chaVal=60;break;case'7':chaVal=61;break;
case'8':chaVal=62;break;case'9':chaVal=63;break;
default:chaVal=0;
}
return chaVal;
}```

This function returns a value for the given character in the alphabet. These values are lower than 64 so 6 bit number can represent any of these characters:

```char toChar(int val){
char ch = ' ';
switch(val){
case 0:ch=' ';break; case 1:ch='a';break;
case 2:ch='b';break; case 3 :ch='c';break;
case 4:ch='d';break; case 5 :ch='e';break;
case 6:ch='f';break; case 7 :ch='g';break;
case 8:ch='h';break; case 9 :ch='i';break;
case 10:ch='j';break; case 11:ch='k';break;
case 12:ch='l';break; case 13:ch='m';break;
case 14:ch='n';break; case 15:ch='o';break;
case 16:ch='p';break; case 17:ch='q';break;
case 18:ch='r';break; case 19:ch='s';break;
case 20:ch='t';break; case 21:ch='u';break;
case 22:ch='v';break; case 23:ch='w';break;
case 24:ch='x';break; case 25:ch='y';break;
case 26:ch='z';break; case 27:ch='.';break;
case 28:ch='*';break; case 29:ch=',';break;
case 30:ch='\';break; case 31 :ch='2';break;
case 32:ch='A';break; case 33:ch='B';break;
case 34:ch='C';break; case 35:ch='D';break;
case 36:ch='E';break; case 37:ch='F';break;
case 38:ch='G';break; case 39:ch='H';break;
case 40:ch='I';break; case 41:ch='J';break;
case 42:ch='K';break; case 43:ch='L';break;
case 44:ch='M';break; case 45:ch='N';break;
case 46:ch='O';break; case 47:ch='P';break;
case 48:ch='Q';break; case 49:ch='R';break;
case 50:ch='S';break; case 51:ch='T';break;
case 52:ch='U';break; case 53:ch='V';break;
case 54:ch='W';break; case 55:ch='0';break;
case 56:ch='1';break; case 57:ch='3';break;
case 58:ch='4';break; case 59:ch='5';break;
case 60:ch='6';break; case 61:ch='7';break;
case 62:ch='8';break; case 63:ch='9';break;
default:ch=' ';
}
return ch;
}```

This function returns the character that relates to the given number as defined in this code. The number that assigned to the character is not important but each number must be able to represent by 6 bits.

## How to Use

This program is demonstrating the use of class `SixBitEnDec `using a simple interface. You can insert some texts and observe the size of the text after compressing and also the text after decompressing.

## Share

Student University of Moratuwa
Sri Lanka
Postgraduate research student specializing in video content analysis. Interested in Image processing, HCI and Digital music production. Computer and technology enthusiast. I love coding and sharing my knowledge.

## You may also be interested in...

### Protecting Android™ Applications with Secure Code Signing Certificates

 First Prev Next
 Hello philong89it 6-Mar-14 20:15
 Re: Hello Ravimal Bandara 6-Mar-14 21:06
 Hello, vietean 28-Jul-11 16:52
 My vote of 5 Jonathan C Dickinson 12-Jul-11 0:34
 My vote of 5 nayomiranamuka 10-Jul-11 20:19
 My vote of 4 GregStevens 10-Jul-11 9:47
 A cute illustration of theory GregStevens 10-Jul-11 9:44
 Re: A cute illustration of theory Ravimal Bandara 10-Jul-11 17:39
 Why reinvent the wheel? Walt Fair, Jr. 10-Jul-11 9:43
 Re: Why reinvent the wheel? Ravimal Bandara 10-Jul-11 17:30
 Re: Why reinvent the wheel? Jonathan C Dickinson 12-Jul-11 0:33
 Re: Why reinvent the wheel? James Curran 12-Jul-11 3:29
 Re: Why reinvent the wheel? Walt Fair, Jr. 12-Jul-11 4:29
 Re: Why reinvent the wheel? Ravimal Bandara 12-Jul-11 5:16
 Re: Why reinvent the wheel? Walt Fair, Jr. 12-Jul-11 7:35
 Re: Why reinvent the wheel? bob16972 2-Aug-11 19:34
 Re: Why reinvent the wheel? Walt Fair, Jr. 3-Aug-11 5:11
 Re: Why reinvent the wheel? bob16972 5-Aug-11 22:27
 Last Visit: 31-Dec-99 18:00     Last Update: 22-Sep-14 4:53 Refresh 1