13,356,318 members (59,143 online)
alternative version

#### Stats

53.3K views
20 bookmarked
Posted 21 Jan 2005

# RunLength Compression

, 21 Jan 2005
 Rate this:
Fast Run-Length coding with variable runs sizes.

## Introduction

Run-Length is a lossless compression technique, its performance depends heavily on the input data statistics. It depends on counting sequence of repeated runs and save the run plus its count instead of the sequence of runs. So it depends heavily on the repetition in the input data. I have used it to compress array of bits ('1's and '0's), and each time I use it, I found my self rewrite the code to handle the new data format, and the type of the run (one bit, byte, word, ...). So I preferred to write a general code that has the runs size and count as an input to the compression function. You can skip the Background section if you have good experience about Run-Length algorithm.

## Background

The main idea of the Run-Length is to replace runs of the same data (like 'aaaaaa' or 'ccccc') with a counter saying how many repetitions we have. If we have 'aaaaaa' we just output 'a' and a byte to save the count, so we replaced 6 characters with only 2, and that can be done for all repeated runs in the input data, and the other unrepeated runs will be saved as it is. Compressed data format can be of any thing, just the decompression algorithm should follow this format structure in the decompression.

Example

```Input data: aaaaaabcdddddcccccrrrrrrrrrfdde (31 bytes)
Output data: a,5,b,0,c,0,d,4,c,4,r,8,f,0,d,1,e,0 (17 bytes)```

### Run-Length algorithm

#### Compression Algorithm can summarized as follows:.

```// for input b with length l
index = 0
while( index < l )
{
run = b[index]
length = 0
if run = b[++index]
while run = b[index+length]
index++, length++;
output (run, length);
}```

The algorithm scans one character (`run`) at a time. If the a character is equal to its next character, then it checks for repetition, then outputs the character with the repetition that follow it. If the a character is not equal to its next character, then it outputs it and a repetition of zero.

#### Decompression Algorithm is as follows:

```// for input b with length l
index = 0
while( index < l )
{
run = b[index++]
length = b[index++]
output (run, length+1)
}```

The decompression is so simple, just outputs the `run` repeated with its `length` plus one.

## Code Usage

Code usage is so simple, just the input buffer and its length, and the output buffer and its length. The compress function includes additional parameters:

• `nBitsPerSample`: to determine number of bits to save run length value, so maximum run length will be up to 2^`nBitsPerSample`-1.
• `nRuns`: to determine an optional dictionary of runs, with a default runs of the ASCII characters (0...255).
• `nRunCount`: to determine number of runs in the previous parameter, with a default value of 256 (256 characters).
• `nRunSize`: to determine the number of bytes in the each run, with a default value of 1 (1 byte).
```bool CompressRunLength(BYTE *pSrc, int nSrcLen, BYTE *&pDes,
int &nDesLen, int nBitsPerSample, void* nRuns,
int nRunCount, int nRunSize);
bool DecompressRunLength(BYTE *pSrc, int nSrcLen,
BYTE *&pDes, int &nDesLen);```

You can use the parameters `(..., void* nRuns, int nRunCount, int nRunSize)` to input any dictionary for runs and you can decide run size in bytes, see the next example to compress a file contains a runs with size 3:

```BYTE by[] = { 0x0,0x0,0x0,  0x84,0x82,0x84,
0xce,0xd3,0xd6,  0xff,0xff,0xff };
CompressRunLength(pSrc, nSrcLen, pDes, nDesLen,
12/*nBitsPerSample*/, by, 4/*nRunCount*/,
3/*nRunSize*/);```

Note: Input runs must be sorted as I am using binary search to search in runs array buffer.

And if you want to compress only for zeros and ones, you can use that:

```BYTE by[] = { 0x0, 0xff };
CompressRunLength(pSrc, nSrcLen, pDes, nDesLen,
12/*nBitsPerSample*/, by, 2/*nRunCount*/, 1/*nRunSize*/);
// (^_^)```

## Code Description

### CompressRunLength function

• Scan the input buffer `pSrc` and check if the run is repeated or not using binary search.
• If run is repeated then,
• set next bit in the destination buffer:
`*(pDes+(nDesLen>>3)) |= 1 << (nDesLen++&7)`
• add run index to the destination buffer:
`*(DWORD*)(pDes+(++nDesLen>>3)) |= nRunIndex << (nDesLen&7)`
• loop to get run length
`while(...) nRunLength++, ...`
• add run length to the destination buffer:
`*(DWORD*)(pDes+(nDesLen>>3)) |= nRunLength << (nDesLen&7)`
• If run is not repeated, then add next byte to the destination buffer:
`*(WORD*)(pDes+(++nDesLen>>3)) |= *(pSrc+nByte++) << (nDesLen&7)`
```bool CompressRunLength(BYTE *pSrc, int nSrcLen, BYTE *&pDes,
int &nDesLen, int nBitsPerSample,
void* nRuns, int nRunCount, int nRunSize)
{
...
// loop in the source buffer
while(nByte < nSrcLen)
if((... (nRunIndex =
BinarySearch(pSrc+nByte, nRunSize, nRuns, nRunCount))!= -1 &&
memcmp(pSrc+nByte+nRunSize,
(BYTE*)nRuns+nRunIndex*nRunSize, nRunSize) == 0) ||
...)
{    // set bit to 1 to indicate type found
*(pDes+(nDesLen>>3)) |= 1 << (nDesLen++&7);
*(DWORD*)(pDes+(++nDesLen>>3)) |=
nRunIndex << (nDesLen&7);
nDesLen += nBitsPerTypeIndex;
// skip the two repeated runs
nByte += nRunSize*2;
// get run length - 2 (without the two repeated runs)
nRunLength = 0;
while(nRunLength < nMaxRunLength && nByte < nSrcLen &&
((... memcmp(pSrc+nByte,
(BYTE*)nRuns+nRunIndex*nRunSize, nRunSize) == 0) || ...))
nRunLength++, nByte += nRunSize;
// save run length and increment destination length
//by bits per sample
*(DWORD*)(pDes+(nDesLen>>3)) |=
nRunLength << (nDesLen&7);
nDesLen += nBitsPerSample;
}
else    // copy one byte
*(WORD*)(pDes+(++nDesLen>>3)) |=
*(pSrc+nByte++) << (nDesLen&7), nDesLen += 8;
nDesLen = (nDesLen+7)/8;    // bits to bytes
...
}```
• `(nDesLen>>3)`: `>>3` to divide by 8 to reach the right byte to start with.
• `|=`: to avoid writing over the previous code.
• `(nDesLen&7)`: `&7` to get the reminder of dividing by 8, to get the start bit.
• `nRunLength << (nDesLen&7)`: to shift the value to left to match the starting bit at the destination buffer.

### DecompressRunLength function

• If next bit in the source buffer is on, then it is a run with its length:
`if((*(pSrc+(nSrcIndex>>3)) >> (nSrcIndex++&7)) & 1)`
• read run index:
`nRunIndex = GetDWORD(pSrc, nSrcIndex, nMaxTypeIndex)`
• read run length:
`nRunLength = GetDWORD(pSrc, nSrcIndex, nMaxRunLength)+2`
• loop to out the repeated runs.
`for(...)`
• If next bit in the source buffer is off, then it is a byte that needs to out:
`*(WORD*)(pDes+(nDesIndex>>3)) |= GetWORD(pSrc, nSrcIndex, 0xff)`
```bool DecompressRunLength(BYTE *pSrc, int nSrcLen,
BYTE *&pDes, int &nDesLen)
{
...
nSrcLen <<= 3; // bytes to bits
while(nSrcIndex < nSrcLen-8)
if((*(pSrc+(nSrcIndex>>3)) >> (nSrcIndex++&7)) & 1)
{
nRunIndex = GetDWORD(pSrc, nSrcIndex, nMaxTypeIndex),
nSrcIndex += nBitsPerTypeIndex;
nRunLength = GetDWORD(pSrc, nSrcIndex, nMaxRunLength)+2,
nSrcIndex += nBitsPerSample;
for(nRun = 0; nRun < nRunLength; nRun++)
for(nByte = 0; nByte < nRunSize; nByte++, nDesIndex += 8)
*(WORD*)(pDes+(nDesIndex>>3)) |=
nRuns ? GetWORD(nRuns+nRunSize*nRunIndex,
nByte<<3, 0xff) : (BYTE)nRunIndex;
}
else    // copy one byte
*(WORD*)(pDes+(nDesIndex>>3)) |=
GetWORD(pSrc, nSrcIndex, 0xff),
nDesIndex += 8, nSrcIndex += 8;
...
}```
• `(nDesIndex>>3)`: `>>3` to divide by 8 to reach the right byte to start with.
• `(nDesIndex&7)`: `&7` to get the reminder of dividing by 8, to get the start bit.
• `>>(nDesIndex&7)`: to shift right to the start bit to take the code at bit 0 (closer the wall).

## Points of Interest

1. At the compression function, I am using a binary search to search for the `run` that matches the current `run` in the input data:
```int BinarySearch(void* pValue, int nVlaueSize,
void* pArray, int nCount)
{
int nIndex, nResult, nStart = 0, nEnd = nCount-1;
while(nStart <= nEnd)
{
nIndex = (nEnd+nStart)/2;
if((nResult = memcmp((BYTE*)pArray+nIndex*nVlaueSize,
pValue, nVlaueSize)) == 0)
return nIndex;
if(nResult > 0)
nEnd = nIndex-1;
else
nStart = nIndex+1;
}
return -1;
}```

So, as I noted before, input runs must be sorted.

2. Also, I am using a very good macros to extract values starting from any bit in the input buffer:
```#define GetDWORD(buf,bit,mask) (

`mask` is used to reset any extra indeed bits.

3. My code here in this compression is using bits to save values, not bytes, so I am saving any value starting from any bit, so I didn't waste any bits. Or I can say, I use every bit in the compressed data.

## Source code files

• RunLength.cpp
• RunLength.h

## Thanks to...

I owe a lot to my colleagues for helping me in implementing and testing this code. (JAK)

## About the Author

 Other Egypt

## Comments and Discussions

 First Prev Next
 How does this works? Member 1262766610-Jul-16 8:24 Member 12627666 10-Jul-16 8:24
 it is the first compression schema that make bigger my files(strings) miketj25-Aug-06 13:50 miketj 25-Aug-06 13:50
 Re: it is the first compression schema that make bigger my files(strings) Hatem Mostafa26-Aug-06 0:44 Hatem Mostafa 26-Aug-06 0:44
 Re: it is the first compression schema that make bigger my files(strings) muhahahaha10-Jan-08 1:46 muhahahaha 10-Jan-08 1:46
 Re: it is the first compression schema that make bigger my files(strings) evolved20-Feb-09 8:20 evolved 20-Feb-09 8:20
 Im doubtful that this would be helpful, data compression does really 'work' like that. Although in some instances this may be true, the stemming/dictionaries in LZW would get less useful the more 'entropy' (removal of data patterns and addition of 'randomness') you added to the data set (which RLE would indeed do - add entropy). In short, im pretty sure that RLE is damn near useless for non-structured text (like log files, etc), so your recommendation to RLE + LZW is probably a bit off mark, but RLE -is extremely useful for bitmaps/masks and streamed measurement data as the author stated.
 Something To Watch For Anonymous26-Jan-05 3:59 Anonymous 26-Jan-05 3:59
 Last Visit: 31-Dec-99 19:00     Last Update: 23-Jan-18 2:27 Refresh 1

General    News    Suggestion    Question    Bug    Answer    Joke    Praise    Rant    Admin

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.