Go to top

# RunLength Compression

, 21 Jan 2005
 Rate this:
Fast Run-Length coding with variable runs sizes.

## Introduction

Run-Length is a lossless compression technique, its performance depends heavily on the input data statistics. It depends on counting sequence of repeated runs and save the run plus its count instead of the sequence of runs. So it depends heavily on the repetition in the input data. I have used it to compress array of bits ('1's and '0's), and each time I use it, I found my self rewrite the code to handle the new data format, and the type of the run (one bit, byte, word, ...). So I preferred to write a general code that has the runs size and count as an input to the compression function. You can skip the Background section if you have good experience about Run-Length algorithm.

## Background

The main idea of the Run-Length is to replace runs of the same data (like 'aaaaaa' or 'ccccc') with a counter saying how many repetitions we have. If we have 'aaaaaa' we just output 'a' and a byte to save the count, so we replaced 6 characters with only 2, and that can be done for all repeated runs in the input data, and the other unrepeated runs will be saved as it is. Compressed data format can be of any thing, just the decompression algorithm should follow this format structure in the decompression.

Example

```Input data: aaaaaabcdddddcccccrrrrrrrrrfdde (31 bytes)
Output data: a,5,b,0,c,0,d,4,c,4,r,8,f,0,d,1,e,0 (17 bytes)```

### Run-Length algorithm

#### Compression Algorithm can summarized as follows:.

```// for input b with length l
index = 0
while( index < l )
{
run = b[index]
length = 0
if run = b[++index]
while run = b[index+length]
index++, length++;
output (run, length);
}```

The algorithm scans one character (`run`) at a time. If the a character is equal to its next character, then it checks for repetition, then outputs the character with the repetition that follow it. If the a character is not equal to its next character, then it outputs it and a repetition of zero.

#### Decompression Algorithm is as follows:

```// for input b with length l
index = 0
while( index < l )
{
run = b[index++]
length = b[index++]
output (run, length+1)
}```

The decompression is so simple, just outputs the `run` repeated with its `length` plus one.

## Code Usage

Code usage is so simple, just the input buffer and its length, and the output buffer and its length. The compress function includes additional parameters:

• `nBitsPerSample`: to determine number of bits to save run length value, so maximum run length will be up to 2^`nBitsPerSample`-1.
• `nRuns`: to determine an optional dictionary of runs, with a default runs of the ASCII characters (0...255).
• `nRunCount`: to determine number of runs in the previous parameter, with a default value of 256 (256 characters).
• `nRunSize`: to determine the number of bytes in the each run, with a default value of 1 (1 byte).
```bool CompressRunLength(BYTE *pSrc, int nSrcLen, BYTE *&pDes,
int &nDesLen, int nBitsPerSample, void* nRuns,
int nRunCount, int nRunSize);
bool DecompressRunLength(BYTE *pSrc, int nSrcLen,
BYTE *&pDes, int &nDesLen);```

You can use the parameters `(..., void* nRuns, int nRunCount, int nRunSize)` to input any dictionary for runs and you can decide run size in bytes, see the next example to compress a file contains a runs with size 3:

```BYTE by[] = { 0x0,0x0,0x0,  0x84,0x82,0x84,
0xce,0xd3,0xd6,  0xff,0xff,0xff };
CompressRunLength(pSrc, nSrcLen, pDes, nDesLen,
12/*nBitsPerSample*/, by, 4/*nRunCount*/,
3/*nRunSize*/);```

Note: Input runs must be sorted as I am using binary search to search in runs array buffer.

And if you want to compress only for zeros and ones, you can use that:

```BYTE by[] = { 0x0, 0xff };
CompressRunLength(pSrc, nSrcLen, pDes, nDesLen,
12/*nBitsPerSample*/, by, 2/*nRunCount*/, 1/*nRunSize*/);
// (^_^)```

## Code Description

### CompressRunLength function

• Scan the input buffer `pSrc` and check if the run is repeated or not using binary search.
• If run is repeated then,
• set next bit in the destination buffer:
`*(pDes+(nDesLen>>3)) |= 1 << (nDesLen++&7)`
• add run index to the destination buffer:
`*(DWORD*)(pDes+(++nDesLen>>3)) |= nRunIndex << (nDesLen&7)`
• loop to get run length
`while(...) nRunLength++, ...`
• add run length to the destination buffer:
`*(DWORD*)(pDes+(nDesLen>>3)) |= nRunLength << (nDesLen&7)`
• If run is not repeated, then add next byte to the destination buffer:
`*(WORD*)(pDes+(++nDesLen>>3)) |= *(pSrc+nByte++) << (nDesLen&7)`
```bool CompressRunLength(BYTE *pSrc, int nSrcLen, BYTE *&pDes,
int &nDesLen, int nBitsPerSample,
void* nRuns, int nRunCount, int nRunSize)
{
...
// loop in the source buffer
while(nByte < nSrcLen)
if((... (nRunIndex =
BinarySearch(pSrc+nByte, nRunSize, nRuns, nRunCount))!= -1 &&
memcmp(pSrc+nByte+nRunSize,
(BYTE*)nRuns+nRunIndex*nRunSize, nRunSize) == 0) ||
...)
{    // set bit to 1 to indicate type found
*(pDes+(nDesLen>>3)) |= 1 << (nDesLen++&7);
*(DWORD*)(pDes+(++nDesLen>>3)) |=
nRunIndex << (nDesLen&7);
nDesLen += nBitsPerTypeIndex;
// skip the two repeated runs
nByte += nRunSize*2;
// get run length - 2 (without the two repeated runs)
nRunLength = 0;
while(nRunLength < nMaxRunLength && nByte < nSrcLen &&
((... memcmp(pSrc+nByte,
(BYTE*)nRuns+nRunIndex*nRunSize, nRunSize) == 0) || ...))
nRunLength++, nByte += nRunSize;
// save run length and increment destination length
//by bits per sample
*(DWORD*)(pDes+(nDesLen>>3)) |=
nRunLength << (nDesLen&7);
nDesLen += nBitsPerSample;
}
else    // copy one byte
*(WORD*)(pDes+(++nDesLen>>3)) |=
*(pSrc+nByte++) << (nDesLen&7), nDesLen += 8;
nDesLen = (nDesLen+7)/8;    // bits to bytes
...
}```
• `(nDesLen>>3)`: `>>3` to divide by 8 to reach the right byte to start with.
• `|=`: to avoid writing over the previous code.
• `(nDesLen&7)`: `&7` to get the reminder of dividing by 8, to get the start bit.
• `nRunLength << (nDesLen&7)`: to shift the value to left to match the starting bit at the destination buffer.

### DecompressRunLength function

• If next bit in the source buffer is on, then it is a run with its length:
`if((*(pSrc+(nSrcIndex>>3)) >> (nSrcIndex++&7)) & 1)`
`nRunIndex = GetDWORD(pSrc, nSrcIndex, nMaxTypeIndex)`
`nRunLength = GetDWORD(pSrc, nSrcIndex, nMaxRunLength)+2`
• loop to out the repeated runs.
`for(...)`
• If next bit in the source buffer is off, then it is a byte that needs to out:
`*(WORD*)(pDes+(nDesIndex>>3)) |= GetWORD(pSrc, nSrcIndex, 0xff)`
```bool DecompressRunLength(BYTE *pSrc, int nSrcLen,
BYTE *&pDes, int &nDesLen)
{
...
nSrcLen <<= 3; // bytes to bits
while(nSrcIndex < nSrcLen-8)
if((*(pSrc+(nSrcIndex>>3)) >> (nSrcIndex++&7)) & 1)
{
nRunIndex = GetDWORD(pSrc, nSrcIndex, nMaxTypeIndex),
nSrcIndex += nBitsPerTypeIndex;
nRunLength = GetDWORD(pSrc, nSrcIndex, nMaxRunLength)+2,
nSrcIndex += nBitsPerSample;
for(nRun = 0; nRun < nRunLength; nRun++)
for(nByte = 0; nByte < nRunSize; nByte++, nDesIndex += 8)
*(WORD*)(pDes+(nDesIndex>>3)) |=
nRuns ? GetWORD(nRuns+nRunSize*nRunIndex,
nByte<<3, 0xff) : (BYTE)nRunIndex;
}
else    // copy one byte
*(WORD*)(pDes+(nDesIndex>>3)) |=
GetWORD(pSrc, nSrcIndex, 0xff),
nDesIndex += 8, nSrcIndex += 8;
...
}```
• `(nDesIndex>>3)`: `>>3` to divide by 8 to reach the right byte to start with.
• `(nDesIndex&7)`: `&7` to get the reminder of dividing by 8, to get the start bit.
• `>>(nDesIndex&7)`: to shift right to the start bit to take the code at bit 0 (closer the wall).

## Points of Interest

1. At the compression function, I am using a binary search to search for the `run` that matches the current `run` in the input data:
```int BinarySearch(void* pValue, int nVlaueSize,
void* pArray, int nCount)
{
int nIndex, nResult, nStart = 0, nEnd = nCount-1;
while(nStart <= nEnd)
{
nIndex = (nEnd+nStart)/2;
if((nResult = memcmp((BYTE*)pArray+nIndex*nVlaueSize,
pValue, nVlaueSize)) == 0)
return nIndex;
if(nResult > 0)
nEnd = nIndex-1;
else
nStart = nIndex+1;
}
return -1;
}```

So, as I noted before, input runs must be sorted.

2. Also, I am using a very good macros to extract values starting from any bit in the input buffer:
```#define GetDWORD(buf,bit,mask) (
```

`mask` is used to reset any extra indeed bits.

3. My code here in this compression is using bits to save values, not bytes, so I am saving any value starting from any bit, so I didn't waste any bits. Or I can say, I use every bit in the compressed data.

## Source code files

• RunLength.cpp
• RunLength.h

## Thanks to...

I owe a lot to my colleagues for helping me in implementing and testing this code. (JAK)

Other
Egypt

## You may also be interested in...

### Why “Good Enough” Isn’t Good Enough Anymore for Software Configuration Management

 First Prev Next
 Or better yet, use LZW after using this algorithm
 Something To Watch For Anonymous 26-Jan-05 2:59
 Last Visit: 31-Dec-99 18:00     Last Update: 30-Sep-14 4:26 Refresh 1