## Introduction

After searching the internet for a .NET implementation for WAH compressed BitArray and not finding one, I decided to write and publish an article here so the .NET community are not left out as all the available implementations are in the Java language. This ties into a pretty advanced topic of bitmap indexing techniques for databases and I needed this for my upcoming *RaptorDB* Document data-store database engine.

## What is it?

A `BitArray`

is a data structure in the .NET library for storing true/false or bits of data in a compact form within an array of `Int32`

objects. A WAH or word-aligned hybrid `BitArray`

is a special run length compressed version of a `BitArray`

which saves a lot of space and memory. All the implementations that exist in Java essentially duplicate the functionality of a `BitSet`

, that is the `AND`

, `OR`

, `NOT`

, and `XOR`

operations with the compressed internal storage format.

In my implementation, I defer the functionality of the `BitArray`

to itself and just add compression and decompression routines. This is much faster than the Java way at the expense of memory usage. To overcome this, I have also added a `FreeMemory`

method to release the `BitArray`

contents and keep the compressed contents. Arguably, if you are using 100 million bits, then a full implementation is more performant than my implementation, but for most of our Use Cases, we are at most in the 10 millions of bits range.

This original method was invented at the Berkeley Labs of US Department of Energy; it is a project named FastBit and is used for high energy physics department experiments; you can see it here: FastBit site.

## Why should I care?

So what?, you ask. Well, as mentioned before, `BitArray`

s are used in an indexing technique called bitmap indexes (Wiki) and column based databases which store data in columns instead of rows. An example which you might know is Microsoft's PowerPivot for Excel which can process millions of rows in seconds. Interestingly, Microsoft has only recently announced the implementation of bitmap indexes in the upcoming SQL Server platform, post 2008 R2. It has long been in use by other RDBM vendors like Oracle.

## How it works

The WAH compression algorithm is as follows:

- Take 31 bits from the array.
- If all zeros, then increment the zero count by 31.
- if ones count > 0, then output 32 bits with bit 32 =1 and bit 31=1 and 0-30 = ones count.

- If all ones, then increment the ones count by 31.
- If zeros count >0, then output 32 bits with bit 32=1 and bit 31=0 and 0-30 = zeros count.

- Else output the 31 bits as 32 bits with bit 32 = 0.
- If zeros or ones count >0, then output as above.

From the above, in the worst case, you will get N/31 more bits encoded or about 3% increase in size to the original.

## What you get

`WAHBitArray`

is essentially the same as the standard `BitArray`

in the .NET Framework, with the following additions:

`FreeMemory()`

: this will first compress the internal `BitArray`

then free the memory it uses.
`GetCompressed()`

: this will compress the current `BitArray`

then return it as an array of `uint`

.
`CountOnes()`

, `CountZeros()`

: will count the respective bits in the array.
`GetBitIndexes(bool)`

: will return an enumeration using `yield`

of the respective bit position; for example, if the bit array contains 10001... this will return integers 0,4,... if the `bool`

parameter was `true`

, and 1,2,3,... if bool was `false`

.
`Get()`

, `Set()`

: methods implemented with auto resizing and no exceptions.
`DebugPrint()`

: generates a `string`

of 1 and 0 values.

## Using the code

To create a `WAHBitArray`

, you can do the following:

WAHBitArray ba1 = new WAHBitArray(1000000);
WAHBitArray ba2 = new WAHBitArray(new BitArray(2000000));
WAHBitArray ba3 = new WAHBitArray(1000000, new uint[] { });

To perform operations, you can do the following:

WAHBitArray result1 = ba1.And( ba2);
WAHBitArray result2 = ba1.Or( ba3);
WAHBitArray result3 = ba1.Xor( ba2);
long count = result1.CountOnes();
result2.Set(2000000, true);
bool b = result2.Get(3000000);
foreach(int pos in result2.GetBitIndexes(true))
{
ReadDatabaseRecordNumber(pos);
}

## Using the code v1.3

As of version 1.3, you don't need to specify the size of the `BitArray`

and all operations will auto resize as needed.

WAHBitArray ba1 = new WAHBitArray();
WAHBitArray ba3 = new WAHBitArray(new uint[] { });

## Points of interest

`BitArray`

class is sealed by Microsoft so inheriting from it was not possible.
`BitArray`

throws an exception if the length of two `BitArray`

s are not equal on bit operations, `WAHBitArray`

makes them the same as the largest before operations.
`BitArray`

must be resized in 32 increments, otherwise it mangles the compression bits.

## History

**Initial release v1.0**: 22 June 2011
**Update v1.1**: 24 June 2011
- bit operations now return a
`WAHBitArray`

instead of `BitArray`

- a bit operation will take either a
`WAHBitArray`

or a `BitArray`

as the input
`CountZeros()`

, `CountOnes()`

methods added
`GetBitIndexes()`

method added

**Update v1.2**: 28 June 2011
- get, set methods with auto resizing

**Update v1.3**: 23 July 2011
- removed the need to specify the initial size
- bug fix resize in 32 increments
- bug fix in
`BitArray`

arithmetic
`DebugPrint()`

method implemented

**Update v2.0**: 25 July 2011