Introduction
After much searching the internet for a .net implementation for WAH compressed BitArray and not finding one, I decided to write and publish an article here so we the .net community are not left out as all the implementations are in the java language. This ties into a pretty advanced topic of bitmap indexing techniques for databases and I needed this for my upcoming RaptorDB
Document datastore database engine.
ٌWhat is it?
A BitArray
is a
data structure in .net library for storing true/false or bits of
data in a compact form within an array of Int32
objects. A WAH or
wordaligned hybrid BitArray
is a special run length compressed
version of a BitArray
which saves a lot of space and memory. All
the implementations that exist in java essentially duplicate the
functionality of a BitSet
that is the AND
, OR
, NOT
and XOR
operations with the compressed internal
storage format.
In my implementation I defer the functionality of the BitArray
to
itself and just add compression and decompression routines. This is much
faster than the java way at the expense of memory usage, to
overcome this I have also added a FreeMemory
method to release the
BitArray
contents and keep the compressed contents. Arguably if you are using 100s million bits
then a full implementation is more performant than my
implementation but for most of our usecases we are at most in the
10s millions of bits range.
This original method was invented at the Berkeley Labs of US Department of
Energy, it is a project named FastBit and is used for high energy
physics department experiments, you can see it here :
FastBit site
Why should I care?
So what! you ask?, well as mentioned before BitArrays
are used in an indexing technique called bitmap indexes (wiki) and Column based databases which store data in columns instead of rows, a example which you might know is Microsoft's PowerPivot for Excel which can process millions of rows in seconds. Interestingly Microsoft has only recently announced the implementation of bitmap indexes in the upcoming SQL Server platform post 2008 R2. It has long been in use by other RDBM vendors like Oracle.
How it works
The WAH compression algorithm is as follows:
 take 31 bits from the array.
 if all zeros then increment the zero count by 31.
 if ones count > 0 then output 32 bits with bit 32 =1
and bit 31=1 and 030 = ones count
 if all ones then increment the ones count by 31.
 if zeros count >0 then output 32 bits with bit 32=1 and
bit 31=0 and 030 = zeros count
 else output the 31 bits as 32 bits with bit 32 = 0.
 if zeros or ones count >0 then output as above before.
From the above in the worst case you will get N/31 more bits
encoded or about 3% increase in size to the original.
What you get
WAHBitArray
is essentially the same as the standard BitArray
in the .net framework with the following additions:
FreeMemory()
: this will first compress the internal BitArray
then free the the memory it used.
GetCompressed()
: this will compress the current BitArray
then return them as an array of uint
.CountOnes()
,CountZeros()
: will count the respective bits in the array.GetBitIndexes(bool)
: will return an enumeration using yield
of the respective bit position for example if the bit array contains 10001... this will return integers 0,4,... if the bool
parameter was true
and 1,2,3,... if bool was false
.
Using the code
To create a
WAHBitArray
you can do the following :
WAHBitArray ba1 = new WAHBitArray(1000000);
WAHBitArray ba2 = new WAHBitArray(new BitArray(2000000));
WAHBitArray ba3 = new WAHBitArray(1000000, new uint[] { });
To perform operations you can do the following
WAHBitArray result1 = ba1.And( ba2);
WAHBitArray result2 = ba1.Or( ba3);
WAHBitArray result3 = ba1.Xor( ba2);
long count = result1.CountOnes();
foreach(int pos in result2.GetBitIndexes(true))
{
ReadDatabaseRecordNumber(pos); }
Points of Interest

BitArray
class is sealed by Microsoft so inheriting for it
was not possible. BitArray
throws a exception if the length of two BitArrays
are not equal on bit operations, WAHBitArray
makes then the
same as the largest before operations.
History
 Initial Release v1.0 : 22^{nd} June 2011
 Update v1.1 : 24th June 2011
 bit operations now return a WAHBitArray instead of BitArray
 bit operation will take either a WAHBitArray or a BitArray as the input
 CountZeros(), CountOnes() methods added
 GetBitIndexes() method added