Click here to Skip to main content
13,352,280 members (67,393 online)
Click here to Skip to main content
Add your own
alternative version


74 bookmarked
Posted 22 Jun 2011

Word Aligned Hybrid (WAH) Compression for BitArrays

, 25 Jul 2011
Rate this:
Please Sign up or sign in to vote.
Word Aligned Hybrid (WAH) compression for BitArrays.
This is an old version of the currently published article.


After searching the internet for a .NET implementation for WAH compressed BitArray and not finding one, I decided to write and publish an article here so the .NET community are not left out as all the available implementations are in the Java language. This ties into a pretty advanced topic of bitmap indexing techniques for databases and I needed this for my upcoming RaptorDB Document data-store database engine.

What is it?

A BitArray is a data structure in the .NET library for storing true/false or bits of data in a compact form within an array of Int32 objects. A WAH or word-aligned hybrid BitArray is a special run length compressed version of a BitArray which saves a lot of space and memory. All the implementations that exist in Java essentially duplicate the functionality of a BitSet, that is the AND, OR, NOT, and XOR operations with the compressed internal storage format.

In my implementation, I defer the functionality of the BitArray to itself and just add compression and decompression routines. This is much faster than the Java way at the expense of memory usage. To overcome this, I have also added a FreeMemory method to release the BitArray contents and keep the compressed contents. Arguably, if you are using 100 million bits, then a full implementation is more performant than my implementation, but for most of our Use Cases, we are at most in the 10 millions of bits range.

This original method was invented at the Berkeley Labs of US Department of Energy; it is a project named FastBit and is used for high energy physics department experiments; you can see it here: FastBit site.

Why should I care?

So what?, you ask. Well, as mentioned before, BitArrays are used in an indexing technique called bitmap indexes (Wiki) and column based databases which store data in columns instead of rows. An example which you might know is Microsoft's PowerPivot for Excel which can process millions of rows in seconds. Interestingly, Microsoft has only recently announced the implementation of bitmap indexes in the upcoming SQL Server platform, post 2008 R2. It has long been in use by other RDBM vendors like Oracle.

How it works

The WAH compression algorithm is as follows:

  1. Take 31 bits from the array.
  2. If all zeros, then increment the zero count by 31.
    1. if ones count > 0, then output 32 bits with bit 32 =1 and bit 31=1 and 0-30 = ones count.
  3. If all ones, then increment the ones count by 31.
    1. If zeros count >0, then output 32 bits with bit 32=1 and bit 31=0 and 0-30 = zeros count.
  4. Else output the 31 bits as 32 bits with bit 32 = 0.
    1. If zeros or ones count >0, then output as above.

From the above, in the worst case, you will get N/31 more bits encoded or about 3% increase in size to the original.

What you get

WAHBitArray is essentially the same as the standard BitArray in the .NET Framework, with the following additions:

  • FreeMemory(): this will first compress the internal BitArray then free the memory it uses.
  • GetCompressed(): this will compress the current BitArray then return it as an array of uint.
  • CountOnes(), CountZeros(): will count the respective bits in the array.
  • GetBitIndexes(bool): will return an enumeration using yield of the respective bit position; for example, if the bit array contains 10001... this will return integers 0,4,... if the bool parameter was true, and 1,2,3,... if bool was false.
  • Get(), Set(): methods implemented with auto resizing and no exceptions.
  • DebugPrint(): generates a string of 1 and 0 values.

Using the code

To create a WAHBitArray, you can do the following:

WAHBitArray ba1 = new WAHBitArray(1000000); // 1 million bits
// 2 million bits from another bitarray
WAHBitArray ba2 = new WAHBitArray(new BitArray(2000000));
WAHBitArray ba3 = new WAHBitArray(1000000, new uint[] { /* compressed values here */});
// from a compressed list of uint

To perform operations, you can do the following:

WAHBitArray result1 = ba1.And( ba2);
WAHBitArray result2 = ba1.Or( ba3);
WAHBitArray result3 = ba1.Xor( ba2);

long count = result1.CountOnes(); // will count the true values

result2.Set(2000000, true); // will resize as needed
bool b = result2.Get(3000000); // will resize as needed

foreach(int pos in result2.GetBitIndexes(true))
    ReadDatabaseRecordNumber(pos); // pos is the index of true values

Using the code v1.3

As of version 1.3, you don't need to specify the size of the BitArray and all operations will auto resize as needed.

WAHBitArray ba1 = new WAHBitArray(); // no need to specify the size
WAHBitArray ba3 = new WAHBitArray(new uint[] { /* compressed values here */});
// from a compressed list of uint

Points of interest

  • BitArray class is sealed by Microsoft so inheriting from it was not possible.
  • BitArray throws an exception if the length of two BitArrays are not equal on bit operations, WAHBitArray makes them the same as the largest before operations.
  • BitArray must be resized in 32 increments, otherwise it mangles the compression bits.


  • Initial release v1.0: 22 June 2011
  • Update v1.1: 24 June 2011
    • bit operations now return a WAHBitArray instead of BitArray
    • a bit operation will take either a WAHBitArray or a BitArray as the input
    • CountZeros(), CountOnes() methods added
    • GetBitIndexes() method added
  • Update v1.2: 28 June 2011
    • get, set methods with auto resizing
  • Update v1.3: 23 July 2011
    • removed the need to specify the initial size
    • bug fix resize in 32 increments
    • bug fix in BitArray arithmetic
    • DebugPrint() method implemented


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Mehdi Gholam
Architect -
United Kingdom United Kingdom
Mehdi first started programming when he was 8 on BBC+128k machine in 6512 processor language, after various hardware and software changes he eventually came across .net and c# which he has been using since v1.0.
He is formally educated as a system analyst Industrial engineer, but his programming passion continues.

* Mehdi is the 5th person to get 6 out of 7 Platinum's on Code-Project (13th Jan'12)
* Mehdi is the 3rd person to get 7 out of 7 Platinum's on Code-Project (26th Aug'16)

You may also be interested in...


Comments and Discussions

Discussions on this specific version of this article. Add your comments on how to improve this article here. These comments will not be visible on the final published version of this article.
QuestionRe: Why should I care? Pin
MacV1025-Apr-12 2:56
memberMacV1025-Apr-12 2:56 
AnswerRe: Why should I care? Pin
Mehdi Gholam25-Apr-12 22:22
mvpMehdi Gholam25-Apr-12 22:22 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.180111.1 | Last Updated 25 Jul 2011
Article Copyright 2011 by Mehdi Gholam
Everything else Copyright © CodeProject, 1999-2018
Layout: fixed | fluid