## Introduction

For over a decade, I have worked on various projects involving application middleware, class libraries, 3D Graphics SDK, or game engines. Invariably there is a need to keep track of assignments and reuse of tokens, handles, contexts id, etc.

The simplest solution is just to keep incrementing a counter whenever a new token is allocated. This solution does not allow for deallocating a token. When the counter wraps back to zero, there is a possibility of collision with previously allocated tokens. This scheme works fine if the application exits before the counter wraps around.

Another method is to keep a set or list of allocated tokens. When a token is deallocated, it gets moved to another list for reuse by the next allocation. There are many variations to this scheme; but the storage cost is ultimately one integer per token.

The method that requires the least storage is clearly a bitmask where the index of each bit represents one token. A bit is set to one to indicate it is allocated and clear to zero to indicate it is available. Complexity arises when searching for the first available (zero) bit. A naive implementation would search linearly through the bitmask. This article explores a storage scheme and an algorithm that can search through a bitmask in log32(n) time. For example, the first zero bit anywhere in a bitmask of one million bits can be found with only 4 comparisons.

## Background

First, we need tools for manipulating bits. Specifically, we need the ability to find the first zero bit in an integer. I came across this website with really elegant bit manipulation algorithms.

template <class T> size_t countOneBits(register T x)
{
x -= ((x >> 1) & 0x55555555);
x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
x = (((x >> 4) + x) & 0x0f0f0f0f);
x += (x >> 8);
x += (x >> 16);
return(x & 0x0000003f);
}
template <class T> T foldBitsLeft(register T x)
{
x |= (x << 1);
x |= (x << 2);
x |= (x << 4);
x |= (x << 8);
return x | (x << 16);
}
template <class T> size_t lowestOneIdx(register T x)
{ return countOneBits(~foldBitsLeft(x)); }

A fellow member of this site by the name of waykohler just pointed out a much faster and simpler method for finding the first one bit using assembly and intrinsics. If you use either gcc or msvc, you might want to try this instead of the SWAR functions above.

size_t lowestOneIdx(register unsigned long bits)
{
#ifdef __GNUC__
if(bits) {
return __builtin_ctz(bits);
}
#elif defined(_MSC_VER)
long idx;
if(_BitScanForward(&idx,bits)) {
return idx;
}
#else
#error "Unknown compiler"
#endif
return 32;
}

## Recursive Template Bitmask

A bitmask is usually implemented as an array of integers. SWAR algorithms above only work on individual integers. These algorithms must be extended to work on an array of integers without the need for linear search loops. This is where the magic of C++ recursive templates can help. We start out with a generic template for handling an array of integers.

typedef unsigned long word_type;
enum
{
BITS_COUNT = 32,
BITS_SHIFT = 5,
BITS_MASK = BITS_COUNT - 1,
};
template <size_t Words>
class bit_array : protected bit_array<((Words+BITS_MASK) >> BITS_SHIFT)>
{
typedef bit_array<((Words+BITS_MASK) >> BITS_SHIFT)> base_type;
word_type d_words[Words];
public:
bit_array() { assign(false); }
void assign(bool x) { ::memset(this, x ? 0xff : 0x00, sizeof(bit_array)); }
bool get(size_t n) const { return 0 != d_words[n>>BITS_SHIFT] &
((word_type)1 << (n&BITS_MASK)); }
size_t find() const
{
size_t i = base_type::find();
return (i << BITS_SHIFT) +
(i < Words ? lowestOneIdx(~d_words[i]) : 0);
}
void set(size_t n, bool x)
{
size_t pos = n & BITS_MASK;
n >>= BITS_SHIFT;
word_type& word = *(d_words + n);
word &= ~((word_type)1 << pos);
word |= (word_type)x << pos;
base_type::set(n, ~word == 0);
}
};

Each template class handles an array of integers. It is entirely capable of setting or clearing any bit in its array. Its superclass is a bitmask where each bit is a flag corresponding to an integer in the array of the derived class. A flag of one indicates that the entire integer is allocated (filled with ones); a flag of zero indicates that an integer has at least one available (zero) bit. The derivation continues until the array size is one. We will need a template specialization for that.

template <> class bit_array<1>
{
word_type d_word;
public:
bit_array() : d_word(0) { }
void assign(bool x) { d_word = x ? ~(word_type)0 : 0; }
bool get(size_t n) const
{ return 0 != d_word & ((word_type)1 << (n&BITS_MASK)); }
size_t find() const { return lowestOneIdx(~d_word); }
void set(size_t n, bool x)
{
n &= BITS_MASK;
d_word &= ~((word_type)1 << n);
d_word |= (word_type)x << n;
}
};

To find an available (zero) bit in its array, each template class searches its superclass for the first available (zero) bit. This points to the exact integer that has an available (zero) bit. All that remains is to wrap the recursive template code into a class that is easier to use.

template <size_t max_bits> class FastBitMask
{
protected:
bit_array<((max_bits+BITS_MASK) >> BITS_SHIFT)> d_array;
public:
FastBitMask() : d_array() { }
explicit FastBitMask(bool x) { d_array.assign(x); }
void assign(bool x) { d_array.assign(x); }
static size_t max_size() { return max_bits; }
bool operator[](size_t n) const { return get(n); }
bool get(size_t n) const { return d_array.get(n); }
size_t find() const
{ return d_array.find(); }
void set(size_t n, bool x)
{ d_array.set(n, x); }
};

The complete listing can be downloaded from here.

## Using the Code

This algorithm provides all the required functionality of implementing a token allocator. A token allocator would contain an instance of class `FastBitMask<>`

. The allocation method would check for the first available bit then setting that bit. The deallocation method would simply clear that bit.

enum { COUNT = 1024 };
FastBitMask<COUNT> _bitbucket;
public:
int acquire()
{
size_t index = _bitbucket.find();
if (_bitbucket.max_size() <= index)
return -1;
_bitbucket.set(index, 1);
return index;
}
bool release(int i_index)
{
if (i_index < 0 || _bitbucket.max_size() <= i_index)
return false;
_bitbucket.set(i_index, 0);
return true;
}

The size of the bitmask must be predetermined to keep runtime complexity at a minimum. I usually estimate the number of tokens required in the worst case scenario. Then I double or quadruple this quantity because the storage cost is low and the search cost is even lower.

This code is perfectly suited for applications that require allocation and deallocation of unique Ids. I have used this code to allocate Ids for OpenGL contexts. On another project, this code allocates unique identifiers for objects in a scenegraph. I am sure there are many uses for this algorithm. Drop a line and let me know if you have other novel uses. Thanks for reading.