12,504,695 members (56,458 online)
alternative version

16.5K views
3 bookmarked
Posted

# Binary operations on byte arrays, with parallelism and pointers

, 11 Feb 2012 GPL3
 Rate this:
The basic binary operations (AND, OR, XOR, NOT, ShiftLeft, ShiftRight) applied to byte arrays, made faster thanks to the use of parallelism combined with the use of pointers.

## Introduction

The Binary Operations extension functions apply to byte arrays, to provide an easy and hopefully fast way to use the basic binary operators. The operators AND, OR, XOR, NOT, Shift Left, Shift Right are provided.

The provided functions rely on the `System.Threading.Tasks.Parallel` library and on many `unsafe` zones where pointers are used to access sequentially the elements of the byte arrays.

## Background

In the beginning, I was using a simple `for` cycle that repeated the same binary operation on all the elements of the arrays, like in the following statement:

```public static byte[] Bin_And_noParallel_noPointers(this byte[] ba, byte[] bt)
{
int longlen = Math.Max(ba.Length, bt.Length);
int shortlen = Math.Min(ba.Length, bt.Length);
byte[] result = new byte[longlen];
for (int i = 0; i < shortlen; i++)
{
result[i] = (byte)(ba[i] & bt[i]);
}
return result;
}```

Then a question popped into my mind: "how can I make it faster?" I then tried three solutions (which are implemented in the code at the end of the article): parallelism, use of pointers, and substitute the pointers to bytes with pointers to integers.

## Using the code

The provided class was written using C# 4.0 (in the .NET environment). Once you can access its namespace, this class extends the functionality of the byte array (`byte[]`) with these six functions:

```public static byte[] Bin_And(this byte[] ba, byte[] bt)
public static byte[] Bin_Or(this byte[] ba, byte[] bt)
public static byte[] Bin_Xor(this byte[] ba, byte[] bt)
public static byte[] Bin_Not(this byte[] ba)
public static byte[] Bin_ShiftLeft(this byte[] ba, int bits)
public static byte[] Bin_ShiftRight(this byte[] ba, int bits)```

These functions are extended methods and are to be used in the way explained in the following example, where a binary 'AND' operation is executed, between two byte arrays:

```byte[] MyFirstByteArray, MySecondByteArray;
// fill the arrays here, for example by loading them from files
// (System.IO.File.ReadAllBytes) or filling them manually or with random bytes
byte[] result = MyFirstByteArray.Bin_And(MySecondByteArray);```

If you are going to try these functions, remember to check the box 'Allow unsafe code' in Project -> Properties -> Build, since the pointers are used in usafe zones...

I've tested these operations by filling the byte arrays with literature texts (like the 'Divine Comedy' from the famous Italian poet Dante Alighieri, who wrote it around the years 1308 and 1321; so there shouldn't be copyright problems) longer than 500,000 bytes, and it worked fine.

## Explanation of the code

As I mentioned before, during my journey to improve the simple binary functions, I tried three different strategies to make them faster: parallelism, pointers, and integers instead of bytes.

At first I tried using the simple `System.Threading.Tasks.Parallel.For` and the execution speed increased a bit, but not so much.

Then I tried the pointer approach: accessing the array through the index (as, for example, in `MyArray[175]`) is slower than accessing the same data through a pointer that is moved sequentially from the first to the last byte. This second approach gave good results, similar, if not even faster, than the parallel approach.

Then... why not try and combine them?

Well, using the `Parallel.For` was a bad idea, since the overall speed dropped very much.

The final solution was a bit more complicated, but the increase of speed (especially with large arrays) has been worth the while: look hereinafter to the sample code for the 'AND' binary operation:

```public static byte[] Bin_And(this byte[] ba, byte[] bt)
{
int lenbig = Math.Max(ba.Length, bt.Length);
int lensmall = Math.Min(ba.Length, bt.Length);
byte[] result = new byte[lenbig];
int ipar = 0;
object o = new object();
System.Action paction = delegate()
{
int actidx;
lock (o)
{
actidx = ipar++;
}
unsafe
{
fixed (byte* ptres = result, ptba = ba, ptbt = bt)
{
byte* pr = (byte*)ptres;
byte* pa = (byte*)ptba;
byte* pt = (byte*)ptbt;
pr += actidx; pa += actidx; pt += actidx;
while (pr < ptres + lensmall)
{
*pr = (byte)(*pt & *pa);
pr += paralleldegree; pa += paralleldegree; pt += paralleldegree;
}
}
}
};

System.Action[] actions = new Action[paralleldegree];
for (int i = 0; i < paralleldegree; i++)
{
actions[i] = paction;
}
Parallel.Invoke(actions);
return result;
}```

In this function, a delegate (containing the code to execute the AND operation) is declared, and then the same delegate is invoked a number of times equal to the `paralleldegree` variable (which is a class level variable assigned inside the static constructor (as you can see in the complete class code, at the end of the article) and contains the number of processors available for the current machine). The various instances of the delegate are, of course, invoked in a parallel manner.

Inside the delegate, pointers are declared to the input and output byte arrays, and then, inside the `while` cycle, they are moved to the next element at each cycle. The core of the function is the simple line `*pr = (*pt & *pa);`, where the AND binary operation is executed between the input values (`*pt` and `*pa`) putting the result in the output (`*pr`).

Now, if `paralleldegree` is 1 (i.e., no parallelism), this delegate is quite easy to understand.

But let's assume, for example, `paralleldegree` is 4. This creates the problem that the four instances of the delegate would execute exactly the same code four times, resulting in a very slow execution time. So, the idea, when `paralleldegree` is 4 (or any number greater than 1), is to make each delegate start from a different byte of the input byte arrays, and then, inside the `while` cycle, jump to the next 4 bytes, at every cycle, thus calculating the result without collisions with the other delegates running in parallel.

In this case, the first delegate would calculate the bytes 0, 4, 8, 12, etc; the second delegate would calculate the bytes 1, 5, 9, 13, etc., the third 2, 6, 10, 14 ..., and the fourth 3, 7, 11, 15... The initial `lock` statement serves exactly the purpose of making the delegates start from sequential points, thanks to the `ipar` variable (function level) that is increased at each call (four times when `paralleldegree` is 4), and the `actidx` variable (delegate level), which is assigned only once (for every delegate) and acts something like the 'initial offset' of the pointers, for every delegate (as can be seen in the line `pr += actidx; pa += actidx; pt += actidx;`).

Using the parallelism together with the pointers resulted in a very good increase in execution speed (by the waw: to analyze the execution speed, I used to put the function to be tested inside a loop, and executing it 100 or 1000 times, measuring the time with the `System.Diagnostics.Stopwatch` class); but a further improvement can be done, by replacing the byte pointers (`byte*`) with `uint` pointers (`uint*`): `uint` is four times bigger than `byte`, so every cycle would be four times shorter. Well, the speed increased again; not four times, but 'only' two times. So, this last improvement was, in my opinion, worth the while.

## Full class code

```using System;

namespace MySpace
{
public static class BinOps
{
private const int bitsinbyte = 8;
static BinOps()
{
paralleldegree = Environment.ProcessorCount;
uintsize = sizeof(uint) / sizeof(byte); // really paranoid, uh ?
bitsinuint = uintsize * bitsinbyte;
}

public static byte[] Bin_And(this byte[] ba, byte[] bt)
{
int lenbig = Math.Max(ba.Length, bt.Length);
int lensmall = Math.Min(ba.Length, bt.Length);
byte[] result = new byte[lenbig];
int ipar = 0;
object o = new object();
System.Action paction = delegate()
{
int actidx;
lock (o)
{
actidx = ipar++;
}
unsafe
{
fixed (byte* ptres = result, ptba = ba, ptbt = bt)
{
uint* pr = (uint*)ptres;
uint* pa = (uint*)ptba;
uint* pt = (uint*)ptbt;
pr += actidx; pa += actidx; pt += actidx;
while (pr < ptres + lensmall)
{
*pr = (*pt & *pa);
pr += paralleldegree; pa += paralleldegree; pt += paralleldegree;
}
}
}
};
System.Action[] actions = new Action[paralleldegree];
for (int i = 0; i < paralleldegree; i++)
{
actions[i] = paction;
}
Parallel.Invoke(actions);
return result;
}

public static byte[] Bin_Or(this byte[] ba, byte[] bt)
{
int lenbig = Math.Max(ba.Length, bt.Length);
int lensmall = Math.Min(ba.Length, bt.Length);
byte[] result = new byte[lenbig];
int ipar = 0;
object o = new object();
System.Action paction = delegate()
{
int actidx;
lock (o)
{
actidx = ipar++;
}
unsafe
{
fixed (byte* ptres = result, ptba = ba, ptbt = bt)
{
uint* pr = (uint*)ptres;
uint* pa = (uint*)ptba;
uint* pt = (uint*)ptbt;
pr += actidx; pa += actidx; pt += actidx;
while (pr < ptres + lensmall)
{
*pr = (*pt | *pa);
pr += paralleldegree; pa += paralleldegree; pt += paralleldegree;
}
uint* pl = ba.Length > bt.Length ? pa : pt;
while (pr < ptres + lenbig)
{
*pr = *pl;
pr += paralleldegree; pl += paralleldegree;
}
}
}
};
System.Action[] actions = new Action[paralleldegree];
for (int i = 0; i < paralleldegree; i++)
{
actions[i] = paction;
}
Parallel.Invoke(actions);

return result;
}

public static byte[] Bin_Xor(this byte[] ba, byte[] bt)
{
int lenbig = Math.Max(ba.Length, bt.Length);
int lensmall = Math.Min(ba.Length, bt.Length);
byte[] result = new byte[lenbig];
int ipar = 0;
object o = new object();
System.Action paction = delegate()
{
int actidx;
lock (o)
{
actidx = ipar++;
}
unsafe
{
fixed (byte* ptres = result, ptba = ba, ptbt = bt)
{
uint* pr = ((uint*)ptres) + actidx;
uint* pa = ((uint*)ptba) + actidx;
uint* pt = ((uint*)ptbt) + actidx;
while (pr < ptres + lensmall)
{
*pr = (*pt ^ *pa);
pr += paralleldegree; pa += paralleldegree; pt += paralleldegree;
}
uint* pl = ba.Length > bt.Length ? pa : pt;
while (pr < ptres + lenbig)
{
*pr = *pl;
pr += paralleldegree; pl += paralleldegree;
}
}
}
};
System.Action[] actions = new Action[paralleldegree];
for (int i = 0; i < paralleldegree; i++)
{
actions[i] = paction;
}
Parallel.Invoke(actions);

return result;
}

public static byte[] Bin_Not(this byte[] ba)
{
int len = ba.Length;
byte[] result = new byte[len];

int ipar = 0;
object o = new object();
System.Action paction = delegate()
{
int actidx;
lock (o)
{
actidx = ipar++;
}
unsafe
{
fixed (byte* ptres = result, ptba = ba)
{
uint* pr = (uint*)ptres;
uint* pa = (uint*)ptba;
pr += actidx; pa += actidx;
while (pr < ptres + len)
{
*pr = ~(*pa);
pr += paralleldegree; pa += paralleldegree;
}
}
}
};
System.Action[] actions = new Action[paralleldegree];
for (int i = 0; i < paralleldegree; i++)
{
actions[i] = paction;
}
Parallel.Invoke(actions);
return result;
}

public static byte[] Bin_ShiftLeft(this byte[] ba, int bits)
{
int ipar = 0;
object o = new object();

int len = ba.Length;
if (bits >= len * bitsinbyte) return new byte[len];
int shiftbits = bits % bitsinuint;
int shiftuints = bits / bitsinuint;
byte[] result = new byte[len];

if (len > 1)
{
// first uint is shifted without carry from previous byte (previous byte does not exist)
unsafe
{
fixed (byte* fpba = ba, fpres = result)
{
uint* pres = (uint*)fpres + shiftuints;
uint* pba = (uint*)fpba;
*pres = *pba << shiftbits;
}
}
System.Action paction = delegate()
{
int actidx;
lock (o)
{
actidx = ipar++;
}
unsafe
{
fixed (byte* fpba = ba, fpres = result)
{
// pointer to results; shift the bytes in the result
// (i.e. move left the pointer to the result)
uint* pres = (uint*)fpres + shiftuints + actidx + 1;
// pointer to original data, second byte
uint* pba1 = (uint*)fpba + actidx + 1;
if (shiftbits == 0)
{
while (pres < fpres + len)
{
*pres = *pba1;
pres += paralleldegree; pba1 += paralleldegree;
}
}
else
{
// pointer to original data, first byte
uint* pba2 = (uint*)fpba + actidx;
while (pres < fpres + len)
{
*pres = *pba2 >> (bitsinuint - shiftbits) | *pba1 << shiftbits;
pres += paralleldegree; pba1 += paralleldegree; pba2 += paralleldegree;
}
}
}
};

};
System.Action[] actions = new Action[paralleldegree];
for (int i = 0; i < paralleldegree; i++)
{
actions[i] = paction;
}
Parallel.Invoke(actions);
}

return result;
}

public static byte[] Bin_ShiftRight(this byte[] ba, int bits)
{
int ipar = 0;
object o = new object();
int len = ba.Length;
if (bits >= len * bitsinbyte) return new byte[len];
int ulen = len / uintsize + 1 - (uintsize - (len % uintsize)) / uintsize;
int shiftbits = bits % bitsinuint;
int shiftuints = bits / bitsinuint;
byte[] result = new byte[len];

if (len > 1)
{
unsafe
{
fixed (byte* fpba = ba, fpres = result)
{
uint* pres = (uint*)fpres + ulen - shiftuints - 1;
uint* pba = (uint*)fpba + ulen - 1;
*pres = *pba >> shiftbits;
}
}
System.Action paction = delegate()
{
int actidx;
lock (o)
{
actidx = ipar++;
}
unsafe
{
fixed (byte* fpba = ba, fpres = result)
{
// pointer to results; shift the bytes in the result
// (i.e. move left the pointer to the result)
uint* pres = (uint*)fpres + actidx;
// pointer to original data, first useful byte
uint* pba1 = (uint*)fpba + shiftuints + actidx;
if (shiftbits == 0)
{
while (pres < ((uint*)fpres) + ulen - shiftuints - 1)
{
*pres = *pba1;
// increment pointers to next position
pres += paralleldegree; pba1 += paralleldegree;
}
}
else
{
// pointer to original data, second useful byte
uint* pba2 = (uint*)fpba + shiftuints + actidx + 1;
while (pres < ((uint*)fpres) + ulen - shiftuints - 1)
{
// Core shift operation
*pres = (*pba1 >> shiftbits | *pba2 << (bitsinuint - shiftbits));
// increment pointers to next position
pres += paralleldegree; pba1 += paralleldegree; pba2 += paralleldegree;
}
}
}
};
};
System.Action[] actions = new Action[paralleldegree];
for (int i = 0; i < paralleldegree; i++)
{
actions[i] = paction;
}
Parallel.Invoke(actions);
}
return result;
}
}
}```

## Share

 Software Developer Switzerland
c#, Silverlight, C++, c, VBA, SQL, Oracle

## You may also be interested in...

 Pro Pro

 First Prev Next
 Not so sure RugbyLeague19-Jun-14 3:52 RugbyLeague 19-Jun-14 3:52
 Can you provide real times? SledgeHammer0111-Feb-12 17:51 SledgeHammer01 11-Feb-12 17:51
 Re: Can you provide real times? Bruno Tabbia12-Feb-12 23:45 Bruno Tabbia 12-Feb-12 23:45
 Re: Can you provide real times? SledgeHammer0113-Feb-12 6:43 SledgeHammer01 13-Feb-12 6:43
 Re: Can you provide real times? Bruno Tabbia13-Feb-12 22:40 Bruno Tabbia 13-Feb-12 22:40