## Sometimes..

you have to chuck it all and start over again (sigh). So I did, and if you've visited this article before, you'll notice that all the methods I previously tested for creating random data have vanished.. but trust me, that's a good thing. I spent last night racking my brain, trying to figure out how to make this stronger, that is, fortify it against any attempt at reversal, and at the same time perhaps produce even better random from an algorithm. What follows are the best ideas I've had to date, combined in a way that I believe creates a great random generator with a strong focus on security. So read on, a lot has changed, and I think this might be the one.. I have left all of the original algorithms in, and added new ones that I created while experimenting (more about that later), so I think this really warrants a re-read.. Oh, and I renamed it: Chaotic State Random Generator, CSRG (nick. Chaos). Better name, I think..

July 24: Some more big updates!

I was integrating this into my main project today and made some significant changes to this generator, so I thought I'd let you know.. I added an entropy collector to the main loop of the Generate() methods. It collects bytes and periodically extracts the entropy, using that to rotate the expansion table (explained under the 'Table Rotation' section of the Generate() method breakdown). I increased the seed size for both engines to 96 bytes; in the DUAL engine, AES keys and counters are now initialized purely with seed material, while in SINGLE mode the remaining 32 bytes are thrown to the entropy pool. I also changed the recommended block size from 10 KiB to 100 KiB, which brings it up to about a 1000 to 1 ratio of pseudo random output to required seed material in the test implementation.

July 25: Parallel example

Added a test method to frmNist that demonstrates using CSRG in a Parallel.For loop, ParallelTest(Size). Throughput on my AMD A6-3600 quad-core runs at almost 900 MiB per minute using this technique.

## It always starts..

with something you need.. and I needed a really good random generator; one that could be seeded manually, was self-contained, and performed remarkably, and so my search began.. Through the wild tundra of indecipherable C code written by introverts addicted to shorthand, the mathematical voodoo penned by guys with foreheads the size of skateboards, the Blum Blum Shubs, the Yarrows and Fortunas, Twisters and ISAACs, and.. then I started looking into block ciphers running in counter mode. I already had an AES class written, based largely on the Mono implementation, so this should be a snap! My first counter mode implementation gave me great results, but tested with Ent as a guide, it just couldn't beat the Windows built-in RNGCryptoServiceProvider, which is a managed wrapper for the CryptGenRandom api. So I started experimenting, and expanded my random testing tool-kit with a C# version of the Ent tests, and later the NIST batch of random tests.

#### Criteria

These are my expectations for the implementation:

- It must generate the same random output with the same seed each time
- It must be fast
- It must be extremely difficult to unwind
- It must produce excellent random output

## The Good Stuff

Let's start small, like generating 16 bytes of random for integers, and creating seeds for the generator. The first method, taken from the CSPRNG class, is used to create random integers: it gets a random buffer (in this case from RNGCryptoServiceProvider), computes its SHA256 hash, and returns the first half of that hash value. The integers are drawn from the bytes via one of the BitConverter overloads. These methods are all contained in the static class CSPRNG.cs.

    internal static byte[] GetSeed16()
    {
        byte[] data = new byte[64];
        byte[] result = new byte[16];

        _rngRandom.GetBytes(data);
        Buffer.BlockCopy(_shaHash.ComputeHash(data), 0, result, 0, 16);

        return result;
    }

The class contains methods that return random for all of the integer and floating types, they look like this..

    internal static Int32 NextInt32()
    {
        return BitConverter.ToInt32(GetSeed16(), 0);
    }
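The same pattern extends to any numeric type: hash the entropy, then reinterpret a slice of the digest. Here's a rough Python sketch of the idea for anyone experimenting outside .NET (the names and the floating-point variant are mine, not the article's code):

```python
import hashlib
import os
import struct

def get_seed16() -> bytes:
    """64 bytes of OS entropy -> SHA-256 -> first 16 bytes (mirrors GetSeed16)."""
    data = os.urandom(64)
    return hashlib.sha256(data).digest()[:16]

def next_int32() -> int:
    """Reinterpret the first 4 bytes of the digest as a signed 32-bit integer."""
    return struct.unpack_from("<i", get_seed16())[0]

def next_double() -> float:
    """Sketch of a floating-point variant: scale 8 digest bytes into [0, 1)."""
    return int.from_bytes(get_seed16()[:8], "little") / 2.0**64
```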

A random generator requires a random seed for its initial state; you could get creative and do something like this:

    internal static byte[] GetSeed64()
    {
        byte[] data = new byte[128];
        byte[] seed = new byte[64];

        _rngRandom.GetBytes(data);
        using (SHA512 shaHash = SHA512Managed.Create())
            Buffer.BlockCopy(shaHash.ComputeHash(data), 0, seed, 0, 64);

        return seed;
    }

In this routine, we are generating 128 bytes of random with RNGCryptoServiceProvider, which aligns with the SHA512 block size, then returning the hash of that random array. This is called 'entropy extraction': taking a secret random value, deriving its hash, then using that as the random source. It's a good way to build encryption keys, IVs, or seeds.. (Now, for the purposes of seeding a generator, I think this may be a bit excessive; RNGCrypto output should be good enough.. and you can even lose entropy by going too far.. ;o)
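The extraction step itself is tiny; a Python equivalent of the routine above (assuming os.urandom as the entropy source) is just:

```python
import hashlib
import os

def get_seed64() -> bytes:
    """Entropy extraction: hash one full SHA-512 input block (128 bytes)
    of raw entropy down to a 64 byte seed, as GetSeed64() does."""
    raw = os.urandom(128)
    return hashlib.sha512(raw).digest()  # digest is always 64 bytes
```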

## CSRG Random Generator

Before we even start looking at this, well, somewhat intimidating looking algorithm, I'll start right at the beginning, when the generator is initialized..

    public CSRG()
    {
        _expansionTable = ExpandTable(SEED_TABLE);
    }

    public CSRG(byte[] TableSeed)
    {
        if (TableSeed.Length != 48)
            throw new ArgumentOutOfRangeException("Expansion Seed size must be 48 bytes long!");

        _expansionTable = ExpandTable(TableSeed);
    }

As you can see, there are two constructors, and both create something called an 'expansion table'. The first uses a default local byte array; in the second, the user supplies a 48 byte array of random, which is, for lack of a better description, the key to the generator. Flip just one bit in that array, and CSRG will produce a completely different random output, even with an identical seed. So it *is* kind of like a 384 bit key that, when combined with the seed used as the initial state in each Generate(Buffer) call, adds a layer of fortification to the generator. Well, how is that really useful? I have written an encryption program, a very complex one, with debugger foilers, secure erasure, erasure triggers, anti-coercion tactics, encrypted memory, double encrypted login, encrypted state/logs/database, the ability to import raw random binary for a true OTP, and on and on.. all housed in the coolest WPF interface that I could dream up.. I really went kind of berserk on this thing for the last 8 months, but now it's winding down, it's almost done, and I am writing the help files.

The one thing I kept revisiting in my mind, though, was that at the heart of it all was an AES CTR generator, written to the NIST 800-90 spec; but still, if you know how that machine really works and see past the specification details, it is just a block cipher running in counter mode, and nothing much more sophisticated than that. So how do we make that better? I mean, anything coming out of AES is going to be highly randomized, and CTR especially, so it's doing a great job creating random, but..

First of all, some details about how my little xor cipher works:

For one thing, I reveal nothing in the message header. It looks like this:

- Flag - just an id that tells us this is a valid message; always the same
- Position - not the actual position, but one offset by a random ulong in the key header
- Creator - a unique 16 byte field identifying the key used to encrypt this file
- Extension - the file extension, xored with random from a key header field

The Key header looks like this:

- EngineType - AES or OTP
- Position - Current virtual position within the key file
- KeyOffset - A random ulong, that when subtracted from the message position, reveals the 'actual' position, if the key were completely expanded
- KeyID - Unique 16 byte key identifier
- MessageHash - 16 bytes of random, xored with the file extension
- TableKey - Here it is.. 48 bytes of random to unlock the generator

The rest of the key is just seed material, so a 1 MiB key file would contain 10482 seeds, which would expand to about 1023 MiB of random (about a 1000 to 1 ratio). Messages are encrypted and decrypted on the fly by extracting the necessary seeds, generating the random, then xoring it with the data.

So what useful information does an attacker really have? He can't tell what size the key file is without the key file itself, let alone the position, or whether the random used to xor the message starts on the 1st byte of the 1st seed or the 317th byte of the 90th block.. (seeds build out to 102400 byte 'blocks' in my implementation). A 10 MiB file would use approx. 100 seeds, so it only costs about 9.7 KiB of key space to encrypt that file.

With thumb drives now at.. 256 GB, you could store about 2,748,779,069 seeds.. meaning you could encrypt about 256 TiB of data with the key material from that one thumb drive. So OTP, or pseudo OTP, has become a viable means of strong encryption, given the ever increasing capacity of portable storage devices.
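These ratios are easy to sanity-check. The sketch below assumes roughly 100 bytes of key material consumed per seed, which reproduces the figures above:

```python
BLOCK_OUT = 102_400     # bytes of random each seed expands to (100 KiB)
SEED_COST = 100         # assumed key bytes consumed per seed
MIB = 1024 * 1024

# 1 MiB key file -> ~10,48x seeds -> ~1 GiB of output (about 1000:1)
seeds_per_mib = MIB // SEED_COST                  # ~10485 before header overhead
output_mib = seeds_per_mib * BLOCK_OUT / MIB      # ~1024 MiB

# 256 GB-class thumb drive full of seed material
drive = 256 * 1024**3
seeds_on_drive = drive // SEED_COST               # ~2.75 billion seeds
output_tib = seeds_on_drive * BLOCK_OUT / 1024**4 # ~256 TiB of keystream
```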

Why am I telling you all this? Well, to show you how you can use a good random generator to build some very powerful encryption, and how that cipher might be constructed. One of the most important tasks in building this cipher is to hide any information in a message that might aid in its decryption, and perhaps the most important detail to hide is the positional offset within the key file, particularly if the key was generated via a deterministic algorithm. Ok, more about the cipher later..

##### How is the table built?

Just a straight AES CTR implementation: it generates 4096 bytes of random, which are copied to a UInt32 array with 1024 members..

    private UInt32[] ExpandTable(byte[] Seed)
    {
        const int SIZE = 4096;
        UInt32[] exTable = new UInt32[1024];
        UInt32[] exKey;
        byte[] ctrBuffer = new byte[BLOCK_SIZE];
        byte[] keyBuffer = new byte[KEY_BYTES];
        byte[] outputBlock = new byte[BLOCK_SIZE];
        byte[] outputData = new byte[SIZE];

        // Key and counter come straight from the 48 byte table seed
        Buffer.BlockCopy(Seed, 0, keyBuffer, 0, KEY_BYTES);
        Buffer.BlockCopy(Seed, KEY_BYTES, ctrBuffer, 0, BLOCK_SIZE);
        exKey = ExpandKey(keyBuffer);

        for (int i = 0; i < SIZE; i += BLOCK_SIZE)
        {
            Increment(ctrBuffer);
            Transform(ctrBuffer, outputBlock, exKey);
            Buffer.BlockCopy(outputBlock, 0, outputData, i, BLOCK_SIZE);
        }

        Buffer.BlockCopy(outputData, 0, exTable, 0, SIZE);
        return exTable;
    }

So the counter and AES key are copied from the table seed; the 16 byte counter is incremented by one on each iteration, then encrypted with AES, and that transform is copied to the output buffer, and finally into the UInt32 array.
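The Increment step is nothing more than an add-with-carry across the 16 byte counter. A sketch in Python (the byte order is my assumption; the article doesn't show the Increment source):

```python
def increment(ctr: bytearray) -> None:
    """Add 1 to a 16 byte counter in place, carrying across bytes
    (little-endian here; wraps to all zeros on overflow)."""
    for i in range(len(ctr)):
        ctr[i] = (ctr[i] + 1) & 0xFF
        if ctr[i] != 0:   # no carry left to propagate, done
            break
```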

##### How is the table used?

The table is used in array stretching and randomizing routines, called extraction methods (ExtractArray16(Buffer), ExtractArray32(Buffer)). What these routines do is stretch an array to 64 bytes (to align with the SHA256 block size) while randomizing it by xoring with the expansion table.. The whole apparatus goes towards a self re-seeding scheme, which we'll look at in a minute. ExtractArray32() looks like this:

    private byte[] ExtractArray32(byte[] SubBuffer)
    {
        UInt32[] tmpNum = new UInt32[8];
        UInt32[] arrNum = new UInt32[16];
        byte[] data = new byte[64];
        Int32 ct = 0;

        Buffer.BlockCopy(SubBuffer, 0, tmpNum, 0, 32);
        // Chained table index: the low 10 bits of the previous value (0-1023)
        UInt16 iter = ExtractShort(tmpNum[0], 10);

        for (int i = 0; i < 8; i++)
        {
            arrNum[ct++] = ~tmpNum[i] ^ _expansionTable[iter];
            iter = ExtractShort(arrNum[ct - 1], 10);
            arrNum[ct++] = tmpNum[i] ^ _expansionTable[iter];
            iter = ExtractShort(arrNum[ct - 1], 10);
        }

        // 64 bytes: exactly one SHA256 input block
        Buffer.BlockCopy(arrNum, 0, data, 0, 64);
        return ComputeHash64(data);
    }

So the buffer comes in (the combined AES transforms, 32 bytes). The byte buffer is copied to a temporary UInt32[8] array, then extracted into a larger 16 member UInt32 array, which is copied back into a byte array and hashed with SHA256; that hash is the actual return value. The first array member is bitwise complemented and xored with an expansion table entry at a secret index, which is the low 10 bits of the previous value (10 bits = 0-1023); the second value is simply xored with another table value, using the index derived from the value before it.. and so on. This is how new keys are generated for AES, and new 16 byte counters are created for the main method. We are using the whitening to scramble the bits before they enter SHA256, which guarantees we get a unique hash every time. Now you can see how, if the table seed were changed in any way, it would mean a vastly different offset table, and in turn different seeds and counters created when the generator re-seeds itself.
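The chained-index whitening can be sketched in Python like so (the table here is a stand-in, and the exact bit selection in ExtractShort is my assumption; only the shape of the loop follows the description above):

```python
import hashlib
import struct

# Stand-in for the 1024-entry expansion table (normally AES CTR output)
TABLE = [(i * 2654435761) & 0xFFFFFFFF for i in range(1024)]

def extract_array32(sub: bytes) -> bytes:
    """Stretch 32 input bytes to 64 by xoring against table entries chosen
    by the low 10 bits of the previous output word, then hash the result."""
    tmp = struct.unpack("<8I", sub)
    out = []
    idx = tmp[0] & 0x3FF                       # 10 bits -> index 0..1023
    for w in tmp:
        out.append((~w & 0xFFFFFFFF) ^ TABLE[idx])   # complemented, whitened
        idx = out[-1] & 0x3FF
        out.append(w ^ TABLE[idx])                   # plain word, whitened
        idx = out[-1] & 0x3FF
    data = struct.pack("<16I", *out)           # 64 bytes: one SHA-256 block
    return hashlib.sha256(data).digest()
```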

##### The Generator:

There are two aspects to the generator: how the random is created, and how the re-seed mechanism works. The random is created by running **2 AES CTR transforms in tandem**, each with its own **unique key** and 16 byte **counter**. Both **transforms** are **xored** together at the bottom of each loop, and the resulting array is copied to the output buffer. Simple as that.

##### Chaotic ReSeed:

Along with the key and counter, **2 random iteration counters** are running (reSeedCounter1 and reSeedCounter2), one for each AES transform. On each iteration through the main loop, they are both tested independently for an even division by the differential divisor; that is, when (reSeedCounterX % 256 == 0), it triggers a state reset for that AES transform. When that happens, the last 2 transforms are combined and sent to the ExtractArray32() routine, which stretches and whitens the data, then returns the SHA256 hash, which is expanded into a new AES key. The new counter is created by sending a single block to the ExtractArray16() routine, and the UInt64 reSeedCounter for that transform is created by xoring the halves of the new 16 byte CTR counter. So, because the two reseed counters are random, and because both transforms have a unique reseed counter, debugging the algo is like watching a horse race.. transform1 will gallop ahead, resetting state 3 times in a row, then transform2 will catch up and surpass it, and back and forth they go.. The point is, these two AES transforms (which are xored together to create the output) are constantly and unpredictably swapping out their state.. it should all combine to make cryptanalysis, well, a bit difficult.. Here's what it looks like:

    private byte[] GenerateT2(Int32 Size, byte[] Seed)
    {
        if (Size > ENGINE_MAXPULL)
            throw new ArgumentOutOfRangeException("The size requested is too large! Maximum is "
                + ENGINE_MAXPULL.ToString() + " bytes.");
        if (Seed.Length != DUAL_SEED_BYTES)
            throw new ArgumentOutOfRangeException("Seed length must be " + DUAL_SEED_BYTES.ToString() + " bytes long!");

        UInt16 poolSize = 0;
        Int32 alignedSize = (Size % BLOCK_SIZE == 0 ? Size : Size + BLOCK_SIZE - (Size % BLOCK_SIZE));
        Int32 lastBlock = alignedSize - BLOCK_SIZE;
        UInt64 reSeedCounter1 = 0;
        UInt64 reSeedCounter2 = 0;
        UInt32[] exKey1;
        UInt32[] exKey2;
        byte[] ctrBuffer1 = new byte[BLOCK_SIZE];
        byte[] ctrBuffer2 = new byte[BLOCK_SIZE];
        byte[] keyBuffer1 = new byte[KEY_BYTES];
        byte[] keyBuffer2 = new byte[KEY_BYTES];
        byte[] outputBlock1 = new byte[BLOCK_SIZE];
        byte[] outputBlock2 = new byte[BLOCK_SIZE];
        byte[] outputData = new byte[Size];
        byte[] seedBuffer = new byte[KEY_BYTES];
        const Int32 SUPPRESSOR = 32;

        // Both keys and counters are initialized purely from seed material
        Buffer.BlockCopy(Seed, 0, keyBuffer1, 0, KEY_BYTES);
        Buffer.BlockCopy(Seed, KEY_BYTES, keyBuffer2, 0, KEY_BYTES);
        Buffer.BlockCopy(Seed, KEY_BYTES * 2, ctrBuffer1, 0, BLOCK_SIZE);
        Buffer.BlockCopy(Seed, (KEY_BYTES * 2) + BLOCK_SIZE, ctrBuffer2, 0, BLOCK_SIZE);
        exKey1 = ExpandKey(keyBuffer1);
        exKey2 = ExpandKey(keyBuffer2);
        reSeedCounter1 = ExtractULong(ctrBuffer1);
        reSeedCounter2 = ExtractULong(ctrBuffer2);

        for (int i = 0; i < alignedSize; i += BLOCK_SIZE)
        {
            Increment(ctrBuffer1);
            Increment(ctrBuffer2);
            Transform(ctrBuffer1, outputBlock1, exKey1);
            Transform(ctrBuffer2, outputBlock2, exKey2);

            // Chaotic reseed: swap out key, counter and reseed counter for transform 1
            if (reSeedCounter1 % _differential == 0)
            {
                Buffer.BlockCopy(outputBlock2, 0, seedBuffer, 0, BLOCK_SIZE);
                Buffer.BlockCopy(outputBlock1, 0, seedBuffer, BLOCK_SIZE, BLOCK_SIZE);
                keyBuffer1 = ExtractArray32(seedBuffer);
                exKey1 = ExpandKey(keyBuffer1);
                ctrBuffer1 = ExtractArray16(outputBlock2);
                reSeedCounter1 = ExtractULong(ctrBuffer1);
            }
            // Chaotic reseed: transform 2
            if (reSeedCounter2 % _differential == 0)
            {
                Buffer.BlockCopy(outputBlock1, 0, seedBuffer, 0, BLOCK_SIZE);
                Buffer.BlockCopy(outputBlock2, 0, seedBuffer, BLOCK_SIZE, BLOCK_SIZE);
                keyBuffer2 = ExtractArray32(seedBuffer);
                exKey2 = ExpandKey(keyBuffer2);
                ctrBuffer2 = ExtractArray16(outputBlock1);
                reSeedCounter2 = ExtractULong(ctrBuffer2);
            }

            // Entropy collector: steal 1-15 bytes for the table rotation pool
            if (reSeedCounter1 % SUPPRESSOR == 0)
            {
                poolSize = ExtractShort(outputBlock2, 4);
                if (poolSize > 0)
                {
                    byte[] pool = new byte[poolSize];
                    Buffer.BlockCopy(outputBlock1, 0, pool, 0, poolSize);
                    EntropyPool(pool);
                }
            }
            if (reSeedCounter2 % SUPPRESSOR == 0)
            {
                poolSize = ExtractShort(outputBlock1, 4);
                if (poolSize > 0)
                {
                    byte[] pool = new byte[poolSize];
                    Buffer.BlockCopy(outputBlock2, 0, pool, 0, poolSize);
                    EntropyPool(pool);
                }
            }

            // Combine the two transforms and copy to the output buffer
            for (int j = 0; j < BLOCK_SIZE; j++)
                outputBlock1[j] ^= outputBlock2[j];

            if (i != lastBlock)
            {
                Buffer.BlockCopy(outputBlock1, 0, outputData, i, BLOCK_SIZE);
            }
            else
            {
                int finalSize = (Size % BLOCK_SIZE) == 0 ? BLOCK_SIZE : (Size % BLOCK_SIZE);
                Buffer.BlockCopy(outputBlock1, 0, outputData, i, finalSize);
            }

            reSeedCounter1++;
            reSeedCounter2++;
        }

        return outputData;
    }

##### Table Rotation

So we have the expansion table, which, as you now know, is used in the extraction routines where new keys and counters are created for the internal state transitions. Well, what if we could change that table on the fly, repeatedly filling it with new offset values at irregular intervals? If you look in the code above, you'll see the entropy collector clauses near the bottom of the routine. What this code does is grab a few bytes and pass them to a method called EntropyPool(Buffer). There is a clause for each of the two transforms; each uses the (reSeedCounterX % SUPPRESSOR == 0) modulus to slow it down, and gets the number of bytes to send to the entropy pool from the last 4 bits of the other transform's output. It may pass between 1 and 15 bytes to the pool, depending on whatever those 4 bits happen to be.. So it's just stealing a few bytes here and there, and about every 4000 bytes of output created (adjustable via the SUPPRESSOR constant, which is set to 32), it will trigger a rebuild of the expansion table, giving it a whole new set of values. Here's what the EntropyPool() method looks like:

    private void EntropyPool(byte[] Data)
    {
        int count = Data.Length;

        // Clip the copy so the pool never exceeds its 128 byte capacity
        if (count + _entropyCount > 128)
            count = 128 - _entropyCount;

        Buffer.BlockCopy(Data, 0, _tableEntropy, _entropyCount, count);
        _entropyCount += count;

        if (_entropyCount == 128)
        {
            byte[] data = new byte[64];
            byte[] tableSeed = new byte[48];

            // Hash each 64 byte half: 32 bytes from the first digest plus
            // 16 from the second build the 48 byte table seed
            Buffer.BlockCopy(_tableEntropy, 0, data, 0, 64);
            byte[] entropy = ComputeHash64(data);
            Buffer.BlockCopy(entropy, 0, tableSeed, 0, 32);
            Buffer.BlockCopy(_tableEntropy, 64, data, 0, 64);
            entropy = ComputeHash64(data);
            Buffer.BlockCopy(entropy, 0, tableSeed, 32, 16);

            _expansionTable = ExpandTable(tableSeed);
            _entropyCount = 0;
        }
    }

Nothing too complicated here.. the bytes are copied to a class variable, _tableEntropy, and once 128 bytes have accumulated, the two halves of that array are passed to SHA256, and their digests build the 48 byte generator key, which is then used to rebuild, or 'rotate', the expansion table.

So let's summarize an analysis of our little xor cipher thus far:

The bad guy gets hold of a 10 MiB message, and from that he can deduce:

Nothing. He doesn't know if the random was extracted from the first seed in the key or the last, or whether it started from the first byte of random derived from that seed or the last. He doesn't know when, what, or how an unpredictable internal state machine has swapped out keys and counters. He doesn't even know what type of file it was!

That 10 MiB file would require 100 seeds to build, so the attacker would have to guess AES(key + counter) 2 * 100 times to read the file; but you also have to add in the internal random state resets. With a modulus of 256 for the seed counters, the state resets about 100 times when generating a 102400 byte block. So if the state auto-resets 100 times per 102400 bytes, and a hundred different seeds are used to encrypt that 10 MiB file, that would be 10,000 internal state resets.. Unwinding the random data would require that you **break AES more than 10,000 times**.. not to mention that the output is actually the **xor of 2 different AES transforms**, and the attacker isn't even examining the random itself; he'd be looking at the random after a binary xor with the message data. This is also amplified by the size of the file: a 100 MiB file means 1,000 seeds and 100,000 resets; a 1000 MiB file, 10,000 seeds and 1,000,000 resets (a million times!). Keep in mind that any computer attempting to 'solve' the random would also have to reproduce the exact conditions of this algorithm; building the table, the key and counter expansions, and the passes through AES and SHA256 would involve thousands of processor cycles per iteration.
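The reset arithmetic works out like this (a quick check, using the article's measured average of ~100 resets per block at a differential of 256):

```python
SEEDS_PER_10MIB = 100   # each seed expands to ~100 KiB of keystream
RESETS_PER_SEED = 100   # measured avg. state resets per block, differential = 256

def resets_for(file_mib: int) -> int:
    """Approximate internal state resets needed to reproduce a file's keystream."""
    seeds = file_mib // 10 * SEEDS_PER_10MIB
    return seeds * RESETS_PER_SEED

# 10 MiB -> 10,000 resets; 1000 MiB -> 1,000,000 resets
```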

Actually I had time to test the ratios of internal resets to differential today (avg. over 1000 runs, each with a unique seed and expansion table seed):

##### Differential : Avg. Internal State Changes per 102400 Byte Block

- 1024 : 17.6
- 512 : 43
- 256 : 94
- 128 : 191
- 64 : 388
- 32 : 770

Now you might think: but if you could *guess* 2 keys/counters from a seeded block, you could then get the hashes, construct the next key/counter, and roll through the block that way.. but remember the expansion table? Ahh.. without the 48 byte key used to build that randomizing table, there is no way you could derive the correct hash value for the next key and counter. See how it all ties together?

Now I know what the next question is: 'yeah, but what about throughput?' Granted, the resolution is turned up pretty high, but AES is very fast.. I can create 100 MiB of random with CSRG at this resolution in about 24 seconds on my AMD quad-core; add the file writes and one additional xor and you are still looking at about 200 MiB per minute, very reasonable considering the kind of security this could provide. But I may turn it down: add a 0 to the build size and bring the reseed modulus (the 'Differential' property in the class) up to 512, which would probably nearly double the throughput (I'll have to test it and weigh the tradeoffs..).

If the data produced by this process is sufficiently random (and all the statistical tests indicate that it is), the computational power required to unwind the random (which would be the necessary step in decrypting any data xored with it) would involve such an enormous number of calculations as to be well beyond the capabilities of computing systems, now and for the foreseeable future. The ideas I am setting forth here are essentially a recipe for creating an encryptor that, in all practical terms, can create unbreakable encryption: a One Time Pad created using deterministic machines..

So far, this is as good an xor cipher as I can build, but I know it could be improved upon, and I definitely welcome any advice on making it better. (I see SHA as the 'weak link' in this implementation, with only 2 table runs on 64 bytes, block and finalizer.. and the extraction methods could do with a little more whitening.) So if you know a better way, a way in which these methods or anything else could be improved upon, do speak up, I'm listening..

So, why so much security, do I think AES is broken? Doubt it, wouldn't use it if I did.. Do I think that there may be methods by which a brute force attack against AES can be 'steered' somehow? Maybe, and if not, you can bet someone is working on it right now. As it stands though, brute forcing a 256 bit key, particularly if a mode like CBC is used, is virtually impossible.. but some people think quantum computing may change all that..

What I set out to do here was create the engine for a One Time Pad implementation: remove any chance that a brute force attack could unwind the random, and, if there are subtle patterns introduced into the output by an AES counter implementation, mitigate those patterns by creating a short period between state changes. I also felt that making the state changes erratic would reduce any pattern visibility, and that combining 2 transforms, with the overlap between key exchanges and the resulting loss in byte information, would make any patterns even less distinct.

This was not my first implementation; I also tried xoring AES transforms with Camellia and Twofish, but both have elaborate key expansion routines, so it really drags. This version was the best in keeping with the 'must be fast' criterion. I left those experiments in the class LokiOld.cs in case you want to play with them yourself.

## CSRG API

**Constructors:**

public CSRG()

Loads the default table seed.

public CSRG(byte[] TableSeed)

Loads user defined table seed. Must be 48 bytes of random.

**Properties:**

public EngineModes Engine

Select the engine type: Dual or Single AES CTRs.

public int OutputMaximum

Maximum output from a single seed in bytes.

public int SeedSize

Required Seed buffer size.

public Differentials StateDifferential

Changes the chaotic reseed differential. Smaller values mean more aggressive reseeding.

**Methods**

public byte[] Generate(Int32 Size, byte[] Seed)

Generate a block of random bytes using property defined parameters.

Size: Size of data return in bytes.

Seed: Random seed, Fixed size: must be 64 bytes long.

Returns: data byte[]

public byte[] Generate(Int32 Size, byte[] Seed, EngineModes Engine)

Generate a block of random bytes.

Size: Size of data return in bytes.

Seed: Random seed, Fixed size: must be 64 bytes long.

Engine: The AES engine type, single CTR, or Dual xored CTRs.

Returns: data byte[]

public byte[] Generate(Int32 Size, byte[] Seed, EngineModes Engine, Differentials Differential)

Generate a block of random bytes.

Size: Size of data return in bytes.

Seed: Random seed, Fixed size: must be 64 bytes long.

Engine: The AES engine type, single CTR, or Dual xored CTRs.

Differential: The reseed modulus differential. Smaller values mean more reseeds.

Returns: data byte[]

## SHA256

SHA256 is now implemented completely inline in the CSRG class, as an optimized method that only hashes on block size boundaries (64 bytes). This was one of my design criteria: no Microsoft crypto api in the class. It's still SHA256, a rewrite of the Bouncy Castle implementation, but the internal method, ComputeHash64(), now has a fixed input size. I did, however, include the full class with the project, SHA256Digest.

## SHA256Digest API

The SHA256Digest class is a rewrite of the Bouncy Castle class, making it a stand-alone class with some optimizations and refactoring. The output was tested against the criteria in the Secure Hash Standard documentation. I added a ComputeHash method to align it with the Windows implementation, and a Dispose mechanism for the class. A more compact method with a fixed input size of 64 bytes is contained in the CSRG.cs class. The SHA256Digest class is in the 'Experiments' folder of this project.

public void BlockUpdate(byte[] Input, int InputOffset, int Length)

Used when generating a hash for a file or other large input stream. Call DoFinal to retrieve the hash value once all of the data has been processed. Input blocks passed to this method must be block aligned: exactly 64 bytes in length.

Input -bytes to process.

InputOffset -starting position within Input.

Length -length of data to process.

public byte[] ComputeHash(byte[] Input)

Single call method (as used in the Windows implementation). Pass in the bytes to be hashed.

Input -bytes to process.

Returns -byte[] value hash.

public Int32 DoFinal(byte[] Output, Int32 OutputOffset)

When using the BlockUpdate() method, this is the last method to call, once all of the data has been processed; it writes the final hash value into the Output array.

Output -receives the hash value in bytes.

OutputOffset - The starting position within the Output array.

Returns -bytes processed.

public void Reset()

Resets the internal state of the algorithm.

public void Update(byte Input)

Updates the message digest with a single byte.

Input -one byte of data to process.

public void Dispose()

Disposes of internal variables. Called when class is to be destroyed.

## Internal AES Implementation

The Advanced Encryption Standard (AES) methods in the CSRG.cs class form a one way encryptor with a fixed key size of 256 bits. They have been optimized to perform only the minimum number of calculations necessary given those parameters. The code is based on a well known implementation used in the Mono library, which in turn, I believe, was created from the Novell implementation. The two main methods are Transform, which encrypts a 16 byte block of data, and ExpandKey, which expands a 32 byte key into a 60 member UInt32 array. The output was tested against the specification tests contained in the AES Standard document published by NIST (a good read if you want to understand AES better).

private void Transform(byte[] InData, byte[] OutData, UInt32[] Key)

private UInt32[] ExpandKey(byte[] Key)

## NIST Statistical Test Suite

Well, what can I say, I just got tired of working with the command console and doing one test at a time, so I turned the NIST Statistical Test Suite into a library; I added the exports, commented out the fprintf calls, and wrote a couple of default methods in utilities (but I didn't change anything computational!). So now you can run it as a C library using PInvoke. All the api is in the class NistSTS.cs, and it's all well commented.

The original NIST Statistical Test Suite console application is available here, the manual is here. Many thanks to the authors of this great tool, the code inside that application is very impressive, and well worth a look..

The NIST test suite is one of the most authoritative suites for assessing the entropy of pseudo random generators; as far as statistical testing goes, it's very thorough and well regarded in the cryptographic community (it was sanctioned by NIST, and they define the American standards for cryptography). So, good deal (that was a lot of work!). I built a nifty little test bed that can compare CSRG to RngCrypto and the Bouncy Castle CTRSP800DRBG (which I converted from Java to C# with the fewest possible changes to make it work, mostly just commenting out TDEA stuff). It can also get true random data from random.org so you can establish a baseline (but there is a 1MB limit of free random per day!). Apparently they get their data from radio receivers picking up atmospheric noise.

So, if you run the tests, you can see CSRG passes consistently, and P-values are all within good margins (most of them are good at near 0.5, except the 2 frequency tests, which are better at 0.6-0.8; for the longest run you're aiming for around 0.2, and approximate entropy should be a bit low as well, say 0.1-0.3 is best). I added excerpts from the manual to the help file, so you can get a better sense of input recommendations and result expectations; if you want more than that, read the manual. You'll notice a lot of p-value fluctuation (from all the generators, even random.org); that's normal, you have to run many thousands of tests before you can establish trends (just remember: SUCCESS = good, FAIL = bad..). Some of the tests should only be run on large samples (1 MiB or more), or they will throw an OUT OF RANGE exception; that's why those tests are disabled until you choose a larger size.

Except for the Linear Complexity tests (which have no pass score and are always rated N/A), you'll see a SUCCESS or FAIL for each test, along with the calculated p-values. Now, it's normal to see the occasional FAIL; you see that even with raw random data. Also keep in mind, the larger the sample size, the more time the values have to converge on their statistical averages, so bigger builds give a more accurate read on a generator's output entropy. If you see a significant number of failures, or groupings of failures, or a particular test failing consistently, well, then it's back to the drawing board ;o)

I've noticed all AES CTR based generators seem to have some bias in certain tests, (just as SHA/G, BBS and every other generator do in other tests), and the more tests you perform, the more the trends become apparent in the outcomes, signaling the weaknesses in a particular algorithm. Overall, I'd have to say the CSRG tests are coming out very well, at least as good as any CTR generator I've tested, and that it meets or exceeds all of my criteria for the project, so I'll probably end up using some variant of it for my xor cipher..

## NIST STS API

**Initialize library and load a file**

public static extern int NistLoadFile([MarshalAs(UnmanagedType.LPStr)]String lpFileName, int Size);

**Cleanup library resources**

public static extern int NistCleanup();

Always the last call; it must be made after the last test is run to release memory used by the library.

**(From the manual..)**

**Frequency (Monobit) Test**

public static extern double NistFrequency();

**Test Purpose**

The focus of the test is the proportion of zeroes and ones for the entire sequence. The purpose of this test is to determine whether the number of ones and zeros in a sequence are approximately the same as would be expected for a truly random sequence. The test assesses the closeness of the fraction of ones to ½, that is, the number of ones and zeroes in a sequence should be about the same. All subsequent tests depend on the passing of this test.

**Interpretation of Results**

Since the P-value obtained in step 3 of Section 2.1.4 is ≥ 0.01 (i.e., P-value = 0.527089), the conclusion is that the sequence is random. Note that if the P-value were small (< 0.01), then this would be caused by Sn or sobs being large. Large positive values of Sn are indicative of too many ones, and large negative values of Sn are indicative of too many zeroes.

**Input Size Recommendation**

It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥ 100).

*Returns*: P Value.
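If you want to see the math in action, here's a quick Python sketch of the monobit statistic as the manual describes it (my own illustration, not part of the STS library; the function name is mine). It reproduces the manual's worked example, ε = 1011010101:

```python
import math

def monobit_p_value(bits):
    """NIST SP 800-22 Frequency (Monobit) test: map bits to +/-1,
    sum them, and feed the normalized sum to the complementary
    error function."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)      # ones minus zeroes
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))  # the P-value

# The manual's example sequence gives P-value = 0.527089
example = [int(c) for c in "1011010101"]
print(monobit_p_value(example))
```

A P-value ≥ 0.01 means the sequence passes at the 1% significance level the suite uses throughout.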

**Frequency Test within a Block**

public static extern double NistBlockFrequency(int M);

**Test Purpose**

The focus of the test is the proportion of ones within M-bit blocks. The purpose of this test is to determine whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption of randomness. For block size M=1, this test degenerates to test 1, the Frequency (Monobit) test.

**Interpretation of Results**

Since the P-value obtained in step 4 of Section 2.2.4 is ≥ 0.01 (i.e., P-value = 0.801252), the conclusion is that the sequence is random. Note that small P-values (< 0.01) would have indicated a large deviation from the equal proportion of ones and zeros in at least one of the blocks.

**Input Size Recommendation**

It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥ 100). Note that n ≥ MN. The block size M should be selected such that M ≥ 20, M > .01n and N < 100.

**Params** **M**: block size, default is 128.

**Returns**: P Value.
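As an illustration (again my own sketch, not library code), the block statistic folds each block's deviation from a ½ proportion of ones into a chi-square, which is mapped to a P-value through the upper incomplete gamma function; a half-integer-order igamc is all this test needs. It reproduces the manual's example, ε = 0110011010 with M = 3:

```python
import math

def igamc(a, x):
    """Upper regularized incomplete gamma Q(a, x) for a that is a
    positive multiple of 1/2 (sufficient for this test)."""
    if a == 0.5:
        return math.erfc(math.sqrt(x))
    if a == 1.0:
        return math.exp(-x)
    # Recurrence: Q(a, x) = Q(a-1, x) + x^(a-1) e^(-x) / Gamma(a)
    return igamc(a - 1, x) + x ** (a - 1) * math.exp(-x) / math.gamma(a)

def block_frequency_p_value(bits, M):
    """Split the sequence into N full blocks of M bits, measure each
    block's proportion of ones, and fold the deviations from 1/2
    into a chi-square with N degrees of freedom."""
    n = len(bits)
    N = n // M  # number of full blocks; trailing bits are discarded
    chi_sq = 4.0 * M * sum(
        (sum(bits[i * M:(i + 1) * M]) / M - 0.5) ** 2 for i in range(N))
    return igamc(N / 2.0, chi_sq / 2.0)

# The manual's example gives P-value = 0.801252
example = [int(c) for c in "0110011010"]
print(block_frequency_p_value(example, 3))
```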

**Cumulative Sums (Cusum) Test**

public static extern double NistCumulativeSums();

**Test Purpose**

The focus of this test is the maximal excursion (from zero) of the random walk defined by the cumulative sum of adjusted (-1, +1) digits in the sequence. The purpose of the test is to determine whether the cumulative sum of the partial sequences occurring in the tested sequence is too large or too small relative to the expected behavior of that cumulative sum for random sequences. This cumulative sum may be considered as a random walk. For a random sequence, the excursions of the random walk should be near zero. For certain types of non-random sequences, the excursions of this random walk from zero will be large.

**Interpretation of Results**

Since the P-value obtained in step 4 of Section 2.13.4 is ≥ 0.01 (P-value = 0.411658), the conclusion is that the sequence is random. Note that when mode = 0, large values of this statistic indicate that there are either “too many ones” or “too many zeros” at the early stages of the sequence; when mode = 1, large values of this statistic indicate that there are either “too many ones” or “too many zeros” at the late stages. Small values of the statistic would indicate that ones and zeros are intermixed too evenly.

**Input Size Recommendation**

It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥ 100).

*Returns*: P Value.
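The random-walk idea translates directly into code. This Python sketch (mine, not the library's) computes the forward-mode statistic for the manual's example, ε = 1011010111, whose maximum excursion is z = 4:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def cusum_p_value(bits):
    """Forward-mode Cumulative Sums test: walk the +/-1 partial sums,
    take the maximum excursion z, then evaluate the random-walk
    distribution of that maximum."""
    n = len(bits)
    s = z = 0
    for b in bits:
        s += 2 * b - 1
        z = max(z, abs(s))
    sq = math.sqrt(n)
    p = 1.0
    # Summation bounds truncate toward zero, as in the reference code
    for k in range(int((-n / z + 1) / 4), int((n / z - 1) / 4) + 1):
        p -= phi((4 * k + 1) * z / sq) - phi((4 * k - 1) * z / sq)
    for k in range(int((-n / z - 3) / 4), int((n / z - 1) / 4) + 1):
        p += phi((4 * k + 3) * z / sq) - phi((4 * k + 1) * z / sq)
    return p

# The manual's example gives P-value = 0.411658
example = [int(c) for c in "1011010111"]
print(cusum_p_value(example))
```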

**Runs Test**

public static extern double NistRuns();

**Test Purpose**

The focus of this test is the total number of runs in the sequence, where a run is an uninterrupted sequence of identical bits. A run of length k consists of exactly k identical bits and is bounded before and after with a bit of the opposite value. The purpose of the runs test is to determine whether the number of runs of ones and zeros of various lengths is as expected for a random sequence. In particular, this test determines whether the oscillation between such zeroes and ones is too fast or too slow.

**Interpretation of Results**

Since the P-value obtained in step 4 of Section 2.3.4 is ≥ 0.01 (i.e., P-value = 0.147232), the conclusion is that the sequence is random. Note that a large value for Vn(obs) would have indicated an oscillation in the string which is too fast; a small value would have indicated that the oscillation is too slow. (An oscillation is considered to be a change from a one to a zero or vice versa.) A fast oscillation occurs when there are a lot of changes, e.g., 010101010 oscillates with every bit. A stream with a slow oscillation has fewer runs than would be expected in a random sequence, e.g., a sequence containing 100 ones, followed by 73 zeroes, followed by 127 ones (a total of 300 bits) would have only three runs, whereas 150 runs would be expected.

**Input Size Recommendation**

It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥ 100).

*Returns*: P Value.
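A run count is cheap to compute, which makes this one of the easiest tests to sketch. The following Python (my own illustration) reproduces the manual's example, ε = 1001101011, which has Vn(obs) = 7 runs:

```python
import math

def runs_p_value(bits):
    """Count runs (maximal blocks of identical bits) and compare the
    count to its expectation 2*n*pi*(1-pi) for a random sequence."""
    n = len(bits)
    pi = sum(bits) / n
    # Prerequisite: the monobit proportion must first be near 1/2
    if abs(pi - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    v_obs = 1 + sum(1 for i in range(1, n) if bits[i] != bits[i - 1])
    num = abs(v_obs - 2.0 * n * pi * (1 - pi))
    den = 2.0 * math.sqrt(2.0 * n) * pi * (1 - pi)
    return math.erfc(num / den)

# The manual's example gives P-value = 0.147232
example = [int(c) for c in "1001101011"]
print(runs_p_value(example))
```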

#### Longest Run of Ones in a Block

public static extern double NistLongestRunOfOnes();

##### Test Purpose

The focus of the test is the longest run of ones within M-bit blocks. The purpose of this test is to determine whether the length of the longest run of ones within the tested sequence is consistent with the length of the longest run of ones that would be expected in a random sequence. Note that an irregularity in the expected length of the longest run of ones implies that there is also an irregularity in the expected length of the longest run of zeroes. Therefore, only a test for ones is necessary.

##### Conclusion and Interpretation of Results

For the example in Section 2.4.8, since the P-value ≥ 0.01 (P-value = 0.180609), the conclusion is that the sequence is random. Note that large values of χ2(obs) indicate that the tested sequence has clusters of ones.

##### Input Size Recommendation

It is recommended that each sequence to be tested consists of a minimum of bits as specified in the table in Section 2.4.2.

*Returns*: P Value.

#### Binary Matrix Rank Test

public static extern double NistRank();

##### Test Purpose

The focus of the test is the rank of disjoint sub-matrices of the entire sequence. The purpose of this test is to check for linear dependence among fixed length substrings of the original sequence. Note that this test also appears in the DIEHARD battery of tests.

##### Interpretation of Results

Since the P-value obtained in step 5 of Section 2.5.4 is ≥ 0.01 (P-value = 0.741948), the conclusion is that the sequence is random. Note that large values of χ2(obs) (and hence, small P-values) would have indicated a deviation of the rank distribution from that corresponding to a random sequence.

##### Input Size Recommendation

The probabilities for M = Q = 32 have been calculated and inserted into the test code. Other choices of M and Q may be selected, but the probabilities would need to be calculated. The minimum number of bits to be tested must be such that n ≥ 38MQ (i.e., at least 38 matrices are created). For M = Q = 32, each sequence to be tested should consist of a minimum of 38,912 bits.

**Returns**: P Value.

#### Discrete Fourier Transform (Spectral) Test

public static extern double NistDiscreteFourierTransform();

##### Test Purpose

The focus of this test is the peak heights in the Discrete Fourier Transform of the sequence. The purpose of this test is to detect periodic features (i.e., repetitive patterns that are near each other) in the tested sequence that would indicate a deviation from the assumption of randomness. The intention is to detect whether the number of peaks exceeding the 95% threshold is significantly different than 5%.

##### Interpretation of Results

Since the P-value obtained in step 8 of Section 2.6.4 is ≥ 0.01 (P-value = 0.029523), the conclusion is that the sequence is random. A d value that is too low would indicate that there were too few peaks (< 95%) below T, and too many peaks (more than 5%) above T.

##### Input Size Recommendation

It is recommended that each sequence to be tested consist of a minimum of 1000 bits (i.e., n ≥ 1000).

**Returns**: P Value.

#### Non Overlapping Template Test

public static extern double NistNonOverlappingTemplateMatchings(int M);

##### Test Purpose

The focus of this test is the number of occurrences of pre-specified target strings. The purpose of this test is to detect generators that produce too many occurrences of a given non-periodic (aperiodic) pattern. For this test and for the Overlapping Template Matching test of Section 2.8, an m-bit window is used to search for a specific m-bit pattern. If the pattern is not found, the window slides one bit position. If the pattern is found, the window is reset to the bit after the found pattern, and the search resumes.

##### Interpretation of Results

Since the P-value obtained in step 5 of Section 2.7.4 is ≥ 0.01 (P-value = 0.344154), the conclusion is that the sequence is random. If the P-value is very small (< 0.01), then the sequence has irregular occurrences of the possible template patterns.

##### Input Size Recommendation

The test code has been written to provide templates for m = 2, 3, …, 10. It is recommended that m = 9 or m = 10 be specified to obtain meaningful results. Although N = 8 has been specified in the test code, the code may be altered to other sizes. However, N should be chosen such that N ≤ 100 to be assured that the P-values are valid. Additionally, be sure that M > 0.01 • n and N = n/M.

**Params** **M**: Index of the template contained in the \templates folder, e.g. template8 = 8. Default is 9.

**Returns**: P Value.
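The slide-or-jump window search described above is easy to misread, so here is a short Python sketch of it (my own, not the library's; it assumes N is even so the chi-square maps through an integer-order igamc). It reproduces the manual's example: 20 bits, template B = 001, N = 2 blocks:

```python
import math

def igamc_int(a, x):
    """Q(a, x) for integer a: exp(-x) * sum_{k<a} x^k / k!."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(a))

def non_overlapping_p_value(bits, template, N):
    """Split the sequence into N blocks; in each, count template hits
    with a window that jumps past a match and slides one bit on a miss."""
    m, M = len(template), len(bits) // N
    mu = (M - m + 1) / 2.0 ** m
    sigma_sq = M * (1.0 / 2 ** m - (2.0 * m - 1) / 2.0 ** (2 * m))
    chi_sq = 0.0
    for j in range(N):
        block, w, i = bits[j * M:(j + 1) * M], 0, 0
        while i <= M - m:
            if block[i:i + m] == template:
                w += 1
                i += m   # hit: jump to the bit after the match
            else:
                i += 1   # miss: slide one position
        chi_sq += (w - mu) ** 2 / sigma_sq
    return igamc_int(N // 2, chi_sq / 2)

# The manual's example gives P-value = 0.344154
bits = [int(c) for c in "10100100101110010110"]
print(non_overlapping_p_value(bits, [0, 0, 1], 2))
```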

#### Overlapping Template Test

public static extern double NistOverlappingTemplateMatchings(int M);

##### Test Purpose

The focus of the Overlapping Template Matching test is the number of occurrences of pre-specified target strings. Both this test and the Non-overlapping Template Matching test of Section 2.7 use an m-bit window to search for a specific m-bit pattern. As with the test in Section 2.7, if the pattern is *not* found, the window slides one bit position. The difference between this test and the test in Section 2.7 is that when the pattern *is* found, the window slides only one bit before resuming the search.

##### Interpretation of Results

Since the P-value obtained in step 4 of Section 2.8.4 is ≥ 0.01 (P-value = 0.274932), the conclusion is that the sequence is random. Note that for the 2-bit template (B = 11), if the entire sequence had too many 2-bit runs of ones, then: 1) ν5 would have been too large, 2) the test statistic would be too large, 3) the P-value would have been small (< 0.01), and 4) a conclusion of non-randomness would have resulted.

##### Input Size Recommendation

The values of K, M and N have been chosen such that each sequence to be tested consists of a minimum of 10^6 bits (i.e., n ≥ 10^6). Various values of m may be selected, but for the time being, NIST recommends m = 9 or m = 10. If other values are desired, please choose these values as follows:

n ≥ MN.

N should be chosen so that N • (min πi) > 5.

λ = (M-m+1)/2^m ≈ 2.

m should be chosen so that m ≈ log2 M.

Choose K so that K ≈ 2λ. Note that the πi values would need to be recalculated for values of K other than 5.

**Params** **M**: Bit pattern length. Default is 9.

**Returns**: P Value.

#### Maurer’s “Universal Statistical” Test

public static extern double NistUniversal();

##### Test Purpose

The focus of this test is the number of bits between matching patterns (a measure that is related to the length of a compressed sequence). The purpose of the test is to detect whether or not the sequence can be significantly compressed without loss of information. A significantly compressible sequence is considered to be non-random.

##### Interpretation of Results

Since the P-value obtained in step 5 of Section 2.9.4 is ≥ 0.01 (P-value = 0.767189), the conclusion is that the sequence is random. Theoretical expected values for ϕ have been computed as shown in the table in step (5) of Section 2.9.4. If fn differs significantly from expectedValue(L), then the sequence is significantly compressible.

##### Input Size Recommendation

This test requires a long sequence of bits (n ≥ (Q + K)L) which are divided into two segments of L-bit blocks, where L should be chosen so that 6 ≤ L ≤ 16. The first segment consists of Q initialization blocks, where Q should be chosen so that Q = 10 • 2L. The second segment consists of K test blocks, where K= n/L-Q ≈ 1000 • 2L.

**Returns**: P Value.

#### Approximate Entropy Test

public static extern double NistApproximateEntropy(int M);

##### Test Purpose

As with the Serial test of Section 2.11, the focus of this test is the frequency of all possible overlapping m-bit patterns across the entire sequence. The purpose of the test is to compare the frequency of overlapping blocks of two consecutive/adjacent lengths (m and m+1) against the expected result for a random sequence.

##### Interpretation of Results

Since the P-value obtained in step 7 of Section 2.12.4 is ≥ 0.01 (P-value = 0.261961), the conclusion is that the sequence is random. Note that small values of ApEn(m) would imply strong regularity. Large values would imply substantial fluctuation or irregularity.

##### Input Size Recommendation

Choose m and n such that m < ⌊log2 n⌋ - 5.

**Params** **M**: block size, default is 10.

**Returns**: P Value.
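The m vs. m+1 comparison is easier to see in code. This Python sketch (my own, not the library's) wraps the sequence with its first m-1 bits so every position starts a block, then compares the two frequency profiles; it reproduces the manual's example, ε = 0100110101 with m = 3:

```python
import math

def igamc_int(a, x):
    """Q(a, x) for integer a: exp(-x) * sum_{k<a} x^k / k!."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(a))

def phi_m(bits, m):
    """phi^m = sum over observed patterns of C_i * ln(C_i), with the
    sequence wrapped by its first m-1 bits so every position starts
    an m-bit block."""
    n = len(bits)
    ext = bits + bits[:m - 1]
    counts = {}
    for i in range(n):
        key = tuple(ext[i:i + m])
        counts[key] = counts.get(key, 0) + 1
    return sum((c / n) * math.log(c / n) for c in counts.values())

def approximate_entropy_p_value(bits, m):
    """ApEn(m) = phi^m - phi^(m+1); for a random sequence it
    approaches ln 2, and the shortfall becomes a chi-square."""
    n = len(bits)
    ap_en = phi_m(bits, m) - phi_m(bits, m + 1)
    chi_sq = 2.0 * n * (math.log(2) - ap_en)
    return igamc_int(2 ** (m - 1), chi_sq / 2)

# The manual's example gives P-value = 0.261961
example = [int(c) for c in "0100110101"]
print(approximate_entropy_p_value(example, 3))
```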

#### Random Excursions Test

public static extern double NistRandomExcursions();

##### Test Purpose

The focus of this test is the number of cycles having exactly K visits in a cumulative sum random walk. The cumulative sum random walk is derived from partial sums after the (0,1) sequence is transferred to the appropriate (-1, +1) sequence. A cycle of a random walk consists of a sequence of steps of unit length taken at random that begin at and return to the origin. The purpose of this test is to determine if the number of visits to a particular state within a cycle deviates from what one would expect for a random sequence. This test is actually a series of eight tests (and conclusions), one test and conclusion for each of the states: -4, -3, -2, -1 and +1, +2, +3, +4.

##### Interpretation of Results

Since the P-value obtained in step 8 of Section 2.14.4 is ≥ 0.01 (P-value = 0.502529), the conclusion is that the sequence is random. Note that if χ2(obs) were too large, then the sequence would have displayed a deviation from the theoretical distribution for a given state across all cycles.

##### Input Size Recommendation

It is recommended that each sequence to be tested consist of a minimum of 1,000,000 bits (i.e., n ≥ 10^6).

**Returns**: P Value.

#### Random Excursions Variant Test

public static extern double NistRandomExcursionsVariant();

##### Test Purpose

The focus of this test is the total number of times that a particular state is visited (i.e., occurs) in a cumulative sum random walk. The purpose of this test is to detect deviations from the expected number of visits to various states in the random walk. This test is actually a series of eighteen tests (and conclusions), one test and conclusion for each of the states: -9, -8, …, -1 and +1, +2, …, +9.

##### Interpretation of Results

If the P-value is ≥ 0.01, the conclusion is that the sequence is random.

##### Input Size Recommendation

It is recommended that each sequence to be tested consist of a minimum of 1,000,000 bits (i.e., n ≥ 10^6).

**Returns**: P Value.
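Here is a compact Python sketch of the walk-and-count procedure (my own illustration; note the real test also requires a healthy number of cycles, which this toy input ignores). For the manual-sized example ε = 0110110101, state +1 is visited 4 times across J = 3 cycles:

```python
import math

def excursions_variant_p_values(bits):
    """Build the cumulative-sum random walk, count cycles J (returns
    to zero), then for each state x compare the total visit count to
    its expectation via a half-normal statistic. Returns a dict of
    the eighteen P-values, keyed by state -9..-1, +1..+9."""
    s, walk = 0, [0]
    for b in bits:
        s += 2 * b - 1
        walk.append(s)
    if walk[-1] != 0:
        walk.append(0)                            # close the final cycle
    j = sum(1 for v in walk[1:] if v == 0)        # number of cycles
    results = {}
    for x in list(range(-9, 0)) + list(range(1, 10)):
        xi = sum(1 for v in walk if v == x)       # total visits to state x
        results[x] = math.erfc(
            abs(xi - j) / math.sqrt(2.0 * j * (4 * abs(x) - 2)))
    return results

# For 0110110101, state +1: xi = 4, J = 3
example = [int(c) for c in "0110110101"]
print(excursions_variant_p_values(example)[1])
```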

#### Linear Complexity Test

public static extern double NistLinearComplexity(int M);

##### Test Purpose

The focus of this test is the length of a linear feedback shift register (LFSR). The purpose of this test is to determine whether or not the sequence is complex enough to be considered random. Random sequences are characterized by longer LFSRs. An LFSR that is too short implies non-randomness.

##### Interpretation of Results

If the P-value were < 0.01, this would have indicated that the observed frequency counts of Ti stored in the νi bins varied from the expected values; it is expected that the distribution of the frequency of the Ti (in the νi bins) should be proportional to the computed πi.

##### Input Size Recommendation

Choose n ≥ 10^6. The value of M must be in the range 500 ≤ M ≤ 5000, and N ≥ 200 for the χ2 result to be valid.

**Params** **M**: block size, default is 512.

**Returns**: P Value.

#### Serial Test

public static extern double NistSerial(int M);

##### Test Purpose

The focus of this test is the frequency of all possible overlapping m-bit patterns across the entire sequence. The purpose of this test is to determine whether the number of occurrences of the 2^m m-bit overlapping patterns is approximately the same as would be expected for a random sequence. Random sequences have uniformity; that is, every m-bit pattern has the same chance of appearing as every other m-bit pattern. Note that for m = 1, the Serial test is equivalent to the Frequency test of Section 2.1.

##### Interpretation of Results

Since the P-value obtained in step 5 of Section 2.11.4 is ≥ 0.01 (P-value1 = 0.808792 and P-value2 = 0.670320), the conclusion is that the sequence is random. Note that if ∇²ψ²m or ∇ψ²m had been large, then non-uniformity of the m-bit blocks is implied.

##### Input Size Recommendation

Choose m and n such that m < ⌊log2 n⌋ - 2.

**Params** **M**: block size, default is 16.

**Returns**: P Value.

## The Tests

To test a DRBG, you need something to compare it with to get an idea of what good numbers really are. Well, we have RNGCrypto, but in C# anyways, not much else.. There is an outline proposed in the NIST document SP 800-90A, from which several C and Java versions emerged. Bouncy Castle's Java version includes the class CTRSP800DRBG, which I have re-made into a C# version for these tests, (now you have 2 DRBGs in C#!).

Both the CTRSP800 and RngCrypto produce excellent random scores in all the tests I have run in Diehard, Ent, and the NIST Statistical Test Suite, so they all appear to be good random generators, (keep in mind, though, that in order to properly gauge which algorithm delivers the best entropy, you have to test many thousands of samples to smooth out small fluctuations). I have run the CSRG output through NIST's tests many thousands of times, using various sizes from 10Kib to 100Mib, and it consistently passes with excellent results. Given the criteria I have for my project, the best choice is CSRG: because of the complexity of the restate mechanism and the inherent unpredictability of the internal state changes, it would be nearly impossible to unwind, while still returning excellent random output at good speed. I'm still testing it, but that's ramping down now, as the results remain constant and within the metrics I established for the output tests.

Now for the inevitable caveat: though I researched this all carefully, and it is built upon an accepted method of producing cryptographically secure pseudo random, (AES CTR), I am not a cryptographer, so use at your own risk. It takes a lot of testing before any new algorithm becomes accepted into general use; keep that in mind, test it yourself, figure out how it works, and then make an informed choice.

I approached this project not as a cryptographer, but as an engineer, and as such broke it down into a series of logical problems to solve; I think the result is a clever design, and something with potential. Anyways, send me an email if you have any ideas about improving the design, or leave a comment if you have a question..

cheers

## Updates

What's with all the freaking updates, John? Ok, ok! I think I'm finally done now.. have fun (July 20)

July 14 - Renamed 'Loki' algorithm to CSRG, cleaned up the code.

July 19 - Added a new test harness with the NIST STS library included, and a second generator method with expanded property and method options.

July 20 - Expanded article with api definitions, removed old content. Expanded the help file to include CSRG api definitions. Probably the last update..

July 24 - Added entropy pool and expansion table rotation methods. Increased DUAL seed size to 96 bytes. Upped recommended block size to 100Kib.