Project CipherTheme. Strong Ciphers with Extended Cryptographic Key






4.80/5 (6 votes)
Remove the artificial encryption key length limitation by using the power of the extended cryptographic key concept with parallel programming.
The following project is due to my long-time enchantment with encryption. It is inspired by the excellent work of John Underhill (http://www.codeproject.com/Articles/828477/Cipher-EX-V) - and presents the effort of producing the working model for several encryption algorithms using multiple processor cores mode.
The aim here is not to compete with professional mathematicians devising the cryptographic functions but to remove the artificial encryption key length limitation and to demonstrate the power of the extended cryptographic key concept and its ease of use.
On a philosophical note, I think that the exploration of opportunities and surpassing the limitations form the real spirit of development, be it in programming or other disciplines.
I offer a stream cipher based on the Skein1024
hash algorithm and a couple of new block ciphers with extended key length – one applying the byte rotation scheme and the other one based on XXTEA providing for the flexible block size (in multiples of 8) of user choice, from 24 bits to 1024 (as opposed, for example, to fixed 16 bit block size for AES or Serpent etc…).
The transformation could be presented as T(x[i],key[i])
where x[i]
is the transformed element - int, long or byte with i running from 0 to the number of elements in the block (4 integers in the 16 bytes block)
and key[i]
is the corresponding key to be used at the transformation.
With the itroduction of the rounds into processing it turns to
for (int r = 1;r<=Rounds;r++) { T(x[i],KS(r,key[i])); }
Where function KS
modifies key[i]
for each round (key schedule).
The idea behind the key extension with Skein1024 (or other SHA algorithm) is to replace the working of a weak key schedule function KS with the strong algorithm to initially generate the high enthropy key array for use in the transformation processing. Generally, such an array length would be equal to (Block Size) * Rounds.
For instance Twirl, algorithm of 32 bytes block size and 10 rounds works with 32*8*10=2560 bits key compared to the standard 256 bits AES key with the weak extension function.
Take 64 bytes block and 40 rounds and get a 64*8*40=20480 bits encryption.
1024 bytes block and 100 rounds will provide for the ridiculous but feasible 819200 bits key.
The only practical limit to the key strength is the time we are willing to spend on processing.
The CipherTheme program is the Windows C# application including the minimalistic (but working) interface combined with the set of business classes.
Happy encrypting!
1. Skein Stream Cipher
Having such a magnificent SHA as Skein1024 at hand it is hard to refrain from creating the stream cipher out of it, so here it is - fast enought comparing to other algorithms and powerful as Skein1024 allows (and it allows for a lot).
Skein1024
class under SHA3 folder of the project is the empowering force under the whole application. It enables the generation of the byte array of required size – the plain source length in the case of a stream cipher or the necessary extended key length as required by the algorithm and number of rounds in the case of a block cipher.
The point of interest in the Skein stream cipher is the usage of the multiprocessor core in the Transform function. The source file turned to a bytes array and divided between processors (8 in my case), each part being processed in parallel with its own Skein1024 stream generated by the Generate
function. Then the streams are XORed in the spirit of a stream cipher with the corresponding part of the source and concatenated into the output array. After the parallel processing, the source array reminder is processed separately and added to the result.
private void Transform(byte[] Input, byte[] Output) {
if (_processorCount == 1 || Input.Length < _processorCount * BLOCK_SIZE) {
Skein1024 generator = new Skein1024();
generator.Initialize(_key, _iv);
Generate(generator, Output.Length, Output);
// output is input xor with random
int rem = Output.Length % XORSIZE;
int aln = Output.Length - rem;
if (aln != 0) // Output.Length % 16 = 0 for XORBLK to work
XORBLK(Input, 0, Output, 0, aln);
// reminder bytes after XORBLK
if (rem != 0) {
for (int i = aln; i < Output.Length; i++)
Output[i] ^= Input[i];
}
}
else {
int inpSize = Input.Length;
int partSize = inpSize / _processorCount; // each processor part - must be divisible by 16 for //XORBLK
if (partSize % XORSIZE != 0)
partSize -= partSize % XORSIZE;
int inpRem = inpSize % (partSize * _processorCount); // Input reminder after multiprocessors //parts
// create random, and xor to output in parallel
// create a generator for each proccessor
Skein1024[] generators = new Skein1024[_processorCount];
System.Threading.Tasks.Parallel.For(0, _processorCount, i => {
// initialize generators
generators[i] = new Skein1024();
generators[i].Initialize(_key, vectors[i]);
Generate(generators[i], partSize, Output, (i * partSize));
// xor with input at offset
XORBLK(Input, i * partSize, Output, i * partSize, partSize);
});
// reminder after parallel processing
if (inpRem != 0) {
Generate(generators[0],inpRem, Output, inpSize - inpRem);
int rem = inpRem % XORSIZE;
int aln = inpRem - rem;
XORBLK(Input, inpSize - inpRem, Output, inpSize - inpRem, aln);
// reminder bytes after XORBLK
for (int i = inpSize - rem; i < inpSize; ++i)
Output[i] ^= Input[i];
}
}
}
Note the creation of vectors in the Skein constructor at multiprocessing case.
Vectors are initialized from the user provided IV array and distinguished by the first bit (not an extremely powerful technique but sufficient for the demo purpose). This allows for the initialization of 8 (by the processors number) different Skein1024 generators used in the parallel processing. Thus, we have the source file being encrypted with 8 different parallel stream pads, as well as, the overall execution speed boost compared to a single processor (in practice not 8 times faster, rather 5-6).
public Skein(byte[] key, byte[] iv) {
_key = key;
_processorCount = Environment.ProcessorCount;
if (_processorCount % 2 != 0)
_processorCount--;
// create vectors array and tweak vectors if multiprocessing
if (_processorCount > 1) {
vectors = new byte[_processorCount][];
for (int i = 0; i < _processorCount; i++) {
vectors[i] = new byte[iv.Length];
Buffer.BlockCopy(iv, 0, vectors[i], 0, iv.Length);
vectors[i][0] = (byte)i;
}
}
else
_iv = iv;
}
Decryption routine follows the same pattern.
Test run of Skein stream cipher encryption routine shows the following result:
- Intel i7 2.00 GHz machine with 8 processors and 64-bit Windows7 OS
- File: 129676 KB
- Time: 16 seconds
Application usage:
- Select cipher from the drop-down list - Skein (Stream) in this first part
- Provide the mandatory Password and IV entries and hit "Init" button Status bar will show the cipher Name after the initialization
- Select source file. The target file will show as the source one with added underscore ("_"). Naturally the target file path can be manually edited.
- Encrypt/Decrypt buttons will start the process of encryption/decryption of the source into the target with the timing at the status bar. Exceptions are handled in a generic manner just appearing as status bar messages.
As a bonus for those bored with the unattended file encryption process, I present the opportunity to encrypt the text on the spot with the “Encrypt” button and immediately enjoy the merry chaos of the encrypted array string representation. Fortunately, the chaos can be decrypted with the trusted “Decrypt” button.
As soon as Skein (Stream) initialized, and only in this case, – the "Text" button appears and the Text Form can be called to life.
The form presents “Plain Text” and “Cipher Text” text boxes. Plain text will be encrypted and put into the Cipher Text box with the “Encrypt” button. Cipher text will be decrypted and put into the Plain Text box with the “Decrypt” button. Skein stream cipher empowers the transformation.
2. Byte rotation Swirl Cipher with the expanded key schedule and parallel processing
This original cipher belongs to the block ciphers family. For those who remember the famous Rubik's Cube, it kind of imitates the rotation of a 3*3*3 cube with the cells being the bytes of the cipher stream. Thus the cipher's block size equals 27 bytes.
Rubik's Cube
The encryption transformation happens here not with the integers or longs as the case is with the regular block ciphers but on the atomic byte level. Each byte of the encrypted array is the product of the certain byte of the initial array and the pre-generated encryption key. Naturally the Swirl key is a byte array as opposed to a uint (ulong) array for the regular case. As in the previous part, the powerful Skein1024
generator creates the required extended key.
The Swirl encryption algorithm takes 9 blocks of 3 bytes in the first dimension (say, horizontal) and transforms them. Then it works on the second and third dimensions which gives the total of 27 atomic transformations thus requiring 27 unique keys per round. The algorithm easily lends itself to loop implementation but sets in the code in a flat manner for the sake of the execution speed.
The Swirl
class inherits from the BlockBaseCipher
class that provides for the public interface and general block processing – parallel or single depending on the machine processor. The input byte array is divided in to the number of arrays which are handled in parallel and copied into the output array. Then the reminder of the input after the parallel block processing is transformed and added to the output.
Here’s the code sample for the byte array parallel encryption:
protected byte[] parallelEncrypt(byte[] input, int outLen) { int inpSize = input.Length; int blocks = inpSize / _ParallelBlock; int partsize = blocks * _Blocksize; int rem = inpSize % _ParallelBlock; byte[] output = new byte[outLen]; // divide input into _ProcessorCount number of inputss (8) and fill them from the input byte[][] inputs = new byte[_ProcessorCount][]; // create _ProcessorCount number of outputs (8) byte[][] outputs = new byte[_ProcessorCount][]; System.Threading.Tasks.Parallel.For(0, _ProcessorCount, i => { inputs[i] = new byte[partsize]; outputs[i] = new byte[partsize]; Buffer.BlockCopy(input, partsize * i, inputs[i], 0, partsize); ProcessEncrypt(inputs[i], outputs[i]); Buffer.BlockCopy(outputs[i], 0, output, partsize * i, partsize); }); //add the reminder if (rem != 0) { byte[] remInput = new byte[rem]; Buffer.BlockCopy(input, partsize * _ProcessorCount, remInput, 0, rem); byte[] remOutput = encrypt(remInput); Buffer.BlockCopy(remOutput, 0, output, partsize * _ProcessorCount, remOutput.Length); } return output; }
By design, every block cipher processing involves the sequence of the single block transformations and BlockBaseCipher
calls encryptBlock (decryptBlock)
for the specific implementation in the derived class. That’s where the particular encryption algorithm shows its strength (or weakness). BlockBaseCipher
also provides for the generic padding handling of the cipher stream last block.
3. XXTEA-based Twirl Cipher with the expanded key schedule, variable block size and parallel processing
Let us take such a simple and efficient encryption algorithm as an XXTEA and modify it applying the power of the extended key method. It appears also that this algorithm easily lends itself to the idea of a variable block size set by user at the cipher initialization.
Here's the original XXTEA implementation in C:
#define DELTA 0x9e3779b9 #define MX (((z>>5^y<<2) + (y>>3^z<<4)) ^ ((sum^y) + (key[(p&3)^e] ^ z))) void btea(uint32_t *v, int n, uint32_t const key[4]) { uint32_t y, z, sum; unsigned p, rounds, e; if (n > 1) { // Coding Part rounds = 6 + 52/n; sum = 0; z = v[n-1]; do { sum += DELTA; e = (sum >> 2) & 3; for (p=0; p<n-1; p++) { y = v[p+1]; z = v[p] += MX; } y = v[0]; z = v[n-1] += MX; } while (--rounds); } else if (n < -1) { // Decoding Part n = -n; rounds = 6 + 52/n; sum = rounds*DELTA; y = v[0]; do { e = (sum >> 2) & 3; for (p=n-1; p>0; p--) { z = v[p-1]; y = v[p] -= MX; } z = v[n-1]; y = v[0] -= MX; sum -= DELTA; } while (--rounds); } }
and its C# modified equivalent for encryption part:
protected override void encryptBlock(byte[] buffer) { int k = 0; ulong mx = 0; ulong[] v = new ulong[4]; for (int i = 0; i < 4; i++) { v[i] = BytesToBe64(buffer, 8*i ); } ulong y; ulong z = v[3]; for (int r = 0; r < Rounds; r++) { for (int p = 0; p < 3; p++) { y = v[p + 1]; mx = (z >> 5 ^ y << 2) + (y >> 3 ^ z << 4) ^ (sum ^ y) + (key[k++] ^ z); z = v[p] += mx; } y = v[0]; mx = (z >> 5 ^ y << 2) + (y >> 3 ^ z << 4) ^ (sum ^ y) + (key[k++] ^ z); z = v[3] += mx; } for (int i = 0; i < 4; i++) { Be64ToBytes(v[i], buffer, 8*i); } }
First thing to notice here is the creation of the v array from the input buffer:
ulong[] v = new ulong[4]; for (int i = 0; i < 4; i++) { v[i] = BytesToBe64(buffer, 4*i ); }
The supposed buffer length here equals 32 bytes, which provides for the creation of 4 ulong numbers to be used in the block ecryption cycle. Nothing prevents us from using an arbitrary block size (in multiples of 8) in order to get N ulong numbers from a block of N*8 bytes.
Secondly, notice the use of sum += DELTA;
and e = (sum >> 2) & 3;
in the original code Ever modified e is used at #define MX ... (key[(p&3)^e] ^ z)))
which provides for the necessary key variance during encryption cycle. It is would be a cosiderable enhancement if this key manipulation would be simply changed to the next key[k++]
value from the Skein1024 generated sequence.
Another point of improvement is to change the constant shift values like 5,2,3 and 4 from the mx assgnment to variables. Since we have a new key value with every rotation in the loop we can use it to obtaine 4 new numbers as following: mx = z >> (byte)key[k] ^ y << (byte)(key[k] >> 8) ^ y >> (byte)(key[k] >> 16) ^ z << (byte)(key[k] >> 24) ^ key[k++];
All additions in the algorithm were also eliminated for it to contain only shift and xor operations.
The final solution looks like this:
// encrypt ulongs protected override void encryptBlock(byte[] buffer) { int k = 0; ulong mx = 0; ulong[] v = new ulong[_keysNum]; for (int i = 0; i < _keysNum; i++) { v[i] = BytesToBe64(buffer, 8 * i); } ulong y; ulong z = v[_keysNum - 1]; for (int r = 0; r < _Rounds; r++) { for (int i = 0; i < _keysNum - 1; i++) { y = v[i + 1]; // use curret key to get 4 shift byte values for y and z mx = z >> (byte)_expKeyLong[k] ^ y << (byte)(_expKeyLong[k] >> 8) ^ y >> (byte)(_expKeyLong[k] >> 16) ^ z << (byte)(_expKeyLong[k] >> 24) ^ _expKeyLong[k++]; z = v[i] ^= mx; } y = v[0]; mx = z >> (byte)_expKeyLong[k] ^ y << (byte)(_expKeyLong[k] >> 8) ^ y >> (byte)(_expKeyLong[k] >> 16) ^ z << (byte)(_expKeyLong[k] >> 24) ^ _expKeyLong[k++]; z = v[_keysNum - 1] ^= mx; } for (int i = 0; i < _keysNum; i++) { Be64ToBytes(v[i], buffer, 8 * i); } }
where _keysNum
gives the number of ulongs in the block.
So, the XXTEA, the Tiny Encryption Algorithm, disappeared. The newly created Twirl is heavily dependent on the extended key to be preliminary created at initialization. This new quality provides for the high enthropy of the key schedule. The possile large block size (up to 1024 bytes in this implementation) leads to the encryption of up to 1024*8 = 8192 bits on a single round of processing. The standard 64 bytes (8 numbers) block with 20 rounds commands the 64*8*20 = 10240 bits encryption.
Since Twirl is inherited from BlockBaseCipher
it enjoys all the parallel procennig abilities of the base class. Combined with a reasonable number of the processing rounds all theese qualities provide for the interesting and fast encryption algorithm.
4. Different ciphers processing speed chart
Test Machine: i7-4770 3.4 GHz, 8GB RAM, Windows 7, 64 bit
Test File: 129 676 KB
Presented here are the measurements of the average time spent on the above mentioned test file transformation by ciphers described in the previous parts of the article and other algorithms that I had a pleasure to implement. The encryption and decryption times are practically the same, so only the ecryption data are on display.
It should be noted that block ciphers are used in two stages. Firstly they are initialized and the extended key is generated. Then they perform the processing itself. The timing shown here does not include the initialization process and applies to the second stage only. Therefore it would not be correct to compare block and stream ciphers' speed, since with the stream ones the key generation happens as needed during the encryption process.
Block Ciphers
Cipher | Block Size (byte) | Rounds | Encryption Key Strength (bit) | File Encryption Time (sec) | 100 MB Encryption Time (sec) |
AES | 16 | 18 | 256 | 2.4 | 1.9 |
ThreeFish | 64 | 72 | 320 | 1.4 | 1.1 |
Serpent | 16 | 48 | 3136 | 10.4 | 8.2 |
Serpent | 32 | 48 | 6272 | 5.2 | 4.1 |
Swirl | 27 | 3 | 1944 | 1.7 | 1.34 |
Swirl | 27 | 9 | 5832 | 5.2 | 4.1 |
Twirl | 64 | 10 | 5120 | 1.0 | 0.79 |
Twirl | 64 | 40 | 20480 | 3.9 | 3.0 |
Stream Ciphers
Cipher | File Encryption Time (sec) | 100MB Encryption Time (sec) |
ChaCha | 1.6 | 1.26 |
Skein | 14.3 | 11.295 |
Keccak | 10.1 | 7.98 |
Conclusion
I strongly believe in the human right to free idea exchanges, combined with the right to privacy and protection of private information. Nothing serves this purpose better than the powerful, reliable, and accessible encryption techniques, a small part of which I humbly tried to display and promote in this article.
For centuries people were gathering knowledge that led them to the understanding of the nature of the universe. The information accumulation and exchange form the base of human progress, evolution and culture. The modern way of storing, manipulating, and transforming information provides for knowledge distribution to the public and for private usage to be accessed on demand. Nowadays, with the advance of accessible computing devices and cryptographic methods, the stable and speedy digital flow of information also becomes a protected and reliable process, allowing the freedom of encryption for everybody.
Guiding Publications
- The Skein Hash Function Family Niels Ferguson, Stefan Lucks, Bruce Schneier, Doug Whiting, Mihir Bellare, Tadayoshi Kohno, Jon Callas, and Jesse Walker
- Matthew D. Russell - Tinyness: An Overview of TEA and Related Ciphers