65.9K
CodeProject is changing. Read more.
Home

Project CipherTheme. Strong Ciphers with Extended Cryptographic Key

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.80/5 (6 votes)

May 17, 2016

CPOL

11 min read

viewsIcon

12064

downloadIcon

305

Remove the artificial encryption key length limitation by using the power of the extended cryptographic key concept with parallel programming.

The following project is due to my long-time enchantment with encryption. It is inspired by the excellent work of John Underhill (http://www.codeproject.com/Articles/828477/Cipher-EX-V) - and presents the effort of producing the working model for several encryption algorithms using multiple processor cores mode.

The aim here is not to compete with professional mathematicians devising the cryptographic functions but to remove the artificial encryption key length limitation and to demonstrate the power of the extended cryptographic key concept and its ease of use.

On a philosophical note, I think that the exploration of opportunities and surpassing the limitations form the real spirit of development, be it in programming or other disciplines.

I offer a stream cipher based on the Skein1024 hash algorithm and a couple of new block ciphers with extended key length – one applying the byte rotation scheme and the other one based on XXTEA providing for the flexible block size (in multiples of 8) of user choice, from 24 bits to 1024 (as opposed, for example, to fixed 16 bit block size for AES or Serpent etc…).

The transformation could be presented as T(x[i],key[i])
where x[i] is the transformed element - int, long or byte with i running from 0 to the number of elements in the block (4 integers in the 16 bytes block)
and key[i] is the corresponding key to be used at the transformation.

With the itroduction of the rounds into processing it turns to

for (int r = 1;r<=Rounds;r++) {
T(x[i],KS(r,key[i]));
}

Where function KS modifies key[i] for each round (key schedule).

The idea behind the key extension with Skein1024 (or other SHA algorithm) is to replace the working of a weak key schedule function KS with the strong algorithm to initially generate the high enthropy key array for use in the transformation processing. Generally, such an array length would be equal to (Block Size) * Rounds.

For instance Twirl, algorithm of 32 bytes block size and 10 rounds works with 32*8*10=2560 bits key compared to the standard 256 bits AES key with the weak extension function.

Take 64 bytes block and 40 rounds and get a 64*8*40=20480 bits encryption.
1024 bytes block and 100 rounds will provide for the ridiculous but feasible 819200 bits key.
The only practical limit to the key strength is the time we are willing to spend on processing.

The CipherTheme program is the Windows C# application including the minimalistic (but working) interface combined with the set of business classes.

Happy encrypting!

1. Skein Stream Cipher

Having such a magnificent SHA as Skein1024 at hand it is hard to refrain from creating the stream cipher out of it, so here it is - fast enought comparing to other algorithms and powerful as Skein1024 allows (and it allows for a lot).

Skein1024 class under SHA3 folder of the project is the empowering force under the whole application. It enables the generation of the byte array of required size – the plain source length in the case of a stream cipher or the necessary extended key length as required by the algorithm and number of rounds in the case of a block cipher.

The point of interest in the Skein stream cipher is the usage of the multiprocessor core in the Transform function. The source file turned to a bytes array and divided between processors (8 in my case), each part being processed in parallel with its own Skein1024 stream generated by the Generate function. Then the streams are XORed in the spirit of a stream cipher with the corresponding part of the source and concatenated into the output array. After the parallel processing, the source array reminder is processed separately and added to the result.

private void Transform(byte[] Input, byte[] Output) {
            if (_processorCount == 1 || Input.Length < _processorCount * BLOCK_SIZE) {
                Skein1024 generator = new Skein1024();
                generator.Initialize(_key, _iv);
                Generate(generator, Output.Length, Output);                
                // output is input xor with random               
                int rem = Output.Length % XORSIZE;
                int aln = Output.Length - rem;

                if (aln != 0) // Output.Length % 16 = 0 for XORBLK to work 
                    XORBLK(Input, 0, Output, 0, aln);

                // reminder bytes after XORBLK
                if (rem != 0) {
                    for (int i = aln; i < Output.Length; i++)
                        Output[i] ^= Input[i];
                }
            }
            else {
                int inpSize = Input.Length;
                int partSize = inpSize / _processorCount; // each processor part - must be divisible by 16 for //XORBLK
                if (partSize % XORSIZE != 0)
                    partSize -= partSize % XORSIZE;

                int inpRem = inpSize % (partSize * _processorCount); // Input reminder after multiprocessors //parts
                // create random, and xor to output in parallel
                // create a generator for each proccessor
                Skein1024[] generators = new Skein1024[_processorCount];

                System.Threading.Tasks.Parallel.For(0, _processorCount, i => {
                    // initialize generators
                    generators[i] = new Skein1024();
                   
                    generators[i].Initialize(_key, vectors[i]);

                    Generate(generators[i], partSize, Output, (i * partSize));
                    // xor with input at offset
                    XORBLK(Input, i * partSize, Output, i * partSize, partSize);
                });
                // reminder after parallel processing
                if (inpRem != 0) {
                    Generate(generators[0],inpRem, Output, inpSize - inpRem);
                    int rem = inpRem % XORSIZE;
                    int aln = inpRem - rem;
                    XORBLK(Input, inpSize - inpRem, Output, inpSize - inpRem, aln);

                    // reminder bytes after XORBLK
                    for (int i = inpSize - rem; i < inpSize; ++i)
                        Output[i] ^= Input[i];
                }
            }
        }

Note the creation of vectors in the Skein constructor at multiprocessing case.

Vectors are initialized from the user provided IV array and distinguished by the first bit (not an extremely powerful technique but sufficient for the demo purpose). This allows for the initialization of 8 (by the processors number) different Skein1024 generators used in the parallel processing. Thus, we have the source file being encrypted with 8 different parallel stream pads, as well as, the overall execution speed boost compared to a single processor (in practice not 8 times faster, rather 5-6).

public Skein(byte[] key, byte[] iv) {
            _key = key;           
            _processorCount = Environment.ProcessorCount;
            if (_processorCount % 2 != 0)
                _processorCount--;
            // create vectors array and tweak vectors if multiprocessing
            if (_processorCount > 1) {
                vectors = new byte[_processorCount][];
                for (int i = 0; i < _processorCount; i++) {
                    vectors[i] = new byte[iv.Length];
                    Buffer.BlockCopy(iv, 0, vectors[i], 0, iv.Length);
                    vectors[i][0] = (byte)i;
                }
            }
            else
                _iv = iv;    
        }

Decryption routine follows the same pattern.

Test run of Skein stream cipher encryption routine shows the following result:

  • Intel i7 2.00 GHz machine with 8 processors and 64-bit Windows7 OS
  • File: 129676 KB
  • Time: 16 seconds

Application usage:

  1. Select cipher from the drop-down list - Skein (Stream) in this first part
  2. Provide the mandatory Password and IV entries and hit "Init" button Status bar will show the cipher Name after the initialization
  3. Select source file. The target file will show as the source one with added underscore ("_"). Naturally the target file path can be manually edited.
  4. Encrypt/Decrypt buttons will start the process of encryption/decryption of the source into the target with the timing at the status bar. Exceptions are handled in a generic manner just appearing as status bar messages.

As a bonus for those bored with the unattended file encryption process, I present the opportunity to encrypt the text on the spot with the “Encrypt” button and immediately enjoy the merry chaos of the encrypted array string representation. Fortunately, the chaos can be decrypted with the trusted “Decrypt” button.

As soon as Skein (Stream) initialized, and only in this case, – the "Text" button appears and the Text Form can be called to life.

The form presents “Plain Text” and “Cipher Text” text boxes. Plain text will be encrypted and put into the Cipher Text box with the “Encrypt” button. Cipher text will be decrypted and put into the Plain Text box with the “Decrypt” button. Skein stream cipher empowers the transformation.

2. Byte rotation Swirl Cipher with the expanded key schedule and parallel processing

This original cipher belongs to the block ciphers family. For those who remember the famous Rubik's Cube, it kind of imitates the rotation of a 3*3*3 cube with the cells being the bytes of the cipher stream. Thus the cipher's block size equals 27 bytes.

Rubik's Cube

The encryption transformation happens here not with the integers or longs as the case is with the regular block ciphers but on the atomic byte level. Each byte of the encrypted array is the product of the certain byte of the initial array and the pre-generated encryption key. Naturally the Swirl key is a byte array as opposed to a uint (ulong) array for the regular case. As in the previous part, the powerful Skein1024 generator creates the required extended key.

The Swirl encryption algorithm takes 9 blocks of 3 bytes in the first dimension (say, horizontal) and transforms them. Then it works on the second and third dimensions which gives the total of 27 atomic transformations thus requiring 27 unique keys per round. The algorithm easily lends itself to loop implementation but sets in the code in a flat manner for the sake of the execution speed.

The Swirl class inherits from the BlockBaseCipher class that provides for the public interface and general block processing – parallel or single depending on the machine processor. The input byte array is divided in to the number of arrays which are handled in parallel and copied into the output array. Then the reminder of the input after the parallel block processing is transformed and added to the output.

Here’s the code sample for the byte array parallel encryption:

protected byte[] parallelEncrypt(byte[] input, int outLen) {
            int inpSize = input.Length;
            int blocks = inpSize / _ParallelBlock;
            int partsize = blocks * _Blocksize;
            int rem = inpSize % _ParallelBlock;
            byte[] output = new byte[outLen];
            // divide input into _ProcessorCount number of inputss (8) and fill them from the input
            byte[][] inputs = new byte[_ProcessorCount][];
            // create _ProcessorCount number of outputs (8)
            byte[][] outputs = new byte[_ProcessorCount][];
            System.Threading.Tasks.Parallel.For(0, _ProcessorCount, i => {
                inputs[i] = new byte[partsize];
                outputs[i] = new byte[partsize];
                Buffer.BlockCopy(input, partsize * i, inputs[i], 0, partsize);
                ProcessEncrypt(inputs[i], outputs[i]);
                Buffer.BlockCopy(outputs[i], 0, output, partsize * i, partsize);
            });           
            //add the reminder
            if (rem != 0) {
                byte[] remInput = new byte[rem];
                Buffer.BlockCopy(input, partsize * _ProcessorCount, remInput, 0, rem);
                byte[] remOutput = encrypt(remInput);              
                Buffer.BlockCopy(remOutput, 0, output, partsize * _ProcessorCount, remOutput.Length);
            }
            return output;
        }

By design, every block cipher processing involves the sequence of the single block transformations and BlockBaseCipher calls encryptBlock (decryptBlock) for the specific implementation in the derived class. That’s where the particular encryption algorithm shows its strength (or weakness). BlockBaseCipher also provides for the generic padding handling of the cipher stream last block.

3. XXTEA-based Twirl Cipher with the expanded key schedule, variable block size and parallel processing

Let us take such a simple and efficient encryption algorithm as an XXTEA and modify it applying the power of the extended key method. It appears also that this algorithm easily lends itself to the idea of a variable block size set by user at the cipher initialization.

Here's the original XXTEA implementation in C:

#define DELTA 0x9e3779b9
  #define MX (((z>>5^y<<2) + (y>>3^z<<4)) ^ ((sum^y) + (key[(p&3)^e] ^ z)))
  
  void btea(uint32_t *v, int n, uint32_t const key[4]) {
    uint32_t y, z, sum;
    unsigned p, rounds, e;
    if (n > 1) {          // Coding Part 
      rounds = 6 + 52/n;
      sum = 0;
      z = v[n-1];
      do {
        sum += DELTA;
        e = (sum >> 2) & 3;
        for (p=0; p<n-1; p++) {
          y = v[p+1]; 
          z = v[p] += MX;
        }
        y = v[0];
        z = v[n-1] += MX;
      } while (--rounds);
    } else if (n < -1) {  // Decoding Part
      n = -n;
      rounds = 6 + 52/n;
      sum = rounds*DELTA;
      y = v[0];
      do {
        e = (sum >> 2) & 3;
        for (p=n-1; p>0; p--) {
          z = v[p-1];
          y = v[p] -= MX;
        }
        z = v[n-1];
        y = v[0] -= MX;
        sum -= DELTA;
      } while (--rounds);
    }
  }

and its C# modified equivalent for encryption part:

   protected override void encryptBlock(byte[] buffer) { 
           int k = 0;
           ulong mx = 0;
           ulong[] v = new ulong[4];

           for (int i = 0; i < 4; i++) {
               v[i] = BytesToBe64(buffer, 8*i );
           }

           ulong y;
           ulong z = v[3];

           for (int r = 0; r < Rounds; r++) {
               for (int p = 0; p < 3; p++) {
                   y = v[p + 1];
                   mx = (z >> 5 ^ y << 2) + (y >> 3 ^ z << 4) ^ (sum ^ y) + (key[k++] ^ z);
                   z = v[p] += mx;
               }
               y = v[0];
               mx = (z >> 5 ^ y << 2) + (y >> 3 ^ z << 4) ^ (sum ^ y) + (key[k++] ^ z);
               z = v[3] += mx;              
           }

           for (int i = 0; i < 4; i++) {
               Be64ToBytes(v[i], buffer, 8*i);
           }  
        }

First thing to notice here is the creation of the v array from the input buffer:

ulong[] v = new ulong[4];
for (int i = 0; i < 4; i++) {
               v[i] = BytesToBe64(buffer, 4*i );
           }

The supposed buffer length here equals 32 bytes, which provides for the creation of 4 ulong numbers to be used in the block ecryption cycle. Nothing prevents us from using an arbitrary block size (in multiples of 8) in order to get N ulong numbers from a block of N*8 bytes.

Secondly, notice the use of sum += DELTA; and e = (sum >> 2) & 3; in the original code Ever modified e is used at #define MX ... (key[(p&3)^e] ^ z))) which provides for the necessary key variance during encryption cycle. It is would be a cosiderable enhancement if this key manipulation would be simply changed to the next key[k++] value from the Skein1024 generated sequence.

Another point of improvement is to change the constant shift values like 5,2,3 and 4 from the mx assgnment to variables. Since we have a new key value with every rotation in the loop we can use it to obtaine 4 new numbers as following: mx = z >> (byte)key[k] ^ y << (byte)(key[k] >> 8) ^ y >> (byte)(key[k] >> 16) ^ z << (byte)(key[k] >> 24) ^ key[k++];

All additions in the algorithm were also eliminated for it to contain only shift and xor operations.

The final solution looks like this:

 // encrypt ulongs
        protected override void encryptBlock(byte[] buffer) {
            int k = 0;
            ulong mx = 0;

            ulong[] v = new ulong[_keysNum];

            for (int i = 0; i < _keysNum; i++) {
                v[i] = BytesToBe64(buffer, 8 * i);
            }

            ulong y;
            ulong z = v[_keysNum - 1];

            for (int r = 0; r < _Rounds; r++) {
                for (int i = 0; i < _keysNum - 1; i++) {
                    y = v[i + 1];
                    // use curret key to get 4 shift byte values for y and z
                    mx = z >> (byte)_expKeyLong[k] ^ y << (byte)(_expKeyLong[k] >> 8) ^
                         y >> (byte)(_expKeyLong[k] >> 16) ^ z << (byte)(_expKeyLong[k] >> 24) ^ _expKeyLong[k++];
                    z = v[i] ^= mx;
                }
                y = v[0];
                mx = z >> (byte)_expKeyLong[k] ^ y << (byte)(_expKeyLong[k] >> 8) ^
                         y >> (byte)(_expKeyLong[k] >> 16) ^ z << (byte)(_expKeyLong[k] >> 24) ^ _expKeyLong[k++];
                z = v[_keysNum - 1] ^= mx;
            }

            for (int i = 0; i < _keysNum; i++) {
                Be64ToBytes(v[i], buffer, 8 * i);
            }
        }

where _keysNum gives the number of ulongs in the block.

So, the XXTEA, the Tiny Encryption Algorithm, disappeared. The newly created Twirl is heavily dependent on the extended key to be preliminary created at initialization. This new quality provides for the high enthropy of the key schedule. The possile large block size (up to 1024 bytes in this implementation) leads to the encryption of up to 1024*8 = 8192 bits on a single round of processing. The standard 64 bytes (8 numbers) block with 20 rounds commands the 64*8*20 = 10240 bits encryption.

Since Twirl is inherited from BlockBaseCipher it enjoys all the parallel procennig abilities of the base class. Combined with a reasonable number of the processing rounds all theese qualities provide for the interesting and fast encryption algorithm.

4. Different ciphers processing speed chart

Test Machine: i7-4770 3.4 GHz, 8GB RAM, Windows 7, 64 bit

Test File: 129 676 KB

Presented here are the measurements of the average time spent on the above mentioned test file transformation by ciphers described in the previous parts of the article and other algorithms that I had a pleasure to implement. The encryption and decryption times are practically the same, so only the ecryption data are on display.

It should be noted that block ciphers are used in two stages. Firstly they are initialized and the extended key is generated. Then they perform the processing itself. The timing shown here does not include the initialization process and applies to the second stage only. Therefore it would not be correct to compare block and stream ciphers' speed, since with the stream ones the key generation happens as needed during the encryption process.

Block Ciphers

Cipher Block Size (byte) Rounds Encryption Key Strength (bit) File Encryption Time (sec) 100 MB Encryption Time (sec)
AES 16 18 256 2.4 1.9
ThreeFish 64 72 320 1.4 1.1
Serpent 16 48 3136 10.4 8.2
Serpent 32 48 6272 5.2 4.1
Swirl 27 3 1944 1.7 1.34
Swirl 27 9 5832 5.2 4.1
Twirl 64 10 5120 1.0 0.79
Twirl 64 40 20480 3.9 3.0

Stream Ciphers

Cipher File Encryption Time (sec) 100MB Encryption Time (sec)
ChaCha 1.6 1.26
Skein 14.3 11.295
Keccak 10.1 7.98

Conclusion

I strongly believe in the human right to free idea exchanges, combined with the right to privacy and protection of private information. Nothing serves this purpose better than the powerful, reliable, and accessible encryption techniques, a small part of which I humbly tried to display and promote in this article.

For centuries people were gathering knowledge that led them to the understanding of the nature of the universe. The information accumulation and exchange form the base of human progress, evolution and culture. The modern way of storing, manipulating, and transforming information provides for knowledge distribution to the public and for private usage to be accessed on demand. Nowadays, with the advance of accessible computing devices and cryptographic methods, the stable and speedy digital flow of information also becomes a protected and reliable process, allowing the freedom of encryption for everybody.

Guiding Publications