1. Abstract

The age of information is also the age of digital information assets, where the professional programmer has to deal with cryptography. This article presents the theory, source code, and implementation for variable key size RSA encryption/decryption, digital signing, multi precision library, Diffie-Hellman key exchange, entropy collection, pseudo random number generator, and more. The article presents how to implement your own secure protocol using the IOCP technology, by presenting a secure chat client/server solution implementation.

2. Requirements

The article expects the reader to be familiar with C++, TCP/IP, socket programming, MFC, and multithreading.
Some mathematic knowledge about elementary number theory and statistics is required.
The source code uses Winsock 2.0 and IOCP technology, and requires:
- Windows NT/2000 or later: Requires Windows NT 3.5 or later.
- Windows 95/98/ME: Not supported.
- Visual C++ .NET or a fully updated Visual C++ 6.0.

3. Introduction

The age of information is also the age of digital information assets. The professional developer has to deal with cryptography to make data storage and transmission secure. The purpose of this article is not to “reinvent the wheel” or implement home made, mathematically-unsafe cryptographic algorithms. This article focuses on the practical details concerning cryptographically-safe protocols, and presents the theory and source code for a secure client/server solution that can be used for any type of client/server application.

To know cryptography in theory is essential for a developer, but to implement it in practice is difficult. There are many security exploits as buffer overflow [1] and others that arise with the implementation of theoretically secure algorithms. Many commercial/free high quality cryptography libraries exist in the market, as Crypto++ [2], OpenSSL [3], and Crypttool [4]. To use these libraries, the developer has to know the cryptography theories behind the implementation, and also be aware of “what is happening under the hood of the library”. This is not an easy task, because the internal structure of these libraries can be complex, and the libraries contain unnecessary functionality that is not always needed.

This article briefly explains cryptographic theories involving cryptographically-safe communication protocols, and also presents how this is implemented by providing a secure chat client/server solution.

4. Background

This section explains some of the cryptographic theories and terminology needed to understand the rest of the article, namely section 5. If you think that you have enough experience and knowledge and want to get your hands dirty, please continue to the next section (5. Implementation).

By using cryptographic algorithms, data storage and transmission can be performed in a secure way. Using algorithms to change the data in such a way that only an authorized recipient is able to reconstruct the data is called encryption. For an unauthorized recipient, the encrypted data looks like a meaningless and random sequence of bits. A cryptographic algorithm, also known as cipher, is a mathematical function which uses plain text (the data) as the input and produces cipher text (the encrypted data) as the output (and vice versa). A secret key is used together with the cipher to encrypt the plain text, and the same key or another key is used to decrypt the cipher text back to plain text.

Different cryptographic algorithms or ciphers have different mathematical properties and weaknesses. The details of a cipher is usually made public. It is in the secret key that the security of a modern cipher lies, not in the details of the cipher. To get additional information and knowledge about security, please read and try Crypttool [4]. The Crypttool software demonstrates several encryption attack implementations, and gives you additional information about the algorithms used in this article.

4.1. Symmetric Cryptography

In Symmetric Cryptography, the sender and recipient use the same secret key to encrypt and decrypt plain text. This means that the sender and recipient must be in possession of a common (secret) key which they have exchanged before actually starting to communicate.

+

The advantage of symmetric algorithms is the high speed with which data can be encrypted and decrypted.

-

One disadvantage is the need for key management. In order to communicate with each other confidentially, the sender and the recipient must have exchanged a key using a secure channel, before actually starting to communicate.

Most modern symmetric algorithms operate on blocks of plain text. The procedures usually consist of many complex rounds of bit shifts and transformations to gain protection against different mathematical analysis attacks. In section 5, we will choose symmetric cryptography based on its properties and cryptographic strength.

4.2. Asymmetric Cryptography

Unlike asymmetric encryption, each subscriber has a personal pair of keys consisting of a secret key and a public key. The public key is public, this means that anyone can have it. The keys are mathematically related, yet it is computationally difficult to deduce one from the other. Using the public key, anyone can encrypt the plain text, and only the one that has the private key can decrypt the message.

Using asymmetric encryption, two entities on a network (let’s call them Alice and Bob) can communicate securely using the following simple protocol:

Bob and Alice exchange public keys.
Alice encrypts her message with Bob's public key and sends it to Bob.
Bob encrypts his message with Alice's public key and sends it to Alice.
Bob decrypts Alice's message with his private key, and Alice decrypts Bob’s message with her private key.

Now, they can communicate because they know their private keys, and no one else can decrypt the message because they do not have Alice and Bob's private keys. This is not entirely true because of the “man in the middle attack” and practical reasons that we will discuss in section 4.3 and 4.5.

+

The advantage is that no secure channel is needed before messages are transmitted, because all the information required in order to communicate confidentially can be sent openly.

-

The disadvantage is that pure asymmetric procedures take a lot longer to perform than symmetric ones. Therefore, asymmetric encryption is used to exchange a session key used with symmetric cryptography.

The most well-known asymmetric procedure is the RSA algorithm [can be found in 5, 10], named after its developers Ronald Rivest, Adi Shamir, and Leonard Adleman. The RSA algorithm was published in 1978. The RSA algorithm can be used for both public key encryption and digital signatures. Its security is based on the difficulty of factoring large integers. We will discuss the RSA encryption algorithm and its implementation in section 4.6.

4.3. Man-in-the-Middle Attack

The public asymmetric encryption secure communication protocol between Alice and Bob described above in section 4.2 is vulnerable to a man-in-the-middle attack.

Let's assume that Mallory is an enemy hacker and can:

listen to the traffic between Alice and Bob.
can modify, delete, and substitute Alice's and Bob's messages, and also introduce new ones.

Mallory can impersonate Alice when talking to Bob, and impersonate Bob when talking to Alice. This is possible in a real life communication over the Internet.

An example of the attacker:

Bob sends Alice his public key. Mallory intercepts the key and sends his own public key to Alice.
Alice generates a random session key, encrypts it with Bob’s public key (which is really Mallory's), and sends it to Bob.
Mallory intercepts the message. He decrypts the session key with his private key, encrypts it with Bob's public key, and sends it to Bob.
Bob receives the message thinking it came from Alice. He decrypts it with his private key, and obtains the session key.

Alice and Bob start exchanging messages using the session key. Mallory, who also has the session key, can now decipher the entire conversation. This can, of course, be solved by using One-Way Hash functions (section 4.4) to digitally sign the message package (section 4.5 and 5.3).

4.4. Message Digest

A message digest, also known as a one-way hash function, is a mathematical function that takes a variable-length input and converts it into a fixed-length binary sequence called the hash value. Another important property of a message digest is that it is hard to reverse the process. Given the hash value, it is hard or impossible to find the input. Furthermore, a good hash function also makes it hard to find two different data inputs that produce the same hash value. Even a slight change in an input data causes the hash value to change drastically, therefore, a message digest is used to check the integrity/correctness of the digital data.

This makes a one-way hash function a central notion in public-key cryptography, and is used when producing a digital signature (section 4.5 and 5.3) for a message. Message digests are also used to distribute randomness. For example, a 8 bit true random number is consumed by a message digest and becomes 160 bits, where the randomness is distributed among the 160 bits. The most popular one-way hash algorithms are MD4 and MD5 (both producing a 128-bit hash value), and SHA, also known as SHA1 (producing a 160-bit hash value).

4.5. Digital Signing

The main concept of digital signing is that the receiver can verify that a document or message is from a certain person or company. Digital signing is made using asymmetric cryptography (usually RSA) and message digest to sign and verify a message, as described in the procedure below. For this to be possible, we need a strong asymmetric public key (n,e) that we trust. If you use your private key to encrypt a message, anyone could decrypt it using your public key. What good is that? Well, it proves you are the one that encrypted it and it has not been altered since you did so. This forms the basis of the digital signature.

Normally, an independent third party that everyone trusts, whose responsibility is to issue certificates, is called a Certification Authority (CA). A certificate is a data package that completely identifies an entity, and is issued by a CA only after that authority has verified the entity's identity. The certificate can hold this asymmetric public key that everybody trusts.

In this article, we hard-code the trusted asymmetric public key (n,e) into the client software (also discussed in 5.6.3), and are not using other parties. We will discuss this later in section 5.3.4. The procedure of digital signing is described below:

4.5.1. Digital signing assumption

The signer A possesses the private key (d,e) and the public key (n,e). The recipient B already trusts the public key of A (n,e).

4.5.2. The digital signature procedure

Create a message digest or hash of the information to be sent, that we call m (assuming m<n).
The signer A uses the private key to compute the signature (s = m^d mod n).
Send this signature s and the message to the recipient B.

4.5.3. The signature verification procedure

Recipient B does the following:

Uses sender A's public key (n, e) to compute an integer (v = s^e mod n).
Independently computes the message digest of the information that has been signed.
If both message digests are identical, the signature is valid.

The implementation of this can be found in MyCryptLib::DemoDSA(..) in the source code provided.

4.6. The RSA Algorithm

In this section, we briefly describe how the RSA algorithm works, we will discuss the implementation later. The RSA asymmetric algorithm [can be found in 5] was named after its developers, Ronald Rivest, Adi Shamir, and Leonard Adleman. Almost all known commercial secure protocols depend on RSA public key exchange. The strength in RSA lies in the difficulty to factor two very large prime numbers, p and q.

4.7. The RSA Procedure

The security of the RSA algorithm depends, as with all public key methods, on the difficulty to calculate the private key, d, from the public key, (n, e). This is obtained by making it difficult to factorize the product of two prime numbers (p,q) from the public key n (described below, section 4.7.1). Therefore, the prime numbers (p,q) must be very large.

4.7.1. Standard RSA algorithm description

n = pq where p and q are distinct primes.
phi, φ = (p-1)(q-1)
e < n such that gcd(e, phi)=1, usually e=3. 
d = e^-1 mod phi, d is the private key.
Encryption: 
c = m^e mod n, 1<m<n.
Decryption:
m = c^d mod n.

This means that we need to operate with large integers, and also generate big prime numbers. We will discuss the implementation of this algorithm later, in section 5.3.

4.8. Random Numbers - The Backbone of all Cryptographic Security

Random numbers are used to generate keys, prime numbers, etc. Therefore, random numbers or random bits are the backbones of all cryptographic security. If the numbers are not truly random, the security can easily be broken. The fact that computers are deterministic (a computing machine) and are unable to produce or handle true randomness, is the central problem. In this section, we discuss what randomness is, and also how computers generate pseudo random numbers.

4.8.1. Entropy, the mathematical name for statistical randomness

Statistics tells us a few things about random bits. Zeros ought to occur as often as ones. There are statistical tests, such as the chi-squared test, that you can perform on a stream of numbers to show how random they are. The degree to which a stream of bits follows this statistical form is the degree to which it is said to be entropic. This is the strict mathematical definition.

4.8.2. Pseudo-random number generators

A pseudo random number generator is an algorithm that generates a sequence of numbers that acts like random numbers, but they are not. The algorithm starts with a “seed”, a number to start with, and then uses that seed to generate a series of numbers that “act” as if they were random.

Usually, all pseudo random number generator have a period. This means that, after a while, they start to follow a certain pattern. And there are other “bad” properties as:

Shorter than expected periods for some seed states (the period is dependent on the seed value)
Poor dimensional distribution
Mutually dependent values
Some bits being 'more random' than others
Lack of uniformity (everything is not equally probable)

In late 1999, the ASF software Texas Hold’em poker application used the standard rand() to shuffle the cards. It was discovered by Cigital [12], that only five cards needed to be known to predict the remaining cards. The pseudo-random number generator algorithm used to generate random values is very important because of the aspects that we have discussed here. In this implementation, we use the Mersenne twister pseudorandom number generator developed in 1997 by Makoto Matsumoto and Takuji Nishimura, more details of this later, in section 5.

4.8.3. Collecting entropy

Collecting truly random numbers is hard, but is needed. There exist hardware devices that generate true noise, observing cosmic ray flux, etc. However, we need to use the equipments that we have, which are the computers and the users. Instructing the user to move the mouse around the screen in a random manner (as in RandDialog in the source code) or type some random letters, is a good way to generate data, but it takes too much time, and often we do not have a user (for servers). Using the existing hardware in the computer that has some entropy (randomness) is also another way.

The clock in the computer is a quite good source of entropy, finer resolution clocks are better. But if we are using, for example, TickCount() to get a random seed, a search will find it quickly. There are only 31.5 million ticks in a year. This is when we are using different parts of the hardware.
At the Crypto '94 conference, Davis, Ihaka, and Fenstermach showed that air turbulence inside a disk drive creates enough randomness to make cryptographic-strength random numbers.
Use a microphone. Reading the microphone that is built into many computers gives a source of apparently random data.

Once we have collected random data, we have to distribute the randomness, or distill it in a nice way so all bits are equally random. The best way to do this is to use some sort of message digest (discussed in section 4.4).

4.8.4. Summary

Symmetric Cryptography	When the same key is used for encryption/decryption. Symmetric ciphers are fast, but the problem is that the communicating entities need to exchange the key.
Asymmetric Cryptography	A public key is used to encrypt data, and another secret key is used to decrypt data. The algorithm used for this is mainly RSA, and it is based on the difficulty to factorize big prime numbers. It is hard to derive the secret key from the private key. The communication can be done in open, but is not safe for “man in the middle attacks”. Asymmetric encryption is slow, and is only used to exchange keys.
Man-in-the-Middle Attack	It is possible to listen, modify, and delete data packages between two entities in a network. Therefore, it is possible for a hacker to impersonate others in a network and be the “Man-in-the-Middle”. The solution to this is digital signing.
Message Digest	Is a very important one-way function that produces a hash value of a fixed size, given data of variable size. This is used to digitally sign data, compute checksum of data, protect plain text passwords, and also distribute the randomness (entropy) in data.
Digital Signing	Can be based on the RSA algorithm. Is a way to determine that the data really comes from the sender (eliminates Man-in-the-Middle attacks).
Random Numbers - the backbone of all cryptographic security	Random numbers are used to generate prime numbers, keys, etc. Therefore, they play an important role in cryptography. It is hard and time consuming to generate true random numbers. And often, generating them needs special equipment. Seeded pseudo random number generators with good statistical properties can be used instead. The seed can be collected with much entropy using different sources, and be distilled with a message digest.

5. Implementation

In this section, we describe the steps to implement a secure protocol. The reason for the chosen methods, the algorithms, and their reliability and performance are discussed. Furthermore, the source code is briefly explained and presented.

5.1. The Protocol Design – General Overview

To implement a secure protocol, we need to have a symmetric cipher (section 4.1) and a key exchange procedure using asymmetric encryption (section 4.2).

Figure 1. The figure represents the dependencies between the classes involved in the implementation.

The protocol is implemented using three classes namely MyCryptLib, CRijndael, IOCPS/SecureChatIOCP (see figure 1). The MyCryptLib class contains the source code for all the details around the key exchange procedure and digital signing, CRijndael is the symmetric cipher class, and IOCPS/SecureChatIOCP is a comprehensive IOCP server class [6]. Read more about the IOCP technology in my other article “A simple IOCP Server/Client Class” [6]. The details around the implementation are given in this section.

5.2. The Symmetric Cipher

Rijndael was selected as the Advanced Encryption Standard (2000-10-02) [7]. The algorithm is designed with the state of the art in cryptographic research. The algorithm is fast, and is resistant against cryptographic attacks as linear crypt analysis, differential crypt analysis, etc. However, the Rijndael algorithm can be written as a system of multivariate quadratic equations. To solve these equations to obtain the key or plain text is difficult, because there are no efficient algorithms for solving such systems. There is, however, a lot of research around this area, and many methods such as XL and XLS has been presented to solve big multivariate quadratic equation systems [8]. Therefore, 128-bit Rijndael may not be safe anymore. There are also proposals for algorithms that might break the 256 bit Rijndael, but this is not certain [8]. For now, the (2005) 256-bit Rijndael is sufficiently secure for our task, but, this might not be in the future.

The symmetric 256-bit Rijndael cipher is implemented using the class CRijndael. The cipher is used with a 256-bit key, and encrypts/decrypts blocks of size 16 bytes.

Figure 2: The figure shows how data is encapsulated inside an encrypted package. The package size denotes the size of the package. The size of the data inside denotes the real size of the data. If the data does not fit into 16 byte blocks, the data is padded with random data. Also, a CRC16 checksum is added to the block for more security.

The data is packaged into a number of blocks (figure 2), and if the data does not fit the blocks, it is padded with random numbers. A CRC16 checksum is also added to the block to avoid an attacker changing the encrypted data, and also to determine if the decryption was successful. The implementation of the encryption and decryption is done in the functions SecureChatIOCP::EnCryptBuffer(..) and SecureChatIOCP::DeCryptBuffer(..).

5.2.1. Implementation source code

For each package, the Cipher Block Chaining (CBC) is used to encrypt the data. In CBC mode, an encrypted block is obtained by first XORing the data block with the previous encrypted block and then encrypting the resulting value. This means that we need to enter a 16 byte block along with a key to encrypt/decrypt. Each connection/client has an instance of the class CRijndael inside its context, this class is initialized with the key and the initial chain block, with the function:

// key - The 128/192/256-bit user-key to use. 
// chain - initial chain block for CBC and CFB modes.
// keylength - 16, 24 or 32 bytes
// blockSize - The block size in bytes
//             of this Rijndael (16, 24 or 32 bytes).

void CRijndael ::MakeKey(char const* key, char const* chain, 
     int keylength=DEFAULT_BLOCK_SIZE, int blockSize=DEFAULT_BLOCK_SIZE)

{
    ….
}

The functions EnCryptBuffer(CIOCPBuffer *pBuff, ClientContext *pContext) and DeCryptBuffer(CIOCPBuffer *pBuff, ClientContext *pContext) in the class SecureChatIOCP are used to encrypt/decrypt buffers, as in figure 2. For more information and details, please see the source code.

5.3. Asymmetric Encryption – The RSA/DSA Implementation

In sections 4.5-4.8, we discussed what RSA/DSA is, and we know that to use asymmetric encryption, we need to perform computations with very large numbers. This is, however, not possible since the hardware only supports 32 or 64 bit operations. That is why we need to go back to our old, plain computing books to find numerical algorithms to implement computations with arbitrary bit lengths using the existing hardware. This is why we have implemented a multi-precision library. Furthermore, we need to generate very big prime numbers (section 4.6).

5.3.1. Multi-precision library

By using numerical algorithms [9], a set function denoted by BN*(…) (BN stands for Big Number) is implemented in the MyCryptLib class. The focus of the implementation has been simplicity, and not performance. The functions implement all the operations needed for our task, using existing hardware that operates with the type DWORD. The BN*(…) function can compute with numbers with an arbitrary bit length. To know what is happening under the hood is important here, since different implementation exploits can be used to hack the system (discussed in section 2).

The algorithm used to implement the different arithmetic operations as addition, division, and subtraction is complicated, and therefore, we explain only one of them here, namely addition.

Let's start by adding two binary bits. Since each bit has only two possible values, 0 or 1, there are only four possible combinations of inputs. These four possibilities, and the resulting sums, are:

The fourth line indicates that we have an overflow! We need two bits to store the output! To add numbers of two bits can now be performed as in figure 3. The carrier which denotes the overflow in the one bit adder is fed into the adder nr 2 to be summed into S2. By doing this, we can add two bits using two 1 bit adders.

Figure 3. The figure illustrates a two bit adder. A, B are two bit numbers, and S is a two bit number. The count denotes if we have an overflow or not.

In a similar fashion, we can use 32 bit hardware to operate with numbers of arbitrary lengths. Please see the source code below, for addition:

DWORD MyCryptLib::BNAdd(DWORD C[], const DWORD A[], 
                        const DWORD B[], const UINT nSize)
{
    DWORD k=0; // carry 
    for (UINT i = 0; i < nSize; i++) {
        C[i] = A[i] + k;
        if(C[i]>=k)
        // detect overflow and set k
            k=0;
        else k=1;

        C[i] += B[i];
        if (C[i] < B[i])
        // Detect overflow again.
        k++;
    }
    return k;
}

To get additional information about the implementations, please read the source code. And, also check the DemoSimpleTest(..) function in the MyCryptLib class.

5.3.2. Prime number generation

To generate prime numbers, we need the following:
- We need to collect entropy (true randomness) to feed the random generator with.
- A fast pseudo random generator with good statistical properties.
- The last thing that is needed is a fast function that determines if a number is a prime number or not.

Collecting entropy

To collect entropy is not an easy task, as discussed in section 4.8.3. In our application, we collect entropy from the user using the class CRanDialog that collects random data when the user moves the mouse randomly. If the user is not present, the entropy is collected from different hardware components, as described in the source code below:

BOOL MyCryptLib::MTCollectEntropy(BYTE *pRandomPool, UINT nSize)
{
    //…
    while ( nSize-nCollected>0 ) { 
        // Hash the previus entropy Bucket..
        SHA1_Hash(pEntropyBucket,SHA1_DIGEST_SIZE,&csha1);
        // Destill The process ID
        dwRes=GetCurrentProcessId();
        SHA1_Hash((BYTE*)&dwRes,sizeof(DWORD),&csha1);
        // Destill The thread ID
        dwRes=GetCurrentThreadId();
        SHA1_Hash((BYTE*)&dwRes,sizeof(DWORD),&csha1);
        // Destill The system time.
        GetSystemTime(&st);
        SystemTimeToFileTime(&st, &ft);
        SHA1_Hash((BYTE*)&ft,sizeof(FILETIME),&csha1);
        //Destill The processors tickcount.
        dwTick = GetTickCount();
        SHA1_Hash((BYTE*)&dwTick,sizeof(DWORD),&csha1)
        // Destill The memory allocated
        // GlobalMemoryStatus(&ms);
        SHA1_Hash((BYTE*)&ms, sizeof(MEMORYSTATUS),&csha1);
        // Put it inside the Bucket. 
        SHA1_Finish(EntropyBucket,&csha1);
        // Copy the Entropy to the pool
        if ( nSize-nCollected < SHA1_DIGEST_SIZE ) {
          memcpy(pRandomPool+nCollected, 
                 pEntropyBucket,nSize-nCollected);
          nCollected+=nSize-nCollected;
        }
        else {
          memcpy(pRandomPool+nCollected, 
                 pEntropyBucket,SHA1_DIGEST_SIZE);
          nCollected+=SHA1_DIGEST_SIZE;
        }
    }
    return TRUE; 
}

The random generator

The collected entropy (either from the user or the hardware, with MTCollectEntropy(..)) is fed into the Mersenne Twister Random Generator [11].

The generator has the incredible period 2^(219937 – 1). This is a number with 6000 decimal digits. The number of elementary particles in the universe is “only” estimated to be a 80-digit number. The algorithm is also very fast, and has very good statistical properties. The 32-bit random numbers exhibit the best possible equi-distribution properties in dimensions up to 623. Please see the source code DWORD MyCryptLib::MTRandom() and BOOL MyCryptLib::MTInit(BYTE *pRandomPool, UINT nSize) to get additional information.)

Rabin-Miller Probabilistic Primality Test

Rabin Miller [11] is an algorithm that determines if a given number is a prime number or not. The algorithm is probabilistic, this means that it is not 100% secure, but is very fast. We will use this algorithm to find prime numbers as follows:

Start with a random number A with size nSize.
Make it odd.
Use a small list of prime numbers and Rabin-Miller to check if the number is not a prime number.
If the number is not a prime number, add 2 to it, and jump to 3 until we find a prime number.

Please see the functions BNMakePrime(…), BNMakeRSAPrime(..), and RSAGenerateKey(..) to get additional information about prime number generation. As we can see in figure 4, it takes more time to find bigger prime numbers because of two reasons, the computations involved to find prime numbers take longer, and the prime numbers get less dense when they are bigger. It takes about 73 seconds to generate a 4096 bit RSA key on a Centrino laptop.

Figure 4. The figure represents the time it takes to generate an RSA key. The X-axis denotes the size of bits, and the Y-axis denotes the time in seconds.

5.3.3. RSA Encryption/Decryption

The RSA encryption/decryption is discussed in section 4.7. To get more implementation details, please read the MyCryptLib::DemoRSA(..) function. One important aspect discussed here is the performance. The time to encrypt data using different key lengths is between 0.01 ms to 1 ms, depending on the key length.

However, to decrypt a message takes much longer than to encrypt the message (figure 5). The time can be reduced significantly using a different decryption method called Chinese Remainder Theorem (CRT) [10]. This algorithm (the red line in figure 5) is much faster than the standard decryption (m = c^d mod n ) discussed in 4.7 (the blue line in figure 5).

Figure 5. In the figure, we can observe how long the decryption of the RSA algorithm takes for different key lengths. It can be observed that CRT gives much better performance than the standard decryption algorithm (m = c^d mod n).

In the CRT algorithm, the private key is represented as a quintuple (p, q, dP, dQ, and qInv), where p and q are prime factors of n, dP and dQ are known as the CRT exponents, and qInv is the CRT coefficient. The CRT method of decryption is four times faster, overall, than calculating m = c^d mod n (for large key lengths).

The extra values for the private key are:-

dP = (1/e) mod (p-1)
dQ = (1/e) mod (q-1)
qInv = (1/q) mod p where p > q

These are pre-computed and saved along with p and q, as private key. To compute the message m, given c, do the following:-

m1 = c^dP mod p
m2 = c^dQ mod q
h = qInv(m1 - m2) mod p
m = m2 + hq

Even though there are more steps in this procedure, the modular exponentiation to be carried out uses much shorter exponents, and so it is less expensive overall. The implementation of this can be found in MyCryptLib::RSADecryptCRT(…).

5.3.4. Digital signing implementation

The details around digital signing are discussed in section 4.5. According to sections 4.5.2 and 4.5.3, a digital signature is implemented by the functions DigitalSignSHA1rDSA(..) and DigitalVerifySHA1rDSA(..) in the MyCryptLib class. The function uses a SHA1 message digest to compute the checksum of a message, and then sign/verifies it. To get additional information and examples, please see the DemoDSA(..) function inside the MyCryptLib class, or use Server->CryptoLibrary->GenerateKey.

To verify a signature is not CPU-expensive (takes less than 1ms, depending on the key size), but to calculate a signature is very expensive, as shown in figure 6 below. We can observe (figure 6) that signing a message using a key length of 2688 bits takes 1 second. Therefore, it is not possible to sign every message sent to the client using that size.

Figure 6. The figure shows the time, in milliseconds, to compute a signature using the function DigitalSignSHA1rDSA(..) with different key sizes. To sign a message using a key length of 2688 bits takes 1 second.

5.4. Key Exchange Procedure

Whitfield Diffie, Martin E. Hellman, and Ralph Merkle developed the Diffie-Hellman (D-H) key exchange protocol in Stanford in 1976. The protocol protects against “ear dropping”, that means no one can know the secret key that is exchanged, but there is no protection against the “Man in the Middle” attacks discussed in section 4.3. The security of the algorithm lies on the difficulty to solve the discrete logarithm problem, that is: if g= 2, or 5, p is the public prime number, and A=g^a mod(p) is given, calculate the secret random key, a. This is very difficult if p is higher than 1024 bits, however, if the random secret key does not contain enough entropy, the key can be predicted, and then the algorithm is broken.

Description of the algorithm:

The numbers g, p, A, B are public.
Alice generates a secret key, a, and computes A= g^ a mod(p).
Bob generates a secret key, b, and computes B=g^b mod(p).
The keys A, B are exchanged.
Alice computes S=B^a=g^(b*a) mod (p).
Bob computes S=A^b=g^(a*b) mod (p).
Alice and Bob have the same secret key because of the mathematical commutative law that gives g^(b*a) mod (p) = [a*b=b*a] = g^(a*b) mod(p). The implementation and usage of this algorithm can be found in the DemoDiffieHellman(..) function in the MyCrypt class. In the figure below, we can see the computation time needed to exchange the keys according to this algorithm.

Figure 7. The figure shows how long a secure key exchange takes (in milliseconds (y-axis)) compared to the length of the public key, p. We can see that, for a 2048 bit key, it takes 600 ms to exchange the key.

5.5. Key Length

The security of all algorithms presented in the previous articles lies on the key length and the algorithm to generate the keys. A rule of thumb is to have 10%-20% entropy (true randomness) bits of the key size. In 1999, a 512 bits public key was factored with an implementation of the Number Field Sieve algorithm (GNFS), developed by Buhler, Lenstra, and Pomerance. The time it took to break this key was five months. This means that a key of length 512 bits is no longer safe.

This made evident that a module length of 512 bits no longer protects from attackers. Within the last 20 years, a lot of progress has been made. Estimations about the future development of the ability to factor public keys vary, and depend on some assumptions:

Development in computing performance (Moore’s law: every 18 months, the computing power will double) and in grid computing.
Factorizations of prime numbers are part of very many topical research areas in number theory and computer science. The progress is bigger than estimated. This is because of the new knowledge about prime numbers.

In practice, it is still effectively impossible for ordinary people to break keys of size 512, but organizations like NSA with supercomputers can probably break keys. The length of a secure key is typically 1024 (year 2005), and should be increased by 24 bits/year. In figure 5-7, section 5.3, it is shown that the computation time for asymmetric encryption increases exponentially using larger prime numbers or keys.

"Oh, our system uses 4096-bits security”, may sound impressive, but using too large keys decreases the performance, and also gives a false sense of security because of security issues involved in generating the key (10%-20% entropy bits of the key size, the properties of the random number generator, etc.).

In our system, we will use a key size greater than 2048 for digital signing. Digital signing is used to protect from “Man in the Middle” attacks (discussed in sections 4.5.2 and 4.5.3), and is needed to be valid on the server side for a long time, since it is used by the clients to validate the sender. For key exchange procedures (section 5.4), the server accepts keys greater than 1024.

This means that the total key exchange procedure, including digital signing, takes approximately 600ms (100ms from figure 7, section 5.4, plus 500ms from figure 6, section 5.3.4) on a Centrino laptop computer.

5.6. Putting it Together

The server is implemented using IOCP technology [6]. The communication protocol in the server and the client demo is implemented in the SecureChatIOCP class. The class is inherited from IOCPS [6] that uses the IOCP technology. A storage structure (ClientContext found in iocps.h) is associated with each connection through IOCP. All the packages that are received from multiple connections are handled only with one or more threads called IOWorkers. These threads call different functions namely NotifyReceivedPackage(..), NotifyNewConnection(..), etc., and how these functions are implemented define the communicating protocol. Usually, a more sophisticated academic finite state machine model is used to implement complex protocols with IOCP, however, we will keep it as simple as we can. Please read my article “A simple IOCP Server/Client Class” for more information and more details about the IOCP technology.

All the implementation is in the IOCPS class and SecureChatIOCP. By defining _IOCPClientMode_, we change the protocol behavior from the server to the client.

We start of by defining some packages, and how they are handled in NotifyReceivedPackage(..).

5.6.1. Defining the packages

The table below describes the different package structures. A package's first bytes start with the top of the table (yellow).

Package structure 1	Package structure 2	Package structure 3
4 bytes, defining the size of a package	4 bytes, defining the size of a package	4 bytes, defining the size of a package
1 byte defining the package type.	1 byte defining the package type.	1 byte defining the package type.
Variable size, payload, usually contains another encrypted package or a null terminated string.	4 bytes defining the size of the payload (in number of `DWORD`s).	4 bytes defining the size of the string (payload1) in bytes.
	Payload of variable size (usually contains a public key or signature).	Payload1 of variable size. Usually contains a null terminated string (username and password).
		4 bytes defining the size of the string (payload2) in bytes.
		Payload2 of variable size (usually contains a null terminated string).
Table 1: Describes the different package structures that are transferred between the client and the server.

The packages have one of the structures described in the above table, and are usually built inside the functions BuildAndSend(..), SendPublicKey(..), SendTextMessage(..), and SendErrorMessageTo(..) inside the SecureChatIOCP class.

The package types are defined in the table below, and are handled with the appropriate functions, OnXXXX(..), where XXXX denotes the package type (e.g., OnPKG_PUBLIC_KEYA(..)).

Package (of enum type PackageTypes)		How the package is handled
`PKG_ERRORMSG` Encapsulated in structure 1 (Table 1)		This type of a message contains an error message that is going to be sent to the client. The package is received by the client and the error message is presented. As soon as the package is sent, the connection is closed.
`PKG_PUBLIC_KEYP` Encapsulated in structure 2 (Table 1)		This package is sent by the client. The package contains the public key, P, according to the Diffie-Hellman (D-H) key exchange protocol (section 5.4). When the server receives this package, it saves the public key, and generates a 512 bit private key, and computes and sends the public key, B, by `ComputeAndSendPublicKeyA(..)`.
`PKG_PUBLIC_KEYA` Encapsulated in structure 2 (Table 1)		This package is sent by the server, and contains the public key, A, according to the Diffie-Hellman (D-H) key exchange protocol (section 5.4). This package is received by the client, which generates a private key, and computes the session key (`ComputeAndSetSessionKey(..)`), and computes and sends the public key, B (of type `PKG_PUBLIC_KEYB`), with the function `ComputeAndSendPublicKeyB(..)`.
`PKG_PUBLIC_KEYB` Encapsulated in structure 2 (Table 1)		This package is sent by the client to the server (see `PKG_PUBLIC_KEYA`). The session key is computed, and a signature is computed and sent to the client (`ComputeAndSendSignature(…)`) which signs the public keys, A and B, if we have defined `#define USE_SIGNATURE` to protect the client against “Man in the Middle” attacks (section 3.4 and 4).
`PKG_SIGNATURE` Encapsulated in structure 2 (Table 1)		This package is sent by the server, and contains a signature that confirms the packages `PKG_PUBLIC_KEYA` and `PKG_PUBLIC_KEYB`. The client receives the package, and validates the server using the trusted static key `SecureChatIOCP::m_PubN` that is hard-coded into the client. If everything is fine, a `PKG_USERNAME_PASSWORD` is sent to server, otherwise a `PKG_ERRORMSG`, that also terminates the connection.
`PKG_TEXT_TO_ALL` Encapsulated in structure 1 (Table 1)		This package is always sent after the key exchange procedure, and is encapsulated inside a `PKG_ENCRYPTED` package. When received by the client, the content is shown in the text box of the chatting client. However, if it is received by the server, it is sent to everybody.
`PKG_USERNAME_PASSWORD` Encapsulated in structure 3 (Table 1)		This package is always sent as a response from the client that the client accepted the signature. This package is encapsulated inside a `PKG_ENCRYPTED` package.
`PKG_ENCRYPTED`		This package is sent after the key exchange procedure, and is handled inside `NotifyReceivedPackage(..)`. The contents of the package are decrypted and processed according to the defined packages in this table.

Table 2: Describes the different types of packages and how they are handled.

5.6.2. The key exchange details

The techniques used to implement the secure chat are Digital Signing/Verification (section 5.3.4), Diffie-Hellman (D-H) key exchange protocol (section 5.4), and Rijndael explained in section 5.2.

The key exchange procedure is done in the following way (to read more about the implementation details, read the previous section 5.6.1, tables 1 and 2):

The client generates a public key, p; the public key, g, is always equal to five. A package type of PKG_PUBLIC_KEYP is sent to the server. This is done when the client connects to the server, and is performed in the function NotifyNewConnection(..).
The server receives the package PKG_PUBLIC_KEYP and sends the package PKG_PUBLIC_KEYA (read section 5.6.1).
The client receives the PKG_PUBLIC_KEYA. The client computes the session key, and sends PKG_PUBLIC_KEYB to the server according to section 5.6.1.
The server receives the PKG_PUBLIC_KEYB. The server computes the common session key, and sends a signature in PKG_SIGNATURE to the client.
The client validates the signature, and sends a PKG_USERNAME_PASSWORD package to the server.

Now, both the client and the server have a common secret session key, according to the Diffie-Hellman (D-H) key exchange algorithm, and also the client is protected against "Man in the Middle" attacks.

5.6.3. The digital signing details

To digitally sign the message, the client must trust a public key. This trusted public key is hard-coded in the software (even if it is public) as the global constant SecureChatIOCP::m_PubN declared in the SecureChatIOCP class. The same public key and the secret key are also hard coded into the server SecureChatIOCP::m_PrivD and SecureChatIOCP::m_PubN. Remove the #define USE_SIGNATURE from SecureChatIOCP.h to use the protocol without the “Man in the Middle” protection and digital signing.

Observe! The digital signing in the demo is not secure because everyone that downloads this demo knows the private key used to sign the data.

Use the Demo section and the “Generate DSA key” button in the server demo to generate your own private and public keys to be used for digital signing. Copy the generated keys to SecureChatIOCP::m_PrivD and SecureChatIOCP::m_PubN to obtain your own secure client-server framework.

5.6.4. Different implementation approaches

As we have discussed before in previous sections, there are many different implementation approaches. The key exchange implementation is quite slow; it takes about 600ms to do a safe key exchange (section 5.5). This is because of the digital signing computation. More efficient alternative digital signing algorithms other than RSA recursive can be used to decrease the key exchange time.

A quite nice alternative would be that the server uses the same public key, P, and public key, A, and signs only the public key, A. By doing that, the key exchange procedure would be much faster from a server point of view, because the computation needs to be done only once at the server startup, or at a certain interval (few hours).

As a result of that, a larger key size can easily be used, because we do not do computations every time. However, the security of the protocol would be compromised using this approach, even if it is a quite good fix. For example, imagine that the public key A is broken.

5.6.5. Special considerations

When you compile this source code for 64 bit processors, make sure that you change the parameters defined in MyCryptLib.h (e.g.: _HIBITMASK_, MAXIMUMNR) to appropriate values.

For commercial release, it is important to protect this software against buffer overflow attacks [1], specially regarding the class MyCryptLib. Use the compiler /GZ switch, found in Visual C++ 6.0, and the /GS switch for the Visual C++ .NET compiler, to prevent buffer overflow attacks [1].

6. Future work

Optimization of the multi-precision library used to implement asymmetric encryption and key exchange.
Client authorization, using hashed SHA256 password / username.
File transfer functionality between users.

7. F.A.Q.

Q: The protocol ensures that the client is protected from “Man in the Middle” attacks by using digital signing. In the server side, the client is not authorized in a similar way, why?

A: In practice, the most important thing is to protect the clients from being hacked. Also, usually, the server authorizes the client using a name and a password. Here, it is important that the client is not “fooled” by a “Man in the Middle” attack to send information such as password, etc., to the attacker.

8. Revision History

Initial release - 2006-06-08.

9. References

Compiler Security Checks In Depth, 2006-05-02
Crypto++® Library 5.2.1, 2006-05-02
OpenSSL Project, 2006-05-02
CrypTool, 2006-05-02
RSA Security, 2006-05-02
A simple IOCP Server/Client Class, 2006-05-02, Amin Gholiha
“Cryptanalysis of Block Ciphers with Overdefined Systems of Equations (or the XSL attack on block ciphers)”, Nicolas Courtois, Josef Pieprzyk, Asiacrypt 2002, LNCS 2501, pp. 267-287, Springer
"A report about the Courtois and Pieprzyk attack on AES", Leah McFall, Asiacrypt 2002 conference, Tuesday, 3 December 2002
The Art of Computer Programming, Knuth, Donald. 1968
RSA Encryption Standard, RSA Laboratories. PKCS #1 v2.1: June 2002.
The Art of Computer Programming, Vol 2: Seminumerical Algorithms, Knuth, Donald.E. Addison-Wesley, Reading, Mass., 1981
Reliable Software Technology or Cigital, 2006-06-07