1. Abstract
The age of information is also the age of digital information assets, where the professional programmer has to deal with cryptography. This article presents the theory, source code, and implementation for variable key size RSA encryption/decryption, digital signing, multi precision library, DiffieHellman key exchange, entropy collection, pseudo random number generator, and more. The article presents how to implement your own secure protocol using the IOCP technology, by presenting a secure chat client/server solution implementation.
2. Requirements
 The article expects the reader to be familiar with C++, TCP/IP, socket programming, MFC, and multithreading.
 Some mathematic knowledge about elementary number theory and statistics is required.
 The source code uses Winsock 2.0 and IOCP technology, and requires:
 Windows NT/2000 or later: Requires Windows NT 3.5 or later.
 Windows 95/98/ME: Not supported.
 Visual C++ .NET or a fully updated Visual C++ 6.0.
3. Introduction
The age of information is also the age of digital information assets. The professional developer has to deal with cryptography to make data storage and transmission secure. The purpose of this article is not to “reinvent the wheel” or implement home made, mathematicallyunsafe cryptographic algorithms. This article focuses on the practical details concerning cryptographicallysafe protocols, and presents the theory and source code for a secure client/server solution that can be used for any type of client/server application.
To know cryptography in theory is essential for a developer, but to implement it in practice is difficult. There are many security exploits as buffer overflow [1] and others that arise with the implementation of theoretically secure algorithms. Many commercial/free high quality cryptography libraries exist in the market, as Crypto++ [2], OpenSSL [3], and Crypttool [4]. To use these libraries, the developer has to know the cryptography theories behind the implementation, and also be aware of “what is happening under the hood of the library”. This is not an easy task, because the internal structure of these libraries can be complex, and the libraries contain unnecessary functionality that is not always needed.
This article briefly explains cryptographic theories involving cryptographicallysafe communication protocols, and also presents how this is implemented by providing a secure chat client/server solution.
4. Background
This section explains some of the cryptographic theories and terminology needed to understand the rest of the article, namely section 5. If you think that you have enough experience and knowledge and want to get your hands dirty, please continue to the next section (5. Implementation).
By using cryptographic algorithms, data storage and transmission can be performed in a secure way. Using algorithms to change the data in such a way that only an authorized recipient is able to reconstruct the data is called encryption. For an unauthorized recipient, the encrypted data looks like a meaningless and random sequence of bits. A cryptographic algorithm, also known as cipher, is a mathematical function which uses plain text (the data) as the input and produces cipher text (the encrypted data) as the output (and vice versa). A secret key is used together with the cipher to encrypt the plain text, and the same key or another key is used to decrypt the cipher text back to plain text.
Different cryptographic algorithms or ciphers have different mathematical properties and weaknesses. The details of a cipher is usually made public. It is in the secret key that the security of a modern cipher lies, not in the details of the cipher. To get additional information and knowledge about security, please read and try Crypttool [4]. The Crypttool software demonstrates several encryption attack implementations, and gives you additional information about the algorithms used in this article.
4.1. Symmetric Cryptography
In Symmetric Cryptography, the sender and recipient use the same secret key to encrypt and decrypt plain text. This means that the sender and recipient must be in possession of a common (secret) key which they have exchanged before actually starting to communicate.
+
The advantage of symmetric algorithms is the high speed with which data can be encrypted and decrypted.

One disadvantage is the need for key management. In order to communicate with each other confidentially, the sender and the recipient must have exchanged a key using a secure channel, before actually starting to communicate.
Most modern symmetric algorithms operate on blocks of plain text. The procedures usually consist of many complex rounds of bit shifts and transformations to gain protection against different mathematical analysis attacks. In section 5, we will choose symmetric cryptography based on its properties and cryptographic strength.
4.2. Asymmetric Cryptography
Unlike asymmetric encryption, each subscriber has a personal pair of keys consisting of a secret key and a public key. The public key is public, this means that anyone can have it. The keys are mathematically related, yet it is computationally difficult to deduce one from the other. Using the public key, anyone can encrypt the plain text, and only the one that has the private key can decrypt the message.
Using asymmetric encryption, two entities on a network (let’s call them Alice and Bob) can communicate securely using the following simple protocol:
 Bob and Alice exchange public keys.
 Alice encrypts her message with Bob's public key and sends it to Bob.
 Bob encrypts his message with Alice's public key and sends it to Alice.
 Bob decrypts Alice's message with his private key, and Alice decrypts Bob’s message with her private key.
Now, they can communicate because they know their private keys, and no one else can decrypt the message because they do not have Alice and Bob's private keys. This is not entirely true because of the “man in the middle attack” and practical reasons that we will discuss in section 4.3 and 4.5.
+
The advantage is that no secure channel is needed before messages are transmitted, because all the information required in order to communicate confidentially can be sent openly.

The disadvantage is that pure asymmetric procedures take a lot longer to perform than symmetric ones. Therefore, asymmetric encryption is used to exchange a session key used with symmetric cryptography.
The most wellknown asymmetric procedure is the RSA algorithm [can be found in 5, 10], named after its developers Ronald Rivest, Adi Shamir, and Leonard Adleman. The RSA algorithm was published in 1978. The RSA algorithm can be used for both public key encryption and digital signatures. Its security is based on the difficulty of factoring large integers. We will discuss the RSA encryption algorithm and its implementation in section 4.6.
4.3. ManintheMiddle Attack
The public asymmetric encryption secure communication protocol between Alice and Bob described above in section 4.2 is vulnerable to a maninthemiddle attack.
Let's assume that Mallory is an enemy hacker and can:
 listen to the traffic between Alice and Bob.
 can modify, delete, and substitute Alice's and Bob's messages, and also introduce new ones.
Mallory can impersonate Alice when talking to Bob, and impersonate Bob when talking to Alice. This is possible in a real life communication over the Internet.
An example of the attacker:
 Bob sends Alice his public key. Mallory intercepts the key and sends his own public key to Alice.
 Alice generates a random session key, encrypts it with Bob’s public key (which is really Mallory's), and sends it to Bob.
 Mallory intercepts the message. He decrypts the session key with his private key, encrypts it with Bob's public key, and sends it to Bob.
 Bob receives the message thinking it came from Alice. He decrypts it with his private key, and obtains the session key.
Alice and Bob start exchanging messages using the session key. Mallory, who also has the session key, can now decipher the entire conversation. This can, of course, be solved by using OneWay Hash functions (section 4.4) to digitally sign the message package (section 4.5 and 5.3).
4.4. Message Digest
A message digest, also known as a oneway hash function, is a mathematical function that takes a variablelength input and converts it into a fixedlength binary sequence called the hash value. Another important property of a message digest is that it is hard to reverse the process. Given the hash value, it is hard or impossible to find the input. Furthermore, a good hash function also makes it hard to find two different data inputs that produce the same hash value. Even a slight change in an input data causes the hash value to change drastically, therefore, a message digest is used to check the integrity/correctness of the digital data.
This makes a oneway hash function a central notion in publickey cryptography, and is used when producing a digital signature (section 4.5 and 5.3) for a message. Message digests are also used to distribute randomness. For example, a 8 bit true random number is consumed by a message digest and becomes 160 bits, where the randomness is distributed among the 160 bits. The most popular oneway hash algorithms are MD4 and MD5 (both producing a 128bit hash value), and SHA, also known as SHA1 (producing a 160bit hash value).
4.5. Digital Signing
The main concept of digital signing is that the receiver can verify that a document or message is from a certain person or company. Digital signing is made using asymmetric cryptography (usually RSA) and message digest to sign and verify a message, as described in the procedure below. For this to be possible, we need a strong asymmetric public key (n,e) that we trust. If you use your private key to encrypt a message, anyone could decrypt it using your public key. What good is that? Well, it proves you are the one that encrypted it and it has not been altered since you did so. This forms the basis of the digital signature.
Normally, an independent third party that everyone trusts, whose responsibility is to issue certificates, is called a Certification Authority (CA). A certificate is a data package that completely identifies an entity, and is issued by a CA only after that authority has verified the entity's identity. The certificate can hold this asymmetric public key that everybody trusts.
In this article, we hardcode the trusted asymmetric public key (n,e) into the client software (also discussed in 5.6.3), and are not using other parties. We will discuss this later in section 5.3.4. The procedure of digital signing is described below:
4.5.1. Digital signing assumption
The signer A possesses the private key (d,e) and the public key (n,e). The recipient B already trusts the public key of A (n,e).
4.5.2. The digital signature procedure
 Create a message digest or hash of the information to be sent, that we call m (assuming m<n).
 The signer A uses the private key to compute the signature (s = m^d mod n).
 Send this signature s and the message to the recipient B.
4.5.3. The signature verification procedure
Recipient B does the following:
 Uses sender A's public key (n, e) to compute an integer (v = s^e mod n).
 Independently computes the message digest of the information that has been signed.
 If both message digests are identical, the signature is valid.
The implementation of this can be found in MyCryptLib::DemoDSA(..)
in the source code provided.
4.6. The RSA Algorithm
In this section, we briefly describe how the RSA algorithm works, we will discuss the implementation later. The RSA asymmetric algorithm [can be found in 5] was named after its developers, Ronald Rivest, Adi Shamir, and Leonard Adleman. Almost all known commercial secure protocols depend on RSA public key exchange. The strength in RSA lies in the difficulty to factor two very large prime numbers, p and q.
4.7. The RSA Procedure
The security of the RSA algorithm depends, as with all public key methods, on the difficulty to calculate the private key, d, from the public key, (n, e). This is obtained by making it difficult to factorize the product of two prime numbers (p,q) from the public key n (described below, section 4.7.1). Therefore, the prime numbers (p,q) must be very large.
4.7.1. Standard RSA algorithm description
n = pq where p and q are distinct primes.
phi, = (p1)(q1)
e < n such that gcd(e, phi)=1, usually e=3.
d = e^1 mod phi, d is the private key.
Encryption:
c = m^e mod n, 1<m<n.
Decryption:
m = c^d mod n.
This means that we need to operate with large integers, and also generate big prime numbers. We will discuss the implementation of this algorithm later, in section 5.3.
4.8. Random Numbers  The Backbone of all Cryptographic Security
Random numbers are used to generate keys, prime numbers, etc. Therefore, random numbers or random bits are the backbones of all cryptographic security. If the numbers are not truly random, the security can easily be broken. The fact that computers are deterministic (a computing machine) and are unable to produce or handle true randomness, is the central problem. In this section, we discuss what randomness is, and also how computers generate pseudo random numbers.
4.8.1. Entropy, the mathematical name for statistical randomness
Statistics tells us a few things about random bits. Zeros ought to occur as often as ones. There are statistical tests, such as the chisquared test, that you can perform on a stream of numbers to show how random they are. The degree to which a stream of bits follows this statistical form is the degree to which it is said to be entropic. This is the strict mathematical definition.
4.8.2. Pseudorandom number generators
A pseudo random number generator is an algorithm that generates a sequence of numbers that acts like random numbers, but they are not. The algorithm starts with a “seed”, a number to start with, and then uses that seed to generate a series of numbers that “act” as if they were random.
Usually, all pseudo random number generator have a period. This means that, after a while, they start to follow a certain pattern. And there are other “bad” properties as:
 Shorter than expected periods for some seed states (the period is dependent on the seed value)
 Poor dimensional distribution
 Mutually dependent values
 Some bits being 'more random' than others
 Lack of uniformity (everything is not equally probable)
In late 1999, the ASF software Texas Hold’em poker application used the standard rand()
to shuffle the cards. It was discovered by Cigital [12], that only five cards needed to be known to predict the remaining cards. The pseudorandom number generator algorithm used to generate random values is very important because of the aspects that we have discussed here. In this implementation, we use the Mersenne twister pseudorandom number generator developed in 1997 by Makoto Matsumoto and Takuji Nishimura, more details of this later, in section 5.
4.8.3. Collecting entropy
Collecting truly random numbers is hard, but is needed. There exist hardware devices that generate true noise, observing cosmic ray flux, etc. However, we need to use the equipments that we have, which are the computers and the users. Instructing the user to move the mouse around the screen in a random manner (as in RandDialog
in the source code) or type some random letters, is a good way to generate data, but it takes too much time, and often we do not have a user (for servers). Using the existing hardware in the computer that has some entropy (randomness) is also another way.
 The clock in the computer is a quite good source of entropy, finer resolution clocks are better. But if we are using, for example,
TickCount()
to get a random seed, a search will find it quickly. There are only 31.5 million ticks in a year. This is when we are using different parts of the hardware.
 At the Crypto '94 conference, Davis, Ihaka, and Fenstermach showed that air turbulence inside a disk drive creates enough randomness to make cryptographicstrength random numbers.
 Use a microphone. Reading the microphone that is built into many computers gives a source of apparently random data.
Once we have collected random data, we have to distribute the randomness, or distill it in a nice way so all bits are equally random. The best way to do this is to use some sort of message digest (discussed in section 4.4).
4.8.4. Summary
Symmetric Cryptography 
When the same key is used for encryption/decryption. Symmetric ciphers are fast, but the problem is that the communicating entities need to exchange the key. 
Asymmetric Cryptography 
A public key is used to encrypt data, and another secret key is used to decrypt data. The algorithm used for this is mainly RSA, and it is based on the difficulty to factorize big prime numbers. It is hard to derive the secret key from the private key. The communication can be done in open, but is not safe for “man in the middle attacks”.
Asymmetric encryption is slow, and is only used to exchange keys. 
ManintheMiddle Attack 
It is possible to listen, modify, and delete data packages between two entities in a network. Therefore, it is possible for a hacker to impersonate others in a network and be the “ManintheMiddle”. The solution to this is digital signing. 
Message Digest 
Is a very important oneway function that produces a hash value of a fixed size, given data of variable size. This is used to digitally sign data, compute checksum of data, protect plain text passwords, and also distribute the randomness (entropy) in data. 
Digital Signing 
Can be based on the RSA algorithm. Is a way to determine that the data really comes from the sender (eliminates ManintheMiddle attacks). 
Random Numbers  the backbone of all cryptographic security 
Random numbers are used to generate prime numbers, keys, etc. Therefore, they play an important role in cryptography. It is hard and time consuming to generate true random numbers. And often, generating them needs special equipment. Seeded pseudo random number generators with good statistical properties can be used instead. The seed can be collected with much entropy using different sources, and be distilled with a message digest. 
5. Implementation
In this section, we describe the steps to implement a secure protocol. The reason for the chosen methods, the algorithms, and their reliability and performance are discussed. Furthermore, the source code is briefly explained and presented.
5.1. The Protocol Design – General Overview
To implement a secure protocol, we need to have a symmetric cipher (section 4.1) and a key exchange procedure using asymmetric encryption (section 4.2).
Figure 1. The figure represents the dependencies between the classes involved in the implementation.
The protocol is implemented using three classes namely MyCryptLib
, CRijndael
, IOCPS/SecureChatIOCP
(see figure 1). The MyCryptLib
class contains the source code for all the details around the key exchange procedure and digital signing, CRijndael
is the symmetric cipher class, and IOCPS/SecureChatIOCP
is a comprehensive IOCP server class [6]. Read more about the IOCP technology in my other article “A simple IOCP Server/Client Class” [6]. The details around the implementation are given in this section.
5.2. The Symmetric Cipher
Rijndael was selected as the Advanced Encryption Standard (20001002) [7]. The algorithm is designed with the state of the art in cryptographic research. The algorithm is fast, and is resistant against cryptographic attacks as linear crypt analysis, differential crypt analysis, etc. However, the Rijndael algorithm can be written as a system of multivariate quadratic equations. To solve these equations to obtain the key or plain text is difficult, because there are no efficient algorithms for solving such systems. There is, however, a lot of research around this area, and many methods such as XL and XLS has been presented to solve big multivariate quadratic equation systems [8]. Therefore, 128bit Rijndael may not be safe anymore. There are also proposals for algorithms that might break the 256 bit Rijndael, but this is not certain [8]. For now, the (2005) 256bit Rijndael is sufficiently secure for our task, but, this might not be in the future.
The symmetric 256bit Rijndael cipher is implemented using the class CRijndael
. The cipher is used with a 256bit key, and encrypts/decrypts blocks of size 16 bytes.
Figure 2: The figure shows how data is encapsulated inside an encrypted package. The package size denotes the size of the package. The size of the data inside denotes the real size of the data. If the data does not fit into 16 byte blocks, the data is padded with random data. Also, a CRC16 checksum is added to the block for more security.
The data is packaged into a number of blocks (figure 2), and if the data does not fit the blocks, it is padded with random numbers. A CRC16 checksum is also added to the block to avoid an attacker changing the encrypted data, and also to determine if the decryption was successful. The implementation of the encryption and decryption is done in the functions SecureChatIOCP::EnCryptBuffer(..)
and SecureChatIOCP::DeCryptBuffer(..)
.
5.2.1. Implementation source code
For each package, the Cipher Block Chaining (CBC) is used to encrypt the data. In CBC mode, an encrypted block is obtained by first XORing the data block with the previous encrypted block and then encrypting the resulting value. This means that we need to enter a 16 byte block along with a key to encrypt/decrypt. Each connection/client has an instance of the class CRijndael
inside its context, this class is initialized with the key and the initial chain block, with the function:
void CRijndael ::MakeKey(char const* key, char const* chain,
int keylength=DEFAULT_BLOCK_SIZE, int blockSize=DEFAULT_BLOCK_SIZE)
{
….
}
The functions EnCryptBuffer(CIOCPBuffer *pBuff, ClientContext *pContext)
and DeCryptBuffer(CIOCPBuffer *pBuff, ClientContext *pContext)
in the class SecureChatIOCP
are used to encrypt/decrypt buffers, as in figure 2. For more information and details, please see the source code.
5.3. Asymmetric Encryption – The RSA/DSA Implementation
In sections 4.54.8, we discussed what RSA/DSA is, and we know that to use asymmetric encryption, we need to perform computations with very large numbers. This is, however, not possible since the hardware only supports 32 or 64 bit operations. That is why we need to go back to our old, plain computing books to find numerical algorithms to implement computations with arbitrary bit lengths using the existing hardware. This is why we have implemented a multiprecision library. Furthermore, we need to generate very big prime numbers (section 4.6).
5.3.1. Multiprecision library
By using numerical algorithms [9], a set function denoted by BN*(…)
(BN stands for Big Number) is implemented in the MyCryptLib
class. The focus of the implementation has been simplicity, and not performance. The functions implement all the operations needed for our task, using existing hardware that operates with the type DWORD
. The BN*(…)
function can compute with numbers with an arbitrary bit length. To know what is happening under the hood is important here, since different implementation exploits can be used to hack the system (discussed in section 2).
The algorithm used to implement the different arithmetic operations as addition, division, and subtraction is complicated, and therefore, we explain only one of them here, namely addition.
Let's start by adding two binary bits. Since each bit has only two possible values, 0 or 1, there are only four possible combinations of inputs. These four possibilities, and the resulting sums, are:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 01
The fourth line indicates that we have an overflow! We need two bits to store the output! To add numbers of two bits can now be performed as in figure 3. The carrier which denotes the overflow in the one bit adder is fed into the adder nr 2 to be summed into S2. By doing this, we can add two bits using two 1 bit adders.
Figure 3. The figure illustrates a two bit adder. A, B are two bit numbers, and S is a two bit number. The count denotes if we have an overflow or not.
In a similar fashion, we can use 32 bit hardware to operate with numbers of arbitrary lengths. Please see the source code below, for addition:
DWORD MyCryptLib::BNAdd(DWORD C[], const DWORD A[],
const DWORD B[], const UINT nSize)
{
DWORD k=0; for (UINT i = 0; i < nSize; i++) {
C[i] = A[i] + k;
if(C[i]>=k)
k=0;
else k=1;
C[i] += B[i];
if (C[i] < B[i])
k++;
}
return k;
}
To get additional information about the implementations, please read the source code. And, also check the DemoSimpleTest(..)
function in the MyCryptLib
class.
5.3.2. Prime number generation
 To generate prime numbers, we need the following:
 We need to collect entropy (true randomness) to feed the random generator with.
 A fast pseudo random generator with good statistical properties.
 The last thing that is needed is a fast function that determines if a number is a prime number or not.
Collecting entropy
To collect entropy is not an easy task, as discussed in section 4.8.3. In our application, we collect entropy from the user using the class CRanDialog
that collects random data when the user moves the mouse randomly. If the user is not present, the entropy is collected from different hardware components, as described in the source code below:
BOOL MyCryptLib::MTCollectEntropy(BYTE *pRandomPool, UINT nSize)
{
while ( nSizenCollected>0 ) {
SHA1_Hash(pEntropyBucket,SHA1_DIGEST_SIZE,&csha1);
dwRes=GetCurrentProcessId();
SHA1_Hash((BYTE*)&dwRes,sizeof(DWORD),&csha1);
dwRes=GetCurrentThreadId();
SHA1_Hash((BYTE*)&dwRes,sizeof(DWORD),&csha1);
GetSystemTime(&st);
SystemTimeToFileTime(&st, &ft);
SHA1_Hash((BYTE*)&ft,sizeof(FILETIME),&csha1);
dwTick = GetTickCount();
SHA1_Hash((BYTE*)&dwTick,sizeof(DWORD),&csha1)
SHA1_Hash((BYTE*)&ms, sizeof(MEMORYSTATUS),&csha1);
SHA1_Finish(EntropyBucket,&csha1);
if ( nSizenCollected < SHA1_DIGEST_SIZE ) {
memcpy(pRandomPool+nCollected,
pEntropyBucket,nSizenCollected);
nCollected+=nSizenCollected;
}
else {
memcpy(pRandomPool+nCollected,
pEntropyBucket,SHA1_DIGEST_SIZE);
nCollected+=SHA1_DIGEST_SIZE;
}
}
return TRUE;
}
The random generator
The collected entropy (either from the user or the hardware, with MTCollectEntropy(..)
) is fed into the Mersenne Twister Random Generator [11].
The generator has the incredible period 2^(219937 – 1). This is a number with 6000 decimal digits. The number of elementary particles in the universe is “only” estimated to be a 80digit number. The algorithm is also very fast, and has very good statistical properties. The 32bit random numbers exhibit the best possible equidistribution properties in dimensions up to 623. Please see the source code DWORD MyCryptLib::MTRandom()
and BOOL MyCryptLib::MTInit(BYTE *pRandomPool, UINT nSize)
to get additional information.)
RabinMiller Probabilistic Primality Test
Rabin Miller [11] is an algorithm that determines if a given number is a prime number or not. The algorithm is probabilistic, this means that it is not 100% secure, but is very fast. We will use this algorithm to find prime numbers as follows:
 Start with a random number A with size
nSize
.
 Make it odd.
 Use a small list of prime numbers and RabinMiller to check if the number is not a prime number.
 If the number is not a prime number, add 2 to it, and jump to 3 until we find a prime number.
Please see the functions BNMakePrime(…)
, BNMakeRSAPrime(..)
, and RSAGenerateKey(..)
to get additional information about prime number generation. As we can see in figure 4, it takes more time to find bigger prime numbers because of two reasons, the computations involved to find prime numbers take longer, and the prime numbers get less dense when they are bigger. It takes about 73 seconds to generate a 4096 bit RSA key on a Centrino laptop.
Figure 4. The figure represents the time it takes to generate an RSA key. The Xaxis denotes the size of bits, and the Yaxis denotes the time in seconds.
5.3.3. RSA Encryption/Decryption
The RSA encryption/decryption is discussed in section 4.7. To get more implementation details, please read the MyCryptLib::DemoRSA(..)
function. One important aspect discussed here is the performance. The time to encrypt data using different key lengths is between 0.01 ms to 1 ms, depending on the key length.
However, to decrypt a message takes much longer than to encrypt the message (figure 5). The time can be reduced significantly using a different decryption method called Chinese Remainder Theorem (CRT) [10]. This algorithm (the red line in figure 5) is much faster than the standard decryption (m = c^d mod n ) discussed in 4.7 (the blue line in figure 5).
Figure 5. In the figure, we can observe how long the decryption of the RSA algorithm takes for different key lengths. It can be observed that CRT gives much better performance than the standard decryption algorithm (m = c^d mod n).
In the CRT algorithm, the private key is represented as a quintuple (p, q, dP, dQ, and qInv), where p and q are prime factors of n, dP and dQ are known as the CRT exponents, and qInv is the CRT coefficient. The CRT method of decryption is four times faster, overall, than calculating m = c^d mod n (for large key lengths).
The extra values for the private key are:
dP = (1/e) mod (p1)
dQ = (1/e) mod (q1)
qInv = (1/q) mod p where p > q
These are precomputed and saved along with p and q, as private key. To compute the message m, given c, do the following:
m1 = c^dP mod p
m2 = c^dQ mod q
h = qInv(m1  m2) mod p
m = m2 + hq
Even though there are more steps in this procedure, the modular exponentiation to be carried out uses much shorter exponents, and so it is less expensive overall. The implementation of this can be found in MyCryptLib::RSADecryptCRT(…)
.
5.3.4. Digital signing implementation
The details around digital signing are discussed in section 4.5. According to sections 4.5.2 and 4.5.3, a digital signature is implemented by the functions DigitalSignSHA1rDSA(..)
and DigitalVerifySHA1rDSA(..)
in the MyCryptLib
class. The function uses a SHA1 message digest to compute the checksum of a message, and then sign/verifies it. To get additional information and examples, please see the DemoDSA(..)
function inside the MyCryptLib
class, or use Server>CryptoLibrary>GenerateKey.
To verify a signature is not CPUexpensive (takes less than 1ms, depending on the key size), but to calculate a signature is very expensive, as shown in figure 6 below. We can observe (figure 6) that signing a message using a key length of 2688 bits takes 1 second. Therefore, it is not possible to sign every message sent to the client using that size.
Figure 6. The figure shows the time, in milliseconds, to compute a signature using the function DigitalSignSHA1rDSA(..)
with different key sizes. To sign a message using a key length of 2688 bits takes 1 second.
5.4. Key Exchange Procedure
Whitfield Diffie, Martin E. Hellman, and Ralph Merkle developed the DiffieHellman (DH) key exchange protocol in Stanford in 1976. The protocol protects against “ear dropping”, that means no one can know the secret key that is exchanged, but there is no protection against the “Man in the Middle” attacks discussed in section 4.3. The security of the algorithm lies on the difficulty to solve the discrete logarithm problem, that is: if g= 2, or 5, p is the public prime number, and A=g^a mod(p) is given, calculate the secret random key, a. This is very difficult if p is higher than 1024 bits, however, if the random secret key does not contain enough entropy, the key can be predicted, and then the algorithm is broken.
Description of the algorithm:
 The numbers g, p, A, B are public.
 Alice generates a secret key, a, and computes A= g^ a mod(p).
 Bob generates a secret key, b, and computes B=g^b mod(p).
 The keys A, B are exchanged.
 Alice computes S=B^a=g^(b*a) mod (p).
 Bob computes S=A^b=g^(a*b) mod (p).
 Alice and Bob have the same secret key because of the mathematical commutative law that gives g^(b*a) mod (p) = [a*b=b*a] = g^(a*b) mod(p). The implementation and usage of this algorithm can be found in the
DemoDiffieHellman(..)
function in the MyCrypt
class. In the figure below, we can see the computation time needed to exchange the keys according to this algorithm.
Figure 7. The figure shows how long a secure key exchange takes (in milliseconds (yaxis)) compared to the length of the public key, p. We can see that, for a 2048 bit key, it takes 600 ms to exchange the key.
5.5. Key Length
The security of all algorithms presented in the previous articles lies on the key length and the algorithm to generate the keys. A rule of thumb is to have 10%20% entropy (true randomness) bits of the key size. In 1999, a 512 bits public key was factored with an implementation of the Number Field Sieve algorithm (GNFS), developed by Buhler, Lenstra, and Pomerance. The time it took to break this key was five months. This means that a key of length 512 bits is no longer safe.
This made evident that a module length of 512 bits no longer protects from attackers. Within the last 20 years, a lot of progress has been made. Estimations about the future development of the ability to factor public keys vary, and depend on some assumptions:
 Development in computing performance (Moore’s law: every 18 months, the computing power will double) and in grid computing.
 Factorizations of prime numbers are part of very many topical research areas in number theory and computer science. The progress is bigger than estimated. This is because of the new knowledge about prime numbers.
In practice, it is still effectively impossible for ordinary people to break keys of size 512, but organizations like NSA with supercomputers can probably break keys. The length of a secure key is typically 1024 (year 2005), and should be increased by 24 bits/year. In figure 57, section 5.3, it is shown that the computation time for asymmetric encryption increases exponentially using larger prime numbers or keys.
"Oh, our system uses 4096bits security”, may sound impressive, but using too large keys decreases the performance, and also gives a false sense of security because of security issues involved in generating the key (10%20% entropy bits of the key size, the properties of the random number generator, etc.).
In our system, we will use a key size greater than 2048 for digital signing. Digital signing is used to protect from “Man in the Middle” attacks (discussed in sections 4.5.2 and 4.5.3), and is needed to be valid on the server side for a long time, since it is used by the clients to validate the sender. For key exchange procedures (section 5.4), the server accepts keys greater than 1024.
This means that the total key exchange procedure, including digital signing, takes approximately 600ms (100ms from figure 7, section 5.4, plus 500ms from figure 6, section 5.3.4) on a Centrino laptop computer.
5.6. Putting it Together
The server is implemented using IOCP technology [6]. The communication protocol in the server and the client demo is implemented in the SecureChatIOCP
class. The class is inherited from IOCPS [6] that uses the IOCP technology. A storage structure (ClientContext
found in iocps.h) is associated with each connection through IOCP. All the packages that are received from multiple connections are handled only with one or more threads called IOWorker
s. These threads call different functions namely NotifyReceivedPackage(..)
, NotifyNewConnection(..)
, etc., and how these functions are implemented define the communicating protocol. Usually, a more sophisticated academic finite state machine model is used to implement complex protocols with IOCP, however, we will keep it as simple as we can. Please read my article “A simple IOCP Server/Client Class” for more information and more details about the IOCP technology.
All the implementation is in the IOCPS
class and SecureChatIOCP
. By defining _IOCPClientMode_
, we change the protocol behavior from the server to the client.
We start of by defining some packages, and how they are handled in NotifyReceivedPackage(..)
.
5.6.1. Defining the packages
The table below describes the different package structures. A package's first bytes start with the top of the table (yellow).
Package structure 1 
Package structure 2 
Package structure 3 
4 bytes, defining the size of a package 
4 bytes, defining the size of a package 
4 bytes, defining the size of a package 
1 byte defining the package type. 
1 byte defining the package type. 
1 byte defining the package type. 
Variable size, payload, usually contains another encrypted package or a null terminated string. 
4 bytes defining the size of the payload (in number of DWORD s). 
4 bytes defining the size of the string (payload1) in bytes. 
Payload of variable size (usually contains a public key or signature). 
Payload1 of variable size. Usually contains a null terminated string (username and password). 
4 bytes defining the size of the string (payload2) in bytes. 
Payload2 of variable size (usually contains a null terminated string). 
Table 1: Describes the different package structures that are transferred between the client and the server. 
The packages have one of the structures described in the above table, and are usually built inside the functions BuildAndSend(..)
, SendPublicKey(..)
, SendTextMessage(..)
, and SendErrorMessageTo(..)
inside the SecureChatIOCP
class.
The package types are defined in the table below, and are handled with the appropriate functions, OnXXXX(..)
, where XXXX
denotes the package type (e.g., OnPKG_PUBLIC_KEYA(..)
).
Package (of enum type PackageTypes)  
How the package is handled 
PKG_ERRORMSG
Encapsulated in structure 1 (Table 1)  
This type of a message contains an error message that is going to be sent to the client. The package is received by the client and the error message is presented. As soon as the package is sent, the connection is closed. 
PKG_PUBLIC_KEYP
Encapsulated in structure 2 (Table 1)  
This package is sent by the client. The package contains the public key, P, according to the DiffieHellman (DH) key exchange protocol (section 5.4). When the server receives this package, it saves the public key, and generates a 512 bit private key, and computes and sends the public key, B, by ComputeAndSendPublicKeyA(..) . 
PKG_PUBLIC_KEYA
Encapsulated in structure 2 (Table 1)  
This package is sent by the server, and contains the public key, A, according to the DiffieHellman (DH) key exchange protocol (section 5.4). This package is received by the client, which generates a private key, and computes the session key (ComputeAndSetSessionKey(..) ), and computes and sends the public key, B (of type PKG_PUBLIC_KEYB ), with the function ComputeAndSendPublicKeyB(..) . 
PKG_PUBLIC_KEYB
Encapsulated in structure 2 (Table 1)  
This package is sent by the client to the server (see PKG_PUBLIC_KEYA ). The session key is computed, and a signature is computed and sent to the client (ComputeAndSendSignature(…) ) which signs the public keys, A and B, if we have defined #define USE_SIGNATURE to protect the client against “Man in the Middle” attacks (section 3.4 and 4). 
PKG_SIGNATURE
Encapsulated in structure 2 (Table 1)  
This package is sent by the server, and contains a signature that confirms the packages PKG_PUBLIC_KEYA and PKG_PUBLIC_KEYB . The client receives the package, and validates the server using the trusted static key SecureChatIOCP::m_PubN that is hardcoded into the client. If everything is fine, a PKG_USERNAME_PASSWORD is sent to server, otherwise a PKG_ERRORMSG , that also terminates the connection. 
PKG_TEXT_TO_ALL
Encapsulated in structure 1 (Table 1)  
This package is always sent after the key exchange procedure, and is encapsulated inside a PKG_ENCRYPTED package.
When received by the client, the content is shown in the text box of the chatting client. However, if it is received by the server, it is sent to everybody. 
PKG_USERNAME_PASSWORD
Encapsulated in structure 3 (Table 1)  
This package is always sent as a response from the client that the client accepted the signature. This package is encapsulated inside a PKG_ENCRYPTED package. 
PKG_ENCRYPTED
 
This package is sent after the key exchange procedure, and is handled inside NotifyReceivedPackage(..) . The contents of the package are decrypted and processed according to the defined packages in this table. 
Table 2: Describes the different types of packages and how they are handled.
5.6.2. The key exchange details
The techniques used to implement the secure chat are Digital Signing/Verification (section 5.3.4), DiffieHellman (DH) key exchange protocol (section 5.4), and Rijndael explained in section 5.2.
The key exchange procedure is done in the following way (to read more about the implementation details, read the previous section 5.6.1, tables 1 and 2):
 The client generates a public key, p; the public key, g, is always equal to five. A package type of
PKG_PUBLIC_KEYP
is sent to the server. This is done when the client connects to the server, and is performed in the function NotifyNewConnection(..)
.
 The server receives the package
PKG_PUBLIC_KEYP
and sends the package PKG_PUBLIC_KEYA
(read section 5.6.1).
 The client receives the
PKG_PUBLIC_KEYA
. The client computes the session key, and sends PKG_PUBLIC_KEYB
to the server according to section 5.6.1.
 The server receives the
PKG_PUBLIC_KEYB
. The server computes the common session key, and sends a signature in PKG_SIGNATURE
to the client.
 The client validates the signature, and sends a
PKG_USERNAME_PASSWORD
package to the server.
Now, both the client and the server have a common secret session key, according to the DiffieHellman (DH) key exchange algorithm, and also the client is protected against "Man in the Middle" attacks.
5.6.3. The digital signing details
To digitally sign the message, the client must trust a public key. This trusted public key is hardcoded in the software (even if it is public) as the global constant SecureChatIOCP::m_PubN
declared in the SecureChatIOCP
class. The same public key and the secret key are also hard coded into the server SecureChatIOCP::m_PrivD
and SecureChatIOCP::m_PubN
. Remove the #define USE_SIGNATURE
from SecureChatIOCP.h to use the protocol without the “Man in the Middle” protection and digital signing.
Observe! The digital signing in the demo is not secure because everyone that downloads this demo knows the private key used to sign the data.
Use the Demo section and the “Generate DSA key” button in the server demo to generate your own private and public keys to be used for digital signing. Copy the generated keys to SecureChatIOCP::m_PrivD
and SecureChatIOCP::m_PubN
to obtain your own secure clientserver framework.
5.6.4. Different implementation approaches
As we have discussed before in previous sections, there are many different implementation approaches. The key exchange implementation is quite slow; it takes about 600ms to do a safe key exchange (section 5.5). This is because of the digital signing computation. More efficient alternative digital signing algorithms other than RSA recursive can be used to decrease the key exchange time.
A quite nice alternative would be that the server uses the same public key, P, and public key, A, and signs only the public key, A. By doing that, the key exchange procedure would be much faster from a server point of view, because the computation needs to be done only once at the server startup, or at a certain interval (few hours).
As a result of that, a larger key size can easily be used, because we do not do computations every time. However, the security of the protocol would be compromised using this approach, even if it is a quite good fix. For example, imagine that the public key A is broken.
5.6.5. Special considerations
When you compile this source code for 64 bit processors, make sure that you change the parameters defined in MyCryptLib.h (e.g.: _HIBITMASK_
, MAXIMUMNR
) to appropriate values.
For commercial release, it is important to protect this software against buffer overflow attacks [1], specially regarding the class MyCryptLib
. Use the compiler /GZ
switch, found in Visual C++ 6.0, and the /GS
switch for the Visual C++ .NET compiler, to prevent buffer overflow attacks [1].
6. Future work
 Optimization of the multiprecision library used to implement asymmetric encryption and key exchange.
 Client authorization, using hashed SHA256 password / username.
 File transfer functionality between users.
7. F.A.Q.
Q: The protocol ensures that the client is protected from “Man in the Middle” attacks by using digital signing. In the server side, the client is not authorized in a similar way, why?
A: In practice, the most important thing is to protect the clients from being hacked. Also, usually, the server authorizes the client using a name and a password. Here, it is important that the client is not “fooled” by a “Man in the Middle” attack to send information such as password, etc., to the attacker.
8. Revision History
 Initial release  20060608.
9. References
 Compiler Security Checks In Depth, 20060502
 Crypto++® Library 5.2.1, 20060502
 OpenSSL Project, 20060502
 CrypTool, 20060502
 RSA Security, 20060502
 A simple IOCP Server/Client Class, 20060502, Amin Gholiha
 “Cryptanalysis of Block Ciphers with Overdefined Systems of Equations (or the XSL attack on block ciphers)”, Nicolas Courtois, Josef Pieprzyk, Asiacrypt 2002, LNCS 2501, pp. 267287, Springer
 "A report about the Courtois and Pieprzyk attack on AES", Leah McFall, Asiacrypt 2002 conference, Tuesday, 3 December 2002
 The Art of Computer Programming, Knuth, Donald. 1968
 RSA Encryption Standard, RSA Laboratories. PKCS #1 v2.1: June 2002.
 The Art of Computer Programming, Vol 2: Seminumerical Algorithms, Knuth, Donald.E. AddisonWesley, Reading, Mass., 1981
 Reliable Software Technology or Cigital, 20060607