## Introduction

Hashing is one of the basic and often used features when working with crypto systems. It is used widely and can be used with different purposes. When a clear text is hashed, a relative unique “finger print” (the hash value) of the clear text is made. The hash value will often be shorter than the text which has been hashed. This will of course introduce some contradictions to the uniqueness that must be avoided as much as possible when making the (one-way) hashing algorithm. This is a problem, which can be minimized (but impossible to eliminate) mathematically only. This is one of the main reasons why you should not make a hashing algorithm yourself unless you completely master the mathematics behind. A rule of thumb is that it should be easy to evaluate H (x1) but difficult to find x1 (and x2).

H (x1) = H (x2), x1 != x2

Any contradictions to the uniqueness are basically a security risk because it will make it easier to find a text that will match the hashed value. When using the hash value to store passwords, it is even more clear how the disadvantages of the contradictions can be abused.

### Birthday attack

A birthday attack is a type of cryptographic attack which exploits the mathematics behind the birthday paradox, making use of a space-time tradeoff. Specifically, if a function yields any of *n* different outputs with equal probability and *n* is sufficiently large, then after evaluating the function for about 1.2Ön different arguments, we expect to have found a pair of different arguments x1 and x2 with H (x1) = H (x2), known as a collision. If the outputs of the function are distributed unevenly, then a collision can be found even faster.

### The paradox itself

The birthday paradox states that if there are 23 people in a room then there is a chance of more than 50% that at least two of them will have the same birthday. This means that a higher probability applies to a typical school class size of thirty, where the 'paradox' is often cited. For 60 or more people, the probability is greater than 99%. This is not a paradox in the sense of leading to a logical contradiction; it is a paradox in the sense that it is a mathematical truth that contradicts common intuition. Most people estimate that the chance is much lower than 50:50. Calculating this probability (and related ones) is the birthday problem. The mathematics behind it has been used to devise a well-known cryptographic attack named the birthday attack.

- H (x) outputs k bit.
- Choose x randomly and evaluate H (x) t times then.
- x1 and x2 => h(x1) = h(x2) where x1 != x2.
- It can then be expected that t is about 2^k/2.

When t(t-1)/2 pairs each with the probability of 2^-k, the probability for a collision will be:

(t/(t-1)) / (2^(k+1)) = t^2 / 2^k
t^2 = 2^k <=> T = sqr (2^k) = 2^k/2

To avoid this attack, the output length of the hash function used for a signature scheme can be chosen large enough so that the birthday attack becomes computationally infeasible, i.e. about twice as large as needed to prevent an ordinary brute force attack. In cryptanalysis, a brute force attack is a method of defeating a cryptographic scheme by trying a large number of possibilities.

### Practical example

At a UNIX system a hashing value is generated of the users password, which is stored into the local system database / the file system. Every time the user enters the password it generates a hash and compares it to the local stored hash value. If these two matches, the user is authenticated otherwise the password is incorrect. With a badly implemented hashing algorithm, it is easier to find another password that will match the exact same hash value. In practice this means that more than one password can be used for the user accounts.

This is why it is very important to choose an aggressive cryptographic hashing algorithm and maintain a high complexity of passwords, as well as a short lifecycle of them to ensure that they are changed regularly. When choosing the time of the lifecycle, it is important that the lifecycle is shorter than the time it takes to find another value that matches the stored hash value.

### Dictionary attack

A well-known attack to such systems is a dictionary attack. When performing a dictionary attack, the intruder is iterating through a list containing a large amount of words, e.g., a dictionary. The iteration will stop (or maybe just log) when it finds a word that matches the hash value. When the last word has been tried the process will terminate or it will start over with another user account. Different kinds of cycles have further been implemented to avoid the intruder detection being activated, when the attack is made online.

H (x1 || Ru) != H (x2), x1 != x2

Because of the risks of such an attack, most UNIX systems do not make a hash value based on the password only. Normally they include a random chosen (user specific) value, which is concatenated with the entered password where the random value is stored locally on the system and is changed regularly. This prevents offline dictionary attacks.

## Using the code

The Secure Storage component is made like a MFC application. The component is encapsulating the hashing functionality in the Win32 Crypto API, which is placed in separated source and header files (the *crypto.cpp* and *crypto.h* files).

The functionality can be tried using the graphical user interface where clear text can be entered and a hash value is generated on behalf of it. The hash value will afterwards be shown in the disabled textbox.

void make_hash(const char * data, BYTE ** hash, long * len)
{
HCRYPTPROV hProv = 0;
HCRYPTHASH hHash = 0;
BYTE *pbHash = NULL;
DWORD dwHashLen;
BYTE * pbBuffer = NULL;
DWORD dwCount;
DWORD i;
unsigned long bufLen = 0;
if(!CryptAcquireContext(&hProv, NULL, NULL, PROV_RSA_FULL, 0)) {
return;
}
if(!CryptCreateHash(hProv, CALG_MD5, 0, 0, &hHash)) {
return;
}
bufLen = strlen(data);
pbBuffer = (BYTE*)malloc(bufLen +1);
memset(pbBuffer, 0, bufLen +1);
for(i = 0 ; i < bufLen ; i++) {
pbBuffer[i] = (BYTE)data[i];
}
if(!CryptHashData(hHash, pbBuffer, bufLen, 0)) {
return;
}
dwCount = sizeof(DWORD);
if(!CryptGetHashParam(hHash, HP_HASHSIZE, (BYTE *)&dwHashLen, &dwCount, 0)) {
return;
}
if((pbHash = (unsigned char*)malloc(dwHashLen)) == NULL) {
return;
}
memset(pbHash, 0, dwHashLen);
if(!CryptGetHashParam(hHash, HP_HASHVAL, pbHash, &dwHashLen, 0)) {
return;
}
*hash = pbHash;
*len = dwHashLen;
if(hHash) CryptDestroyHash(hHash);
if(hProv) CryptReleaseContext(hProv,0);
}

When making the hash value programmatically the method as outlined above is used. It takes three arguments; the data that has to be hashed (the clear text), a byte pointer initially set to `NULL`

, which will be pointing at the hash value, and finally the length of the hash value.

{
BYTE * hash = NULL;
long len = 0;
make_hash(“Text to be hashed”, &hash, &len);
}

When showing the hash value it can be cast to a char pointer (`char *`

) but this have the disadvantages that not all characters are shown **as is**. A better way to show the hash value is to stream it to a file, which can be read afterwards.

void write_file(const char * filename, unsigned char * data, long length)
{
FILE * fp = fopen(filename, "w+b");
if (fp) {
fwrite(data, 1, length, fp);
fclose(fp);
}
}

Please notice that the file is being written as binary. The reason for this is that we are writing binary data to the file.

## Points of interest

Using the Win32 Crypto API, please notice that it might not necessary compile **as is**. This is because a pre-processor definition is missing in the project.

#define _WIN32_WINNT 0x0400

When defining the above globally to the project, it will be possible to compile code that is using the Crypto API. My suggestion is to place the definition in the *StdAfx.h* file.

<!--

Get a PDF Version of the article here.
Please notice that the file has been renamed so it has an .xml extension instead.
Right click at the link and choose "Save destination as..." to get the file.
Rename the file back to *.pdf* and open it using Acrobat Reader.

-->