## Introduction

A complementary article, showing how to exploit this is here.

There is a Crisis in Hashland. Latest results on Cryptology are opening the debate about the security of cryptographic one way hash functions.

I suggested in another article that:

Don't store the user password on your database. No matter how many security measures you take, there is not a perfect security system. Use a hash method for the passwords, like SHA1, or MD5.

SHA1 and MD5 aren't secure anymore, because of projects like passcracking, we can't trust these hash functions for one way encryption.

In fact, experts suggest that:

"Given the number of practical attacks on MD5, it may be time to move

to a Federal Information Processing Standards (FIPS) approved hash

algorithm, such as SHA-256, or SHA-512. Note that vulnerabilities have

recently been found in SHA-1, however, and NIST is already planning to

phase it out by 2010." (Quoted from cn.bbs.comp.security.)

Update - here are some reactions to these issues:

Microsoft is banning certain cryptographic functions from new computer code, citing increasingly sophisticated attacks that make them less secure, according to a company executive.

The Redmond, Wash., software company instituted a new policy for all developers that bans functions using the DES, MD4, MD5 and, in some cases, the SHA1 encryption algorithm, which is becoming "creaky at the edges," said Michael Howard, senior security program manager at the company, Howard said. (Source)

To understand the consequences, this article first explains what one way hash functions are, shows one of their common uses on password storage, and shows the nature of the current attacks and their consequences, and also suggests other alternative hash functions stronger at the present time.

## One Way Hash Functions

The following definitions are taken from Bruce Schneier's Book: Applied Cryptography Second Edition:

A one-way hash function, *H*(*M*), operates on an arbitrary-length pre-image message, *M*. It returns a fixed-length hash value, *h*.

*h* = H(*M*), where *h* is of length *m*

Many functions can take an arbitrary-length input and return an output of fixed length, but one-way hash functions have additional characteristics that make them one-way [1065]:

- Given
*M*, it is easy to compute *h*. - Given
*h*, it is hard to compute *M* such that *H*(*M*)= *h*. - Given
*M*, it is hard to find another message, M’, such that *H*(*M*) *= H*(*M’*).

- ....

In some applications, one-wayness is insufficient; we need an additional requirement called **collision-resistance**.

It is hard to find two random messages, *M* and *M’*, such that *H*(*M*) = *H*(*M’*).

## Hashing Function Used for Password Storing

The definition mentioned before, lead to the development of secure password storage.

The use of one way hashing function is the following algorithm:

- For password storage, request input
`plain_password `

from user , then apply a oneway hash function H on `plain_password `

and store it. In code, `stored_password = H(plain_password)`

- For password checking, request
`probe_password `

from user, apply H on it, and compare with `stored_ password`

, i.e. check `::= H(probe_password) == stored_password`

.

This schema is good as long as H is a good and strong hashing function.

Today, MD5 is under heavy attack, the reason is that it is used on the GNU implementation of the POSIX function Crypt. In fact, on this Web site, you can find a lot of collisions for this function.

Other papers about the MD5 attacks can be found here.

## Other Security Side Effects

In recent works, three investigators: Lenstra, Wang and Weger, showed that it is feasible to build colliding electronic X.509 certificates, using the MD5 collision techniques developed by Wang. You can read their paper here.

This work violates the basic trust principles underlying PKI (Public Key Infrastructure).

Moreover, Chinese investigators have shown an attack to other Hash function: SHA-1. (Read about it here).

Some others side effects are mentioned in the paper "MD5 To Be Considered Harmful Someday".

For example, digital signatures like RSA, DSA/ElGamel, and Elliptical Curve never hash the data directly, but rather a hash of the data, often the choice is MD5. Also consider DRM (Digital Right Management) implementations using MD5. All these protection signatures and checksums are at risk because of these findings.

If you read the paper, you can learn that it is possible to add a payload to the data, or alter the data without being noticed.

Other example is shown in the paper, Practical Attacks on Digital Signatures Using MD5 Message Digest.

## A Proof of Concept

In this article, I wrote about how to implement the attack in Microsoft.NET.

## Finding Collisions for MD5

The typical way of collision search is to use a brute-force algorithm: given a hash value `h`

, for a plain message `m`

, written in an alphabet `A`

, then `h = MD5(m)`

, so in brute force collision search, we try every possible combination in alphabet `A`

we find a `m' `

message such as `MD5(m') = h. m' `

can be equal to or not equal to `m`

.

The Rainbow Crack uses precalculated tables for intermediate steps on the process, this can accelerate the cracking process. For example, a password of up to 14 characters, of this charset: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()-_+=" can be cracked in a few minutes.

## Better One Way Hash Functions

Some people suggest to use more complicated applications of MD5, for example, use `stored_password = crypt(plain_password + salt)`

, where the key is a fixed one or the user id, or some other fixed value. This schema is not stronger, and can be shown to have several flaws to this approach.

Other alternatives are:

- Use a key based hash function
- Combine algorithms
- Use other functions

The first one relays on a key, that can be stolen. Combining algorithms is better, but probably only results in more CPU usage than real protection.

### Alternatives Functions

To use other hash functions can be a solution, but what criteria is used to choose a good one?

The answer is simple, use hash functions with a bigger domain of results. For example, MD5 generates a 128 bits value, so the space of possible resulting values is 2^{128} in size. By simple logic, if your hash function has an output domain of a size bigger than that, then it's a good alternative.

All the followings functions have stronger implementations than MD5.

- WHIRPOOL, generates a 512 bits output
- RIPEMD, uses 160, 128 or 320 bits output
- SHA-2, generates 256, 512 bits output

SHA-2 is available on crypto API, and Microsoft.NET, so I suggest you to use it. The SHA-2 is a group of functions, in Microsoft.NET you have the followings classes:

`System.Security.Cryptography.SHA256Managed`

`System.Security.Cryptography.SHA384Managed`

`System.Security.Cryptography.SHA512Managed`

## Change Log

- September 7
^{th}, 2005: Added some quotes from Chinese security groups, and more links on papers about the MD5 collisions. Also a section about other effects on DRM and checksums is included. Some grammar corrections. - September 8
^{th}, 2005: Some title changes, to be more precise about cryptology terminologies. Some typos corrected. - September 9
^{th}, 2005: Added list of SHA-2 algorithms available on Microsoft.NET 1.1. - September 14
^{th}, 2005: Added link to proof of concept article