Click here to Skip to main content
15,166,980 members
Please Sign up or sign in to vote.
1.00/5 (2 votes)
See more:
If I have a salt of 16bytes and 16bytes of data, how fast can one find another 16bytes of data so that MD5(salt + data) == MD5(salt + other data)?

I don't expect an answer accurate to the nanosecond, just an estimation like "a few seconds", "a few hours", "a few civilizations".
It apparently depends on your system and code. Beyond that, the question simply makes no sense. You did not even specify your language and platform.
Why not timing it by yourself?
Sergejack 9-May-14 3:09am
System and language are irrelevant. Hackers will always use whatever they see fit. I just want to know how fast they could forge a fake data being validated by the same MD5 hash and salt.

As Sergay said, it depends on your system and code. For instance, I have Ronald Rivest's code, written in straight C, that hashes 1 block in X nanoseconds. Since I haven't done any benchmarks, I can't tell you the value of X. I used Microsoft Visual C++ 6.0 to compile it into the DLL that I use daily, and it runs against the multithreaded DLL implementation of the Standard C runtime library. I would expect to get marginally different (probably better) times if I compiled against the static multithreaded CRT, and the times would certainly be (slightly) better if I compiled against the statically linked single threaded CRT (LIBC.LIB). I might get measurably different results if I used the GCC compiler to compile and link the same code.

The above only covers the potential sources of variation given the same source code, written in the same programming language. It gets even more muddy if you include other implementations of the MD5 algorithm, either in the C programming language, or change the programming language entirely. For example, I would expect a Visual Basic implementation of the same algorithm to run more slowly, even if it was faithful to Dr. Rivest's algorithm. The same would be true of a port to C#, JavaScript, or any number of other languages in which the MD5 algorithm has been implemented.

A sample size of 16 bytes is pretty unrealistic. Real world plaintext is more like thousands, or even millions, of bytes, such as the body of a mail message or the executable code and static data of a program file.

Another consideration is that sample size has little bearing on the content of the resultant hash, since the MD5 hash is always 128 bits (32 bytes) long.

Moreover, if you confine your test cases to plaintexts of 16 bytes, it's fairly easy to work out that there are only 128 possible combinations (8 bits per byte times 16 bytes).
Sergejack 9-May-14 3:21am
If I ask "How fast can one man go from Japan to the US?", the right answer is "By plane". Asking back "What vehicle will he use?" is just denying the generic meaning of the question.
Matt T Heffron 12-May-14 14:06pm
"Moreover, if you confine your test cases to plaintexts of 16 bytes, it's fairly easy to work out that there are only 128 possible combinations (8 bits per byte times 16 bytes)."
It is the ~192 possible plaintext characters (in 8 bits unicode/ascii, ignoring "control chars") TO THE POWER OF 16. That's 3.4e36 combinations.
David A. Gray 4-Nov-14 3:33am
You are correct. Nevertheless, the point is that 16 bytes is 128 bits, a trivial case that is unlikely to yield a collision, and sufficiently small that it is computationally practical to take a brute force approach.
Answer : very slowly as it's not just about finding collision but working with preimage.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900