Click here to Skip to main content
Click here to Skip to main content

Spoofing the Wily Zip CRC

By , 31 May 2003
 

Introduction

CRC's are designed for error detection and the are extremely good at that. One often sees warnings that CRC's should not be used for security purposes yet it is rarely shown just how incredibly easy it is to "spoof" a CRC. This short article will show how true that warning is in addition to providing a simplified algorithm for Zip file CRC calculations.

Background

CRC's are used for error detection and are optimal for this purpose. For instance the CRC in Zip will detect all single error bursts of up to any 32 consecutive bits no matter where they occur and the CRC itself is only 32 bits! Is that cool or what? This would seem to be an ideal candidate for a one way hash function but it's not. In fact, the well understood and regular math behind CRC's is what makes them so easy to spoof. CRC's are one component of a fascinating filed of mathematics called "Galois Field Theory", a cornerstone of error correcting codes used just about everywhere. This is the stuff that keeps your CD's playing true even when severely scratched. A great text for those wishing to delve deeply into the math is Peterson's classic "Error Correcting Codes" which will explain in math terms why the algorithms shown here work. Also, perhaps the great tutorial and in depth description of CRC's is Ross N. Williams' classic "crc_v3.txt" widely available on the net.

Here is the Zip CRC algorithm in un-obfuscated terms:

  1. A 32 bit value, the residual, is set 0xffffffff
  2. The data stream is viewed as a series of bits starting with the lsb of the first byte and ending with the msb of the last byte.
  3. Each bit is xor'ed with the lsb of the residual and if the result is one the generating (gf) value, 0xdb710641 is xor'ed into the residual.
  4. The residual is then rotated 1 bit to the right
  5. Steps 3 and 4 are repeated for each bit in the stream
  6. The ending crc is the bit inverted residual.

Finding the CRC of a Zipped File

Zip CRC's can be seen by using the verbose mode. With unzip.exe I use this command to view crc's:

unzip -lv zipfile

Which produces this listing from running the sample code included. (and zipping the result)

Archive: x.zip
Method Size Ratio Date Time CRC-32 Name
------ ------ ---- ----- ---- ---- ------ ----
41 Stored 41 0% 06-01-03 00:22 01234567 crc01234567.txt
------ ------ --- -------

The CrcZip Class

class CrcZip {
public:
    explicit CrcZip(UI32 a_r=~0) : r(a_r){}
    enum {gf=0xdb710641};  // This is the generator
    UI32 r;                // residual, polynomial mod gf
    void clk(int i=0);     // clock in data, bit at a time
    void update(char *pbuf, int len); // update crc residual 
    operator UI32(){return ~r;};// object yields current CRC
    void clkrev();        // clk residual backwards
};

// This reverses CrcZip::clk(0)
// in Galois terms: R := i + R*X
void CrcZip::clkrev()
{
    if (r & 0x80000000)
        r = (r << 1) ^ gf;
    else
        r = (r << 1);
}

// this is the basic "gut's" of crc calcuation.
// and updates the crc's 32 bit residual for a
// single bit, in terms of galois theory
// this is: R := R*X^(-1)
void CrcZip::clk(int i)
{
    r ^= i;
    // rotation combined with possible xor of "gf"
    if (1 & r)
        r = ((r^gf) >> 1) | 0x80000000;
    else
        r >>= 1;
}

// This does a byte at a time by calculating the
// crc bit by bit, lsb first. Slow, but clear.
void CrcZip::update(char *pbuf, int len)
{
    while (len--)
    {
        int bitcnt, byte = *pbuf++;
        for (bitcnt=0; bitcnt < 8; bitcnt++)
        {
            clk(byte & 1);
            byte >>= 1;
        }
    }
}

The Test Program

The sample code calculates, "fixes", and writes a file which when zipped, creates a CRC of 0x01234567. After you see how easy spoofing CRC's is you will never again use them to ID stamp virus detectors or other code designed as one way functions.

Here we calculate the initial CRC of a test string.

    CrcZip crc;
    char s[41]={"This is a 0 crc file\x1a" "xxxx" "arbitrary stuff"};
    crc.update(s, sizeof(s));
    cout << "The CRC for this string is " << hex << crc << endl;

Then we initialize a fixup crc with the desired value xor'ed into the orignal strings crc:

   CrcZip crcFixup(crc^0x01234567);  
                 // 0x01234567 instead of
                 // "0" makes it more interesting

Now we back up the fixup residual by the number of bits from the patch location to the file end and xor it with whatever is in the patch.

// Then we back step the residual to the point we want
// to stuff a 32 bit value. To interpret "(41-21)*8".
// 41 is the length of the string. 21 is the position
// of the 4 byte place we want to stuff
// a fixup value, 8 bits per byte
for (int i =0; i < (41-21)*8; i++)
    crcFixup.clkrev();
    
// since crc's are "linear" (super-position applies)
// we xor the fixup residual with the 4 bytes "xxxx"
// already there.
*(reinterpret_cast<UI32 *>(s+21)) ^= crcFixup.r;

Applications

There are several applications for this in hardware (and possibly software) design. Hardware design is even more notorious for boundary condition sensitivity than software. Error detection protocols should always be well tested. Such things as all zero and all ones as well as a single crc bits set with all other bits zero must be tested in hardware designs. This technique provides a methodology for easily creating these patterns. Brute force can also be used. Exhaustive searches are easy with 32 bit CRC's but not at all feasible with extensions such as 64 bits.

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication

About the Author

mdgray
Engineer
United States United States
Member
Professional Engineer(EE), interests include DSP, Control Systems, Analog design, C++, C#, Color Science

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralThanksmemberthecodeproject_1 Sep '10 - 3:11 
Thanks for the code.
 
I was able to modify this to accept 2 parameters, a filename and a checksum value. With those, I can brute force any checksum value on any file.
 
I haven't used C++ since school. So, it's not my best language. Consequently, I still don't have a full understanding of all of the syntax used in the crc class. But, since it's sufficiently abstracted, I didn't have to. But, I will be making an effort to better understand it.
 
There is a good articles on reverse engineering a CRC here:
Reverse-Engineering a CRC Algorithm
and a fairly easy to follow explanation of the CRC algorithms here:
A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
QuestionBug in clkrev?memberFozi_24 Oct '08 - 4:40 
I don't see how your clkrev could work without using the original data as a reference.
 
When you go forwards, you shift out a bit, let's call it R, and compare it to the data bit D. If they are different, you xor the polynomial to your r. When you reverse this, you know that R and D were different, but you will have to shift R back in. Easiest way would be using D, since R is !D when you xored, or D when you did not. But, you're not.
 
Or am I wrong?
 
Karl
AnswerRe: Bug in clkrev?membermdgray19 Nov '08 - 9:42 
Karl,
 
In calculating a crc you include the data. However, including the data in going reverse would produce the wrong crc. The goal is to patch a "vacant" location in the string with whatever is required to create a known crc. The reason data is not used in reversing the crc is that crc's are "linear" and superposition applies thus the crc fixup does not involve any of the string's data, ether before or after the patch area. For instance, were the string to start off entirely zeros the process of creating a known crc would work exactly the same and produce the same value that is exor'ed into the string of zeros.
GeneralRe: Bug in clkrev?memberFozi_21 Nov '08 - 22:36 
Ah, so then the clkrev function does not compute the crcs, but a value that can be used to modify the resulting crc?
 
BTW, I ended up implementing the table lookup version from this document[^]
 
Fozi
GeneralRe: Bug in clkrev?membermdgray22 Nov '08 - 11:40 
Hi!,
 
Exactly. There's a lot of good articles on CRC generation and table lookup approaches offer the most efficient approaches. This article's purpose is to demonstrate the linearity of CRCs and why they should not be used for ID'ing an object since it is easy to construct a message with a known CRC. They are excellent for error detection but easily spoofed so shouldn't be used for security purposes.
General*cringe* --&gt; &quot;gut's&quot;memberkrick9 Dec '05 - 7:59 
I guess when you go to school for Engineering, they don't make you take any English classes. Smile | :)
Questionoptimized version?memberNimai22 Aug '05 - 11:06 
Thank you for the informative tutorial.
I was wondering if you or anyone else already had an optimized version of this code that does calculate one byte at a time?
Thanks.
Generaldetermining where to byte stuffmemberjy200321 Aug '04 - 6:57 
I don't understand the part of:
 
// Then we back step the residual to the point we want
// to stuff a 32 bit value. To interpret "(41-21)*8".
// 41 is the length of the string. 21 is the position
// of the 4 byte place we want to stuff
// a fixup value, 8 bits per byte
for (int i =0; i < (41-21)*8; i++)
crcFixup.clkrev();

// since crc's are "linear" (super-position applies)
// we xor the fixup residual with the 4 bytes "xxxx"
// already there.
*(reinterpret_cast(s+21)) ^= crcFixup.r;
 
How did you come up with 21? Is it random?
GeneralRe: determining where to byte stuffmembermdgray8 Sep '04 - 8:23 
hi jv2003,
 
21 is just the byte offset of the sequence "xxxx" in the string which is where I chose to stuff the 4 bytes required to force the crc to a specific value. It was to demonstrate that a crc can be set at any desired value simply by altering any four consecutive bytes at an arbitrary point in the string.
 
marty
Generalmangled formattingmembermdgray1 Jun '03 - 8:30 
Sorry for the mangled HTML formatting. -Marty
GeneralRe: mangled formattingmembermdgray1 Jun '03 - 18:10 
If anyone knows why the code doesn't get color coded correctly please help. TIA. I made some minor changes to the article to remove some stupidities and add a short application section. Since I am a hardware designer by background, the applications are obvious in that realm but does anyone know any software apps other than just making clear why one should use an MD5 or other crypto level, one way function, when trying to create message digests.
 
-Marty
GeneralRe: mangled formattingeditorNishant S12 Sep '03 - 1:55 
Hi Marty
 
The culprit was a malformed FONT tag. I've fixed it now Smile | :)
 
Nish
 

Extending MFC Applications with the .NET Framework [NW] (coming soon...)
Summer Love and Some more Cricket [NW] (My first novel)
Shog's review of SLASMC [NW]
Come with me if you want to live

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130523.1 | Last Updated 1 Jun 2003
Article Copyright 2003 by mdgray
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid