Click here to Skip to main content
15,867,308 members
Articles / Programming Languages / C++
Article

Spoofing the Wily Zip CRC

Rate me:
Please Sign up or sign in to vote.
3.80/5 (5 votes)
31 May 2003Public Domain3 min read 76.6K   853   23   12
How Zip CRCs work and how to create a specific file CRC

Introduction

CRC's are designed for error detection and the are extremely good at that. One often sees warnings that CRC's should not be used for security purposes yet it is rarely shown just how incredibly easy it is to "spoof" a CRC. This short article will show how true that warning is in addition to providing a simplified algorithm for Zip file CRC calculations.

Background

CRC's are used for error detection and are optimal for this purpose. For instance the CRC in Zip will detect all single error bursts of up to any 32 consecutive bits no matter where they occur and the CRC itself is only 32 bits! Is that cool or what? This would seem to be an ideal candidate for a one way hash function but it's not. In fact, the well understood and regular math behind CRC's is what makes them so easy to spoof. CRC's are one component of a fascinating filed of mathematics called "Galois Field Theory", a cornerstone of error correcting codes used just about everywhere. This is the stuff that keeps your CD's playing true even when severely scratched. A great text for those wishing to delve deeply into the math is Peterson's classic "Error Correcting Codes" which will explain in math terms why the algorithms shown here work. Also, perhaps the great tutorial and in depth description of CRC's is Ross N. Williams' classic "crc_v3.txt" widely available on the net.

Here is the Zip CRC algorithm in un-obfuscated terms:

  1. A 32 bit value, the residual, is set 0xffffffff
  2. The data stream is viewed as a series of bits starting with the lsb of the first byte and ending with the msb of the last byte.
  3. Each bit is xor'ed with the lsb of the residual and if the result is one the generating (gf) value, 0xdb710641 is xor'ed into the residual.
  4. The residual is then rotated 1 bit to the right
  5. Steps 3 and 4 are repeated for each bit in the stream
  6. The ending crc is the bit inverted residual.

Finding the CRC of a Zipped File

Zip CRC's can be seen by using the verbose mode. With unzip.exe I use this command to view crc's:

unzip -lv zipfile

Which produces this listing from running the sample code included. (and zipping the result)

Archive: x.zip
Method Size Ratio Date Time CRC-32 Name
------ ------ ---- ----- ---- ---- ------ ----
41 Stored 41 0% 06-01-03 00:22 01234567 crc01234567.txt
------ ------ --- -------

The CrcZip Class

class CrcZip {
public:
    explicit CrcZip(UI32 a_r=~0) : r(a_r){}
    enum {gf=0xdb710641};  // This is the generator
    UI32 r;                // residual, polynomial mod gf
    void clk(int i=0);     // clock in data, bit at a time
    void update(char *pbuf, int len); // update crc residual 
    operator UI32(){return ~r;};// object yields current CRC
    void clkrev();        // clk residual backwards
};

// This reverses CrcZip::clk(0)
// in Galois terms: R := i + R*X
void CrcZip::clkrev()
{
    if (r & 0x80000000)
        r = (r << 1) ^ gf;
    else
        r = (r << 1);
}

// this is the basic "gut's" of crc calcuation.
// and updates the crc's 32 bit residual for a
// single bit, in terms of galois theory
// this is: R := R*X^(-1)
void CrcZip::clk(int i)
{
    r ^= i;
    // rotation combined with possible xor of "gf"
    if (1 & r)
        r = ((r^gf) >> 1) | 0x80000000;
    else
        r >>= 1;
}

// This does a byte at a time by calculating the
// crc bit by bit, lsb first. Slow, but clear.
void CrcZip::update(char *pbuf, int len)
{
    while (len--)
    {
        int bitcnt, byte = *pbuf++;
        for (bitcnt=0; bitcnt < 8; bitcnt++)
        {
            clk(byte & 1);
            byte >>= 1;
        }
    }
}

The Test Program

The sample code calculates, "fixes", and writes a file which when zipped, creates a CRC of 0x01234567. After you see how easy spoofing CRC's is you will never again use them to ID stamp virus detectors or other code designed as one way functions.

Here we calculate the initial CRC of a test string.

CrcZip crc;
char s[41]={"This is a 0 crc file\x1a" "xxxx" "arbitrary stuff"};
crc.update(s, sizeof(s));
cout << "The CRC for this string is " << hex << crc << endl;

Then we initialize a fixup crc with the desired value xor'ed into the orignal strings crc:

CrcZip crcFixup(crc^0x01234567);
              // 0x01234567 instead of
              // "0" makes it more interesting

Now we back up the fixup residual by the number of bits from the patch location to the file end and xor it with whatever is in the patch.

// Then we back step the residual to the point we want
// to stuff a 32 bit value. To interpret "(41-21)*8".
// 41 is the length of the string. 21 is the position
// of the 4 byte place we want to stuff
// a fixup value, 8 bits per byte
for (int i =0; i < (41-21)*8; i++)
    crcFixup.clkrev();
    
// since crc's are "linear" (super-position applies)
// we xor the fixup residual with the 4 bytes "xxxx"
// already there.
*(reinterpret_cast<UI32 *>(s+21)) ^= crcFixup.r;

Applications

There are several applications for this in hardware (and possibly software) design. Hardware design is even more notorious for boundary condition sensitivity than software. Error detection protocols should always be well tested. Such things as all zero and all ones as well as a single crc bits set with all other bits zero must be tested in hardware designs. This technique provides a methodology for easily creating these patterns. Brute force can also be used. Exhaustive searches are easy with 32 bit CRC's but not at all feasible with extensions such as 64 bits.

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication


Written By
Engineer
United States United States
Professional Engineer(EE), interests include DSP, Control Systems, Analog design, C++, C#, Color Science

Comments and Discussions

 
GeneralRe: mangled formatting Pin
mdgray1-Jun-03 18:10
mdgray1-Jun-03 18:10 
GeneralRe: mangled formatting Pin
Nish Nishant12-Sep-03 1:55
sitebuilderNish Nishant12-Sep-03 1:55 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.