CRC32: Generating a checksum for a file

By Brian Friesen, 17 Dec 2001
 

Introduction

Recently I wrote a program in which I wanted to generate a CRC for a given file. I searched the web for sample CRC code but found very few algorithms to help me, so I decided to learn more about CRCs and write my own code. This article describes what a CRC is, how to generate one, and what CRCs can be used for, and it ends with source code showing how it's done.

What is a CRC

CRC is an acronym for Cyclic Redundancy Checksum or Cyclic Redundancy Check (depending on who you ask). A CRC is a "digital signature" representing data. The most common CRC is CRC32, in which the "digital signature" is a 32-bit number. The "data" being CRC'ed can be any data of any length: a file, a string, or even a block of memory. As long as the data can be represented as a series of bytes, it can be CRC'ed.

There is no single CRC algorithm; there can be as many algorithms as there are programmers. The ideal CRC algorithm has two characteristics. First, if you CRC the same data more than once, you must get the same CRC every time. Second, if you CRC two pieces of data that differ (even by a single byte), you should get two very different CRC values.

With a 32-bit CRC there are over 4 billion possible CRC values. To be exact, that's 2^32 or 4,294,967,296. With that many values, it's not difficult for every piece of data being CRC'ed to get a unique CRC value. However, spurious hits can happen. In other words, two completely different pieces of data can have the same CRC. This is rare, but not so rare that it won't happen.

Why use CRCs

Most of the time CRCs are used to compare data as an integrity check. Suppose there are two files that need to be compared to determine if they are identical. The first file is on Machine A and the other file is on Machine B. Each file is rather large (say 500 MB), and there is no network connection between the two machines. How do you compare the two files? The answer is CRC. You CRC each of the two files, which gives you two 32-bit numbers. You then compare those 32-bit numbers to see if they are identical. If the CRC values are different, then you can be 100% certain that the files are not the same. If the CRC values are the same, then you can be 99% sure that the files are the same. Remember, because spurious hits can happen you cannot be positive that the two files are identical. The only way to be positive is to break down and do a comparison one byte at a time. But CRCs offer a quick way to be reasonably certain that two files are identical.

How to generate CRCs

Generating CRCs is a lot like cryptography in that it involves a lot of mathematical theory. Since I don't fully understand it myself, I won't go into those details here. Instead I'll focus on how to program a CRC algorithm. Once you know how the algorithm works you should be able to write a CRC algorithm in any language on any platform.

The first part of generating CRCs is the CRC lookup table. In CRC32 this is a table of 256 specific CRC numbers. These numbers are generated by a polynomial (the computation of these numbers and what polynomial to use are part of that math stuff I'm avoiding).

The next part is a CRC lookup function. This function takes two things: a single byte of data to be CRC'ed and the current CRC value. It does a lookup in the CRC table according to the byte provided, and then does some math to apply that lookup value to the given CRC value, resulting in a new CRC value.

The last piece needed is the actual data that is to be CRC'ed. The CRC algorithm reads the first byte of data and calls the CRC lookup function, which returns the CRC value for that single byte. It then calls the CRC lookup function with the next byte of data and passes the previous CRC value. After the second call, the CRC value represents the CRC of the first two bytes. You keep calling the CRC lookup function until all the bytes of the data have been processed. The resulting value is the CRC for the whole data.
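The three pieces described above (table, lookup function, and the loop over the data) can be sketched in portable C++. This is a minimal sketch rather than the article's own source: it assumes the reflected polynomial 0xEDB88320 of the PKZip/Ethernet CRC32 variant, an initial value of 0xFFFFFFFF, and a final inversion, and the names makeCrcTable, updateCrc, and crc32 are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build the 256-entry lookup table, one entry per possible byte value.
// Assumption: reflected CRC-32 polynomial 0xEDB88320 (PKZip/Ethernet variant).
std::array<std::uint32_t, 256> makeCrcTable() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t entry = i;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right one bit; fold in the polynomial when a 1 falls off.
            entry = (entry & 1u) ? (entry >> 1) ^ 0xEDB88320u : (entry >> 1);
        }
        table[i] = entry;
    }
    return table;
}

// The "CRC lookup function": fold one byte of data into the running CRC.
std::uint32_t updateCrc(const std::array<std::uint32_t, 256>& table,
                        std::uint32_t crc, std::uint8_t byte) {
    return (crc >> 8) ^ table[(crc ^ byte) & 0xFFu];
}

// CRC a whole block of memory: initialize, feed every byte, then invert.
std::uint32_t crc32(const void* data, std::size_t length) {
    static const std::array<std::uint32_t, 256> table = makeCrcTable();
    const std::uint8_t* bytes = static_cast<const std::uint8_t*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i) {
        crc = updateCrc(table, crc, bytes[i]);
    }
    return ~crc;
}
```

Under these assumptions, crc32("123456789", 9) should yield 0xCBF43926, the commonly published check value for this CRC32 variant, which makes a convenient self-test.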

Code Details

In this sample program I wanted to show that there are many different ways of generating CRCs. There are more than eight CRC functions, all based on the above steps for generating CRCs. Each function differs slightly in its intended use or optimization. There are four main CRC functions, each described below. There are also two separate CRC classes, but more on that later. Lastly, there are a few helper functions that CRC strings.

C++ Streams: The first function represents the simplest CRC function. The file is opened using the C++ stream classes (ifstream). This function uses nothing but standard C++ calls, so this function should compile and run using any C++ compiler on any OS.

Win32 I/O: This function is more optimized in that it uses the Win32 API for file I/O (CreateFile and ReadFile). This speeds up the processing, but by using the Win32 API the code is no longer platform independent.

Filemaps: This function uses memory mapped files to process the file. Filemaps can be used to greatly increase the speed with which files are accessed. They allow the contents of a file to be accessed as if it were in memory. No longer does the programmer need to call ReadFile and WriteFile.

Assembly: The final CRC function is one that is optimized using Intel Assembly. By hand writing the assembly code the algorithm can be optimized for speed, although at the sacrifice of being easy to read and understand.
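Of the four variants, the C++ streams one is the easiest to sketch portably. The following is a hedged illustration, not the article's FileCrc32Streams: the function name fileCrc32 and the 4096-byte buffer are assumptions, and to stay self-contained it updates the CRC one bit at a time (reflected polynomial 0xEDB88320, assumed) instead of using a lookup table.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: CRC a file using only standard C++ streams.
// Returns false if the file cannot be opened.
bool fileCrc32(const std::string& path, std::uint32_t& crcOut) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file.is_open()) {
        return false;
    }

    std::uint32_t crc = 0xFFFFFFFFu;  // initial value
    char buffer[4096];
    for (;;) {
        // The last read may come up short, so rely on gcount().
        file.read(buffer, sizeof(buffer));
        const std::streamsize count = file.gcount();
        if (count <= 0) {
            break;
        }
        for (std::streamsize i = 0; i < count; ++i) {
            // Fold each byte in, one bit at a time (no table in this sketch).
            crc ^= static_cast<std::uint8_t>(buffer[i]);
            for (int bit = 0; bit < 8; ++bit) {
                crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
            }
        }
    }
    crcOut = ~crc;  // final inversion
    return true;
}
```

Reading in blocks rather than byte by byte keeps the stream overhead down; a table-driven update would normally replace the inner bit loop when speed matters.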

Those are the four main CRC functions, but there are actually two versions of each. There are two classes, CCrc32Dynamic and CCrc32Static, each of which has the above four functions, for a total of eight. The only difference between the static and dynamic classes is the CRC table. With the static class, the CRC table and all the functions in the class are static. The trade-off is simple: the static class is simpler to use, but the dynamic class uses memory more efficiently because the CRC table (1K in size) is only allocated when needed.

// Using the static class is as easy as one line of code
dwErrorCode = CCrc32Static::FileCrc32Assembly(m_strFilename, dwCrc32);

// Whereas there is more involved when using the dynamic class
CCrc32Dynamic *pobCrc32Dynamic = new CCrc32Dynamic;
pobCrc32Dynamic->Init();
dwErrorCode = pobCrc32Dynamic->FileCrc32Assembly(m_strFilename, dwCrc32);
pobCrc32Dynamic->Free();
delete pobCrc32Dynamic;

Whenever you calculate a CRC you need to take into account the speed of the algorithm. Generating CRCs for files is both a CPU and a disk intensive task. Here is a table showing the time it took to CRC three different files. The columns are the different file sizes, the rows are the different CRC functions, and the table entries are in seconds. The system these numbers were captured on is a dual Pentium III at 1 GHz with a 10,000 RPM SCSI Ultra160 hard drive.

Function       44 KB     34 MB    5 GB
C++ Streams    0.0013    0.80     125
Win32 I/O      0.0009    0.60      85
Filemaps       0.0010    0.60      87
Assembly       0.0006    0.35      49

As expected, the C++ streams function is the slowest, followed by the Win32 I/O function. However, I was very surprised to see that the filemaps were not faster than the Win32 I/O; in fact they are slightly slower for the smallest and largest files. After I thought about it some, I realized memory mapped files are designed to provide fast random access to files. But when you CRC, you access the file sequentially. Thus filemaps are not faster, and the extra overhead of creating the "views" of the file is why they are slower. Filemaps do have one advantage that none of the other functions have. Memory mapped files are guaranteed to be able to access files up to the maximum file size in NT, which is 2^64 bytes or about 18 exabytes. Although the Win32 I/O may handle files of this size, none of the documentation confirms this. [Note: The largest file I have CRC'ed is 40 GB, which all eight functions successfully CRC'ed, but took over 10 minutes each.]

If anyone who reads this article knows a way to improve the speed even more, please post the code or email me. Especially if you know of a speed improvement for the assembly code. I will bet there are further optimizations that can be made to the assembly code. After all I don't know Intel Assembly very well, therefore I'm sure there are optimizations I don't know about.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Brian Friesen
Web Developer
United States


Comments and Discussions

 
Suggestion: 64bit compiler (member Jan Stetka, 7-May-13 1:57)
Note: the 64bit compiler in Visual Studio doesn't like the inline assembly. So you need to #define it out depending on what you're targeting.
Question: CRC for multiple files (member rk1960in, 10-Oct-12 6:57)
How can a CRC be calculated for multiple files?
I get the CRC for each file recursively in a folder.
Convert the CRC to an integer and then add all the CRC values.
Then convert the added value to hex.
This gives an additive CRC, and if there are a lot of files the number gets bigger than 32 bits.
Is there another approach to display one CRC value for all the files?
Using C#.
Thanks!
Answer: Re: CRC for multiple files (member Brian Friesen, 10-Oct-12 7:06)
Adding CRC values may work, but my guess is you open yourself up to more false positives. A better approach would be to CRC all the files as one big block of data. There are 3 steps to CRC32. Step 1 - initialize the CRC value (dwCrc32 = 0xFFFFFFFF;). Step 2 - CRC the data (CalcCrc32 function in my code). Step 3 - invert the CRC value (dwCrc32 = ~dwCrc32;). To CRC multiple files you'd do step 1 once, then repeat step 2 until all files have been processed, and then perform step 3. Note, the order you process the files is important!
 
CRC32 is older and has long since been surpassed by newer and better algorithms. You might want to look at MD5 or better yet SHA-1. Good luck.
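The three steps in this answer can be sketched as follows. This is an illustrative sketch, not the article's CalcCrc32: the names accumulateCrc and crc32OfFiles are hypothetical, and the per-byte update is done bit by bit with the reflected polynomial 0xEDB88320 (assumed) so the example needs no table.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Step 2 only: fold the contents of one stream into a running CRC value.
void accumulateCrc(std::istream& in, std::uint32_t& crc) {
    char ch;
    while (in.get(ch)) {
        crc ^= static_cast<std::uint8_t>(ch);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
}

// Hypothetical helper: one CRC over several files, in the given order.
// Returns false if any file cannot be opened.
bool crc32OfFiles(const std::vector<std::string>& paths,
                  std::uint32_t& crcOut) {
    std::uint32_t crc = 0xFFFFFFFFu;          // step 1, done once
    for (const std::string& path : paths) {
        std::ifstream file(path, std::ios::in | std::ios::binary);
        if (!file.is_open()) {
            return false;
        }
        accumulateCrc(file, crc);             // step 2, once per file
    }
    crcOut = ~crc;                            // step 3, done once
    return true;
}
```

Because the running value is carried from file to file, the result equals the CRC of the files concatenated in that order, which is why the processing order matters.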
Question: License (member Member 8961065, 28-May-12 23:08)
The article does not include any specified license, so, can you tell what license should apply?
Is it The Code Project Open License (CPOL)?
 
Thanks
Cornelia
General: Thank you very much! (member yulin111-Oct-08 22:59)
Hi, I want to say that I was looking for CRC source code like this.
 
I found a lot of them, but this one I think is the best.
 
So, thank you very much for offering such good source code.
 

 
codeuu,source code
Question: How many errors.. (member Rikard Astrof, 24-Feb-08 23:01)
How many bit errors can a 32-bit CRC detect before it can't properly detect anymore?
Is it 32?
Answer: Re: How many errors.. (member supercat9, 14-Apr-08 8:40)
With an 'n' bit CRC, any change that is confined to a single n-bit region of a file will be detected. Most common CRCs will also detect any combination of errors that flip an odd number of bits, and any two-bit error where the bits are not separated by a multiple of 2^(n-1) bits.

Although CRCs are excellent for detecting bit errors that fit the above patterns, inputs with an even number of bit errors >= 4, where the errors are not confined to an 'n' bit region, may go undetected.
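The first property (any change confined to a single n-bit region is detected) can be checked experimentally. The sketch below is an illustration under assumptions: it uses the PKZip/Ethernet CRC32 variant with the reflected polynomial 0xEDB88320, and the names crc32Bitwise and countUndetectedBursts are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320, assumed variant).
std::uint32_t crc32Bitwise(const std::vector<std::uint8_t>& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::uint8_t byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

// Hypothetical experiment: XOR a nonzero 32-bit error pattern into four
// consecutive bytes at every offset and count how often the CRC of the
// damaged data still equals the CRC of the original.
std::size_t countUndetectedBursts(const std::vector<std::uint8_t>& original,
                                  std::uint32_t pattern) {
    if (pattern == 0 || original.size() < 4) {
        return 0;  // nothing to corrupt
    }
    const std::uint32_t good = crc32Bitwise(original);
    std::size_t undetected = 0;
    for (std::size_t offset = 0; offset + 4 <= original.size(); ++offset) {
        std::vector<std::uint8_t> damaged = original;
        for (int i = 0; i < 4; ++i) {
            damaged[offset + i] ^=
                static_cast<std::uint8_t>(pattern >> (8 * i));
        }
        if (crc32Bitwise(damaged) == good) {
            ++undetected;
        }
    }
    return undetected;
}
```

For any nonzero pattern the count should come out as zero, since the damage never spans more than 32 bits. Errors spread over a wider region carry no such guarantee.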
Question: Licensing (member allen_ellison, 15-Jan-08 16:33)
Is this code public domain?
General: Computing File CRC in VBScript (member Milind Mehendale, 25-Nov-07 19:06)
Hello,
 
Can I compute file CRC using VBScript? If yes, how??
 
Regards,
Milind
Question: How to compile? (member maxsubzero, 30-Oct-06 13:51)
Hi, I'm new to this forum and I'm interested in CRC. I would like to learn about it. I have written a little code (just C) for small files (<1 MB)...

I want to compile (and execute) the code but I don't know how to do this. Can you help me?

I use Mingw32 (gcc), but if I need another compiler I can get it, just say so.
 
thanks
Answer: Re: How to compile? (member VEMS, 9-Jan-07 9:36)
You can get VS 2005 Express for free at

http://msdn.microsoft.com/vstudio/express/
General: Re: How to compile? (member maxsubzero, 20-Jan-07 11:05)
Thanks!! :)
 
I'll get it now.
General: Re: How to compile? (member chris_liush, 22-Oct-07 19:33)
Error 2 fatal error C1083: Cannot open include file: 'fstream.h': No such file or directory c:\documents and settings\chris.liu\桌面\crc_file\crc32_src\crc32\crc32dynamic.cpp 3
 

can not open fstream.h
my IDE is vs2005
Answer: Re: How to compile? (member ispeedonthe405, 16-Nov-07 13:50)
Under the current version you'll have to make some code changes to get it to compile.
 
1) Change fstream.h to fstream
 
2) You'll need to specify the std namespace, either with a using namespace std; in the file or prefixing with std::
 
3) ios::nocreate no longer exists. You can change it to ios::_Nocreate but I don't believe that's portable (vendor-specific). If you're only building on Windows then that's fine.
 
4) Get rid of the filebuf::sh_read parameter in the open() calls. Assuming #3 above, it should now look like this:
file.open(szFilename, ios::in | ios::_Nocreate | ios::binary);
General: Re: How to compile? (member tianjianii, 26-Jun-08 23:29)
ios::in will not create and ios::in|ios::out will not create anyway, so just remove ios::nocreate.
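Putting the suggestions in this thread together, a standard C++ open call might look like the sketch below. The helper name openExistingForRead is hypothetical; the only library calls are std::ifstream::open and is_open with standard flags.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: open an existing file for binary reading with
// standard flags only. std::ios::in never creates a missing file, so no
// nocreate flag (standard or vendor-specific) is needed.
bool openExistingForRead(std::ifstream& file, const std::string& path) {
    file.open(path, std::ios::in | std::ios::binary);
    return file.is_open();
}
```

If the file is missing, the open simply fails and nothing is created on disk.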
General: Re: How to compile? (member Tim Stubbs, 6-Aug-09 3:33)
Thanks for taking the time :)
 
Tim Stubbs

Question: link error (member JakeFront, 19-Sep-06 6:31)
Hi

I have used some of your code in my project, namely CCrc32Static.

I only want this function; I have commented out the others as I will not need them. I have hard-coded a test file in main, but when I build I get the following error.

Crc error LNK2019: unresolved external symbol "public: static void __cdecl CCrc32Static::CalcCrc32(unsigned char,unsigned long &)" (?CalcCrc32@CCrc32Static@@SAXEAAK@Z) referenced in function _main

I have seen the ASSERT thread below but I don't fully understand where to put that in my code.

I have the code below.
 
// Crc.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include "crc32Static.h"
#include
#include
#include
#include
 
using namespace std;
 
int main ()
{
 
DWORD dwCrc32, dwErrorCode = NO_ERROR;
(m_strFilename, dwCrc32);
 
ifstream fin;
string string1 = "", string2, string3;
fin.open("C:\\test\\xml files\\2605",ios::in);
 
if (fin.is_open())
{
fin >> string1;
while (string1 != "")
{
fin >> string1;
}
 
char buffer[MAX_BUFFER_SIZE];
int nLoop, nCount;
nCount = fin.read(buffer, sizeof(buffer)).gcount();
while(nCount)
{
for(nLoop = 0; nLoop < nCount; nLoop++)
//CalcCrc32(buffer[nLoop], dwCrc32);
CCrc32Static::CalcCrc32(buffer[nLoop], dwCrc32);
nCount = fin.read(buffer, sizeof(buffer)).gcount();
}
}
}
 

with the helper class as follows
DWORD CCrc32Static::s_arrdwCrc32Table[256] =
{
//static table
};
 
//***********************************************
CCrc32Static::CCrc32Static()
{
}
 
//***********************************************
CCrc32Static::~CCrc32Static()
{
}
 
//***********************************************
inline void CCrc32Static::CalcCrc32(const BYTE byte, DWORD &dwCrc32)
{
dwCrc32 = ((dwCrc32) >> 8) ^ s_arrdwCrc32Table[(byte) ^ ((dwCrc32) & 0x000000FF)];
}

 

I have included the common file.
Thanks for your help in Advance.

 
Rick

Answer: Re: link error (member Brian Friesen, 19-Sep-06 12:44)
I'm not sure. What changes, if any, did you make to Crc32Static.h?

General: Re: link error (member JakeFront, 19-Sep-06 21:58)
Yes, I made the following changes in the Crc32Static.h file
#ifndef _CRC32STATIC_H_
#define _CRC32STATIC_H_
 
#include "Common.h"
 
class CCrc32Static
{
public:
CCrc32Static();
virtual ~CCrc32Static();
 
static inline void CalcCrc32(const BYTE byte, DWORD &dwCrc32);
 
static DWORD s_arrdwCrc32Table[256];
 
protected:
 
};
 
#endif

 
and I am using VS2003
 
Rick

General: Re: link error (member aldasp, 28-Nov-06 4:15)
You must delete the inline directive from the function definition.
 
---
Aldas
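The likely reason the inline keyword matters here: a function declared inline in the header but defined only in one .cpp file is generally not emitted for other translation units, which is consistent with the LNK2019 error reported above. Besides removing inline, another common arrangement is to put the body in the header itself. A minimal sketch, in which the class name Crc32Sketch and the explicit table parameter are illustrative assumptions rather than the article's code:

```cpp
#include <cstdint>

// Crc32Sketch.h (hypothetical): the body lives in the header, so every
// .cpp file that includes it sees the full definition.
class Crc32Sketch {
public:
    // Member functions defined inside the class body are implicitly inline.
    static void CalcCrc32(std::uint8_t byte, std::uint32_t& crc,
                          const std::uint32_t (&table)[256]) {
        crc = (crc >> 8) ^ table[byte ^ (crc & 0x000000FFu)];
    }
};
```

Either approach keeps the declaration and the definition consistent across translation units; which one to choose is a matter of whether inlining the per-byte call is wanted.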
General: Re: link error (member JakeFront, 28-Nov-06 5:02)
Thanks for your help

I found the solution was to reorder the header files in the stdafx files, and the link error disappears:
#include
#include // MFC core and standard components
#include // MFC extensions
#include // MFC support for Internet Explorer 4 Common Controls
#ifndef _AFX_NO_AFXCMN_SUPPORT
#include // MFC support for Windows Common Controls
#endif // _AFX_NO_AFXCMN_SUPPORT
#include

 
Rick

Question: Should I Change CRC Table? (member OffLineR, 31-Jan-06 21:26)
Dear Brian,

Should I change the values in the CRC table? I am planning to use CRC for security reasons and I would like to have my own unique results when I use the CRC algorithm.
 
Thanks a lot.
 
Cheers
 
Koray :)
Answer: Re: Should I Change CRC Table? (member aldasp, 28-Nov-06 20:58)
For security, the MD5 algorithm is traditionally used.
It gives a longer result, but there is no easy way
to falsify it.
 
--
Aldas
General: Thanks (member OffLineR, 31-Jan-06 21:20)
Thanks a lot for this article.
 
It will be very helpful.
 
Cheers.
 
Koray :)
General: Different CRC (member Sniper167, 22-Jan-06 10:56)
I have another program that computes CRCs too, but when I open the same file with both your program and the other one, they report two different CRCs. Why?
 
Sniper167
General: Re: Different CRC (member Brian Friesen, 22-Jan-06 12:09)
There are many different CRC algorithms, just as there are many different compression algorithms. The algorithm my program uses is the same that PKWare and Ethernet use. It sounds like this other CRC program you're using generates CRCs using a different algorithm, hence the different result.
General: Re: Different CRC (member VEMS, 9-Jan-07 9:49)
You shouldn't. If you are, it's pretty useless across networks and file transfers. CRC32 is governed by RFC 3309.
General: Re: Different CRC (member awneil, 14-Jul-08 3:30)
Brian Friesen wrote:
There are many different CRC algorithms, just as there are many different compression algorithms.

 
This is, of course, very true - which makes it a bit naughty of you not to have mentioned anywhere in the article or the source which particular algorithm you were implementing!
General: this is not the way to use CRC (member jt_roxas, 10-Dec-05 14:30)
Firstly, I commend you for your efforts in posting your code here.

I think you misunderstood the usage of CRC. CRC is not supposed to be used as a checksum for large files. The reason is that you are representing, say, a 40 GB file (from your sample) with only a 32-bit value, which has only 2^32 combinations, as opposed to a file with 2^(40GB*8bits) combinations.
This means that there will be (2^(40GB*8bits) divided by 2^32) = 2^319999999968 files of the same 40 GB size having the same CRC value at the most. Of course you can modify one byte or one char at a time and it will generate a different CRC, but try doing it at several hundred locations and iterate, and you will see that there will be many (billions of) times that your code reports no error even when the files are completely different.

You see, CRCs were created mostly for transmission protocols that send data in packets that are only a few bytes in length, e.g. ISDN, which has a default packet size of 2048 bytes (including the CRC16 value, or whatever your modem uses). The modem calculates the CRC for these 2048 bytes and sends it with the packet so the receiving end can check for data corruption and have the packet resent.

In Ethernet the packet sizes are much larger, so instead of a 16-bit CRC value they use a larger CRC32 (32-bit) to represent the larger packet, so that the probability of catching an error is not reduced compared to a modem with a smaller packet size.

When a file is sent through a modem, the protocol or the hardware in the modem segments the file into packets and computes the CRC for each packet, not for the whole file.

I hope this clears things up.

Thanks,
General: Re: this is not the way to use CRC (member hagaylevy, 11-Jul-09 4:04)
good point...
General: Re: this is not the way to use CRC (member Jan Stetka, 4-Sep-10 5:00)
I can see what you're saying, but I've been using this code to CRC executables and I've not had it generate the same CRC for different files once.
General: The initial value of CRC32 (member sfchiou, 25-Oct-05 17:07)
Hi, Sir

Thanks for your introduction and code; it is a great help to a junior in the CRC field.

I have questions about the code. Would you kindly answer them? Thank you very much. ^_^

1. Why do you set the initial CRC32 value to 0xFFFFFFFF? Could I use another number, such as 0x00000000, as the initial value?

2. After completing the CRC32 calculation (doing a lot of XORs and shifts), I found that you invert the resulting value and use this inverted value as the CRC32 result. For what reason?

Would you enlighten a junior about CRC? Thank you very much. :)
 
GOD Bless you
sfchiou
General: CCrc32Dynamic::Free(void) (member ally_s, 6-Sep-05 3:40)
Just a quick bug that I think I have found in an otherwise excellent article.
Surely in CCrc32Dynamic::Free(void) 'delete[]' should be used instead of 'delete' to delete the crc lookup table.
General: Re: CCrc32Dynamic::Free(void) (member Brian Friesen, 6-Sep-05 5:18)
Someone else emailed me about this several years ago. I *thought* I submitted updated source code, but apparently I didn't. Technically speaking, this should be innocuous. When deleting simple types like an array of DWORDs, delete and delete[] should do the same thing. The "[]" are only necessary for complex types like classes and structs. Still, I like to add the "[]" for consistency.

General: boost::crc (suss Oliver M., 26-Jul-05 20:02)
For CRC calculation, have a look at boost::crc
http://www.boost.org/libs/crc/index.html
General: Doesn't work for large files (suss Anonymous, 1-Apr-05 8:41)
CRC32 is not adequate for extremely large files, like the 5 GB file that was used in the example. It will create collisions which means that the file could get corrupt and CRC32 wouldn't pick up the fact that it is corrupt, pretty much making the CRC useless.
General: Re: Doesn't work for large files (member Brian Friesen, 1-Apr-05 11:13)
Of course it works on large files. Your point is that there is a greater chance for spurious hits. This is true, but still the chances are pretty remote. The whole design of CRC is such that even the smallest change generates a completely different CRC. So in theory a large file would have to undergo a lot of changes before a spurious hit would occur. If you're that paranoid about it, then you can use something like MD5. But regardless of the file size or checksum algorithm, the only way to be 100% sure is a byte for byte comparison.

General: Link Error "undefined symbol __CrtDbgReport" (member kpatel108, 14-Feb-05 14:04)
Running Visual Studio 2003.net I am getting the following link error when I try to use Crc32Static:
 
error LNK2019: unresolved external symbol __CrtDbgReport referenced in function "protected: static bool __cdecl CCrc32Static::GetFileSizeQW(void * const,__int64 &)" (?GetFileSizeQW@CCrc32Static@@KA_NQAXAA_J@Z)
 
I have tracked _CrtDbgReport down to being called in the handling of _ASSERTE.
 
Has anybody fixed this problem?
 
Thank you for your help,
Kandarp

General: Re: Link Error "undefined symbol __CrtDbgReport" (member Brian Friesen, 14-Feb-05 15:38)
You could try using one of the other assert macros (ASSERT or _ASSERT). You could also try adding "#include " in the stdafx.h file.

General: Re: Link Error "undefined symbol __CrtDbgReport" (member kpatel108, 14-Feb-05 17:45)
Thank you. I switched to "assert".
 
Kandarp
Question: how to compare folders by using CRC (member greenjade80023-Aug-04 11:28)
I am trying to find a C++ source code to compare folders by using CRC. If you know where, please post here. Thanks in advance!

General: CRC used with serialization (member time_error, 11-Dec-03 20:58)
Hi.
I am pretty familiar with the concepts of CRC.

I wonder how I can use it when serializing data to a file (I want to be able to validate the data in the file afterwards, when reading it back into my application).

In the serialization process, data is "streamed" to the file. Only after the file has been generated would it make sense to do the actual CRC check of the file. Another issue is whether this destroys the possibility of using serialization when reading the file back (at least some exceptions will be thrown).

So my question is:
Is there a standard method for using CRC when serializing, so that the data in the file can be validated afterwards (e.g. before reading the file back)?


 
(ORG. FILE)               -->      (ADD CRC)                  -->   (VALIDATE FILE LATER ON)
|-----------------|               |-----------------|            |-----------------|
|-----------------|               |-----------------|            |-----------------|   CRC == FILECHECK ???
|-----------------|               |-----------------|            |-----------------|
|-----------------|               |-----------------|            |-----------------|
                                                   |-CRC-|
                                                         ^
                                                         |-- This CRC is based on the org. file
 
/Jonas

Question: CRC32's Question? (member david75329-Aug-03 23:35)
I checked several books that mention the CRC principle.
The basic theory looks like the code below.
====================================
for(index = 0; index <= x; index++)
{
crc = crc & 0x80000000 ? (crc << 1) ^ 0x04C11DB7 : crc << 1;
}
=====================
But most CRC32 implementations
use a lookup table,
and the two results are different.
Why?
I am curious about CRC32's implementation method and theory.
Who is the inventor?
Why doesn't the result correspond to the basic theory?
Where can I get any related papers?
Your kind reply will be highly appreciated.
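One likely reason for the mismatch: the common table-driven CRC32 (the PKZip/Ethernet variant) processes each byte least significant bit first, which means shifting right with the bit-reversed polynomial 0xEDB88320 instead of shifting left with 0x04C11DB7, and it also starts from 0xFFFFFFFF and inverts the final value. The sketch below, written under those assumptions and with made-up function names, shows a bit-at-a-time loop and a byte-at-a-time loop that should agree with each other.

```cpp
#include <cstdint>
#include <string>

// What a lookup table stores for one index: the effect of 8 shift/XOR steps.
std::uint32_t tableEntry(std::uint8_t index) {
    std::uint32_t entry = index;
    for (int bit = 0; bit < 8; ++bit) {
        entry = (entry & 1u) ? (entry >> 1) ^ 0xEDB88320u : (entry >> 1);
    }
    return entry;
}

// The "basic theory" form, adapted to the reflected (LSB-first) variant.
std::uint32_t crc32BitAtATime(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char ch : data) {
        crc ^= ch;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

// The table form: one lookup replaces the eight inner steps per byte.
// (A real implementation would precompute all 256 entries once.)
std::uint32_t crc32ByteAtATime(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char ch : data) {
        const std::uint8_t index = static_cast<std::uint8_t>((crc ^ ch) & 0xFFu);
        crc = (crc >> 8) ^ tableEntry(index);
    }
    return ~crc;
}
```

Both functions should return 0xCBF43926 for the string "123456789", the commonly published check value for this variant. A left-shifting loop with 0x04C11DB7, a zero initial value, and no final inversion computes a different member of the CRC family, so a different result is expected.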
 

 

 

Question: Getting consistent CRCs?? (member JohnDoeNumber2, 30-Jul-03 7:53)
First, let me say this article was a big help. It had just the right amount of info for a person who knows very little about CRCs and simply needs the basics. However, I have a question/problem. I am working on a project that I need to have the same CRC each time it is compiled. I compile my project under VisualStudio 6.0 on WinNT and generate a CRC on the executable I just created. Then, I rebuild the entire project (using the exact same files and compiler) and do a CRC on the new executable. The two CRCs are different, even though the exes are the same. I have come to the conclusion that there is some sort of timestamp telling when the file was created. Does anyone know of an algorithm that compensates for the timestamp and ignores it? Does that make sense? If not, ask and I can clarify.
Thanks
Answer: Re: Getting consistent CRCs?? (member Brian Friesen, 30-Jul-03 9:47)
How do you know the two files are the same? Did you do a bit-wise comparison (e.g. "fc /b")? Even though the source code didn't change, every time a project is compiled, Visual Studio does not generate the exact same binary. You'd think it would, but it doesn't. I'm not sure why (maybe someone else knows); there might be an internally compiled timestamp, which would explain the difference. But my program does not use the file system times. That's most likely why you're getting different CRCs: the files are indeed different.

That having been said, I think there is a flaw in the way you're trying to use CRCs. Are you saying that if you were to make actual changes to the code you would still want to get back the same CRC? Trying to force one file to have the same CRC as another file is a very difficult task, nearly impossible. The whole idea behind CRCs is that the slightest difference in a file results in a significant change in the CRC value.

General: Re: Getting consistent CRCs?? (member JohnDoeNumber2, 30-Jul-03 10:25)
I opened the files in a hex editor, looking for differences, and noticed that there were 3 locations where a pair of entries differed. The locations of the differences are always the same, so I'm fairly sure that there is some sort of timestamp applied to the file. However, I haven't found a way to control this from Visual Studio and was hoping someone might know where I could find information about this. The reason I need the CRC is to prove to certain authorities that I can recompile exactly the same executable now and six months down the road, with the exact same files (we're under version control management).
General: Re: Getting consistent CRCs?? (member Wanderley Caloni, 25-May-05 8:01)
Take a look in the appendix of the PE Specification http://www.cs.ucsb.edu/~nomed/docs/pecoff.html#_Toc83091247.
 
--------------------
Wanderley Caloni Jr.
 
Answer: Re: Getting consistent CRCs?? (member HPSI, 3-Aug-03 0:05)
You are correct in your conclusion. The PE file format contains multiple timestamps. The resources are also timestamped. To "prove" that two exe's are identical, you will have to parse the PE file format, crc'ing only the code sections.
 
Have you considered crc'ing the source code instead?
 


General: Re: Getting consistent CRCs?? (member John Simmons / outlaw programmer, 31-Oct-03 2:59)
Does your hwndspy app work on full-screen direct3d apps?

 
Answer: Re: Getting consistent CRCs?? (member Doug Gale, 9-Aug-04 4:38)
Yes, it is in the IMAGE_FILE_HEADER.

Look at your WinNT.h file and find IMAGE_FILE_HEADER. Look at the TimeDateStamp field.
 
The timestamp is completely ignored by windows and it is safe to null it out.
 
To find the file header in your exe, read the IMAGE_DOS_HEADER located at the beginning of the file.
 
The e_lfanew tells you the file offset of the IMAGE_FILE_HEADER. Seek there and read the IMAGE_FILE_HEADER. Zero out the TimeDateStamp field, seek back to e_lfanew, and write the IMAGE_FILE_HEADER.
 
Totally safe. :)
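The procedure described in this answer can be sketched without the Windows headers by working on an in-memory copy of the file and using the byte offsets from the PE/COFF layout. This is a hedged sketch: the function name zeroPeTimestamp is hypothetical, and it only touches the COFF header TimeDateStamp, not the other timestamps mentioned earlier in the thread. Note that e_lfanew points at the 4-byte "PE\0\0" signature, with the file header immediately after it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: zero the COFF header TimeDateStamp in an in-memory
// copy of a PE file so it no longer affects a CRC of that copy.
// Returns false if the buffer does not look like a PE image.
bool zeroPeTimestamp(std::vector<std::uint8_t>& image) {
    // The DOS header is 0x40 bytes long and starts with "MZ".
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') {
        return false;
    }
    // e_lfanew: little-endian 32-bit offset stored at 0x3C.
    const std::uint32_t lfanew =
        static_cast<std::uint32_t>(image[0x3C]) |
        (static_cast<std::uint32_t>(image[0x3D]) << 8) |
        (static_cast<std::uint32_t>(image[0x3E]) << 16) |
        (static_cast<std::uint32_t>(image[0x3F]) << 24);
    const std::size_t peOffset = static_cast<std::size_t>(lfanew);
    // Need the signature (4) + Machine (2) + NumberOfSections (2) +
    // TimeDateStamp (4) to fit inside the buffer.
    if (peOffset > image.size() || image.size() - peOffset < 12) {
        return false;
    }
    if (image[peOffset] != 'P' || image[peOffset + 1] != 'E' ||
        image[peOffset + 2] != 0 || image[peOffset + 3] != 0) {
        return false;
    }
    const std::size_t stampOffset = peOffset + 8;
    for (std::size_t i = 0; i < 4; ++i) {
        image[stampOffset + i] = 0;
    }
    return true;
}
```

Computing the CRC over the modified copy leaves the executable on disk untouched, which may be preferable when the goal is only to compare two builds.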

Last Updated 18 Dec 2001
Article Copyright 2001 by Brian Friesen
Everything else Copyright © CodeProject, 1999-2013