![]() |
General Programming »
Algorithms & Recipes »
Algorithms
Intermediate
CRC32: Generating a checksum for a fileBy Brian FriesenHow to generate a CRC32 based on a file |
VC6, VC7Win2K, WinXP, Dev
|
|
Advanced Search Add to IE Search |
|
|
|
||||||||||||||||

C++ Streams: The first function represents the simplest CRC function. The file is opened using the C++ stream classes (ifstream). This function uses nothing but standard C++ calls, so this function should compile and run using any C++ compiler on any OS.
Win32 I/O: This function is more optimized in that it uses the Win32 API for file I/O; CreateFile, and ReadFile. This will speed up the processing, but by using the Win32 API the code is no longer platform independent.
Filemaps: This function uses memory mapped files to process the file. Filemaps can be used to greatly increase the speed with which files are accessed. They allow the contents of a file to be accessed as if it were in memory. No longer does the programmer need to call ReadFile and WriteFile.
Assembly: The final CRC function is one that is optimized using Intel Assembly. By hand writing the assembly code the algorithm can be optimized for speed, although at the sacrifice of being easy to read and understand.
Those are the four main CRC functions. But there are actually two versions of each function. There are two classes, CCrc32Dynamic and CCrc32Static, each of which have the above four functions for a total of eight. The only difference between the static and dynamic classes is the CRC table. With the static class the CRC table and all the functions in the class are static. The trade off is simple. The static class is simpler to use, but the dynamic class uses memory more efficiently because the CRC table (1K in size) is only allocated when needed.
// Using the static class is as easy as one line of code dwErrorCode = CCrc32Static::FileCrc32Assembly(m_strFilename, dwCrc32); // Whereas there is more involved when using the dynamic class CCrc32Dynamic *pobCrc32Dynamic = new CCrc32Dynamic; pobCrc32Dynamic->Init(); dwErrorCode = pobCrc32Dynamic->FileCrc32Assembly(m_strFilename, dwCrc32); pobCrc32Dynamic->Free(); delete pobCrc32Dynamic;
Whenever you calculate a CRC you need to take into account the speed of the algorithm. Generating CRCs for files is both a CPU and a disk intensive task. Here is a table showing the time it took to CRC three different files. The columns are the different file sizes, the rows are the different CRC functions, and the table entries are in seconds. The system these numbers were captured on is a dual Pentium III at 1 GHz with a 10,000 RPM SCSI Ultra160 hard drive.
| 44 Kb | 34 Mb | 5 Gb | |
|---|---|---|---|
| C++ Streams | 0.0013 | 0.80 | 125 |
| Win32 I/O | 0.0009 | 0.60 | 85 |
| Filemaps | 0.0010 | 0.60 | 87 |
| Assembly | 0.0006 | 0.35 | 49 |
As expected the C++ streams is the slowest function followed by the Win32 I/O. However, I was very surprised to see the filemaps were not faster than the Win32 I/O, in fact they are slower. After I thought about it some, I realized memory mapped files are designed to provide fast random access to files. But when you CRC you access the file sequentially. Thus filemaps are not faster, and the extra overhead of creating the "views" of the file are why it's slower. Filemaps do have one advantage that none of the other functions have. Memory mapped files are guaranteed to be able to access files up to the maximum file size in NT which is 264 or 18 exabytes. Although the Win32 I/O may handle files of this size, none of the documentation confirms this. [Note: The largest file I have CRC'ed is 40 GB, which all eight functions successfully CRC'ed, but took over 10 minutes each.]
If anyone who reads this article knows a way to improve the speed even more, please post the code or email me. Especially if you know of a speed improvement for the assembly code. I will bet there are further optimizations that can be made to the assembly code. After all I don't know Intel Assembly very well, therefore I'm sure there are optimizations I don't know about.
General
News
Question
Answer
Joke
Rant
Admin
|
PermaLink |
Privacy |
Terms of Use
Last Updated: 17 Dec 2001 Editor: Chris Maunder |
Copyright 2001 by Brian Friesen Everything else Copyright © CodeProject, 1999-2009 Web15 | Advertise on the Code Project |