|
I am trying to find C++ source code that compares folders using CRC. If you know where to find some, please post here. Thanks in advance!
|
|
|
|
|
Hi.
I am pretty familiar with the concepts of CRC.
I wonder how to use it when serializing data to a file (I want to be able to validate the data in the file afterwards, when reading it back into my application).
During serialization the data is "streamed" to the file, so it only makes sense to do the actual CRC calculation once the file has been generated. Another issue is whether adding the CRC destroys the possibility of using serialization when reading the file back (at least some exceptions will be thrown).
So my question is:
Is there a standard method for using CRC when serializing, so that the data in the file can be validated afterwards (e.g. before reading the file back)?
(ORG. FILE) --> (ADD CRC) --> (VALIDATE FILE LATER ON)
|-----------------| |-----------------| |-----------------|
|-----------------| |-----------------| |-----------------| CRC == FILECHECK ???
|-----------------| |-----------------| |-----------------|
|-----------------| |-----------------| |-----------------|
|-CRC-|
^
|-- This CRC is based on the org. file
/Jonas
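One common arrangement, sketched here under assumptions rather than as a standard: serialize into a memory buffer, append the CRC of that buffer as a 4-byte trailer, and on load verify the trailer before handing the payload (without the trailer) to the deserializer. That way the deserializer never sees the CRC bytes, and a corrupt file is rejected before deserialization starts. The helper names sealPayload and openPayload and the little-endian trailer layout are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 in the reflected form (polynomial 0xEDB88320).
std::uint32_t crc32(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical helper: returns the payload followed by its CRC,
// stored as a 4-byte little-endian trailer.
std::vector<std::uint8_t> sealPayload(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out = payload;
    const std::uint32_t crc = crc32(payload.data(), payload.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(crc >> (8 * i)));
    assert(out.size() == payload.size() + 4);
    return out;
}

// Hypothetical helper: checks the trailer. On success, copies the bytes
// before the trailer into 'payload' and returns true.
bool openPayload(const std::vector<std::uint8_t>& file,
                 std::vector<std::uint8_t>& payload) {
    if (file.size() < 4) return false;           // too short to hold a trailer
    const std::size_t n = file.size() - 4;       // payload length
    std::uint32_t stored = 0;
    for (int i = 0; i < 4; ++i)
        stored |= static_cast<std::uint32_t>(file[n + i]) << (8 * i);
    if (crc32(file.data(), n) != stored) return false;
    payload.assign(file.begin(), file.begin() + static_cast<std::ptrdiff_t>(n));
    return true;
}
```

The sealed buffer is what gets written to disk. When reading, load the whole file, call openPayload, and only deserialize from the returned payload if it reports success.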
|
|
|
|
|
I checked several books that describe the CRC principle.
The basic theory, expressed in code, looks like this.
====================================
for(index = 0; index <= x; index++)
{
crc = crc & 0x80000000 ? (crc << 1) ^ 0x04C11DB7 : crc << 1;
}
=====================
But most CRC32 implementations
use a lookup table instead,
and the two approaches give different results.
Why?
I am curious about the implementation method and theory behind CRC32.
Who is the inventor?
Why doesn't the result correspond to the basic theory?
Where can I get any related papers?
Your kind reply will be highly appreciated.
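A likely reason for the mismatch, offered as a sketch rather than a definitive diagnosis: the CRC-32 used by zip, PNG and Ethernet is not the plain shift loop. It processes bits LSB-first (so the polynomial 0x04C11DB7 appears bit-reversed as 0xEDB88320), starts from 0xFFFFFFFF, and inverts the result at the end. The lookup table is only a precomputation of the eight shift steps for every possible byte value, so with the same parameters the bitwise and table-driven versions agree. The function names below are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-at-a-time CRC-32, reflected form: init 0xFFFFFFFF, final XOR 0xFFFFFFFF.
std::uint32_t crc32Bitwise(const unsigned char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

// Same CRC, but the eight shift steps per byte are precomputed into a table.
// The table is rebuilt on every call only to keep the sketch short.
std::uint32_t crc32Table(const unsigned char* data, std::size_t len) {
    std::uint32_t table[256];
    for (std::uint32_t n = 0; n < 256; ++n) {
        std::uint32_t c = n;
        for (int bit = 0; bit < 8; ++bit)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        table[n] = c;
    }
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i)
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

For the nine ASCII characters "123456789" both functions return 0xCBF43926, the commonly published check value for this CRC variant.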
|
|
|
|
|
First, let me say this article was a big help. It had just the right amount of info for a person who knows very little about CRCs and simply needs the basics. However, I have a question/problem. I am working on a project that I need to have the same CRC each time it is compiled. I compile my project under VisualStudio 6.0 on WinNT and generate a CRC on the executable I just created. Then, I rebuild the entire project (using the exact same files and compiler) and do a CRC on the new executable. The two CRCs are different, even though the exes are the same. I have come to the conclusion that there is some sort of timestamp telling when the file was created. Does anyone know of an algorithm that compensates for the timestamp and ignores it? Does that make sense? If not, ask and I can clarify.
Thanks
|
|
|
|
|
How do you know the two files are the same? Did you do a bit-wise comparison (e.g. "fc /b")? Even though the source code didn't change, Visual Studio does not generate the exact same binary every time a project is compiled. You'd think it would, but it doesn't. I'm not sure why (maybe someone else knows); there might be an internally compiled timestamp, which would explain the difference. My program does not use the file system times, so that is not the cause. Most likely you're getting different CRCs because the files are indeed different.
That having been said, I think there is a flaw in the way you're trying to use CRCs. Are you saying that if you were to make actual changes to the code you would still want to get back the same CRC? Trying to force one file to have the same CRC as another file is a very difficult task, nearly impossible. The whole idea behind CRCs is that the slightest difference in a file results in a significant change in the CRC value.
|
|
|
|
|
I opened the files in a hex editor, looking for differences, and noticed that there were 3 locations where a pair of entries differed. The locations of the differences are always the same, so I'm fairly sure that some sort of timestamp is applied to the file. However, I haven't found a way to control this from Visual Studio and was hoping someone might know where I could find information about it. The reason I need the CRC is to prove to certain authorities that I can recompile exactly the same executable now and six months down the road, with the exact same files (we're under version control management).
|
|
|
|
|
Take a look in the appendix of the PE Specification http://www.cs.ucsb.edu/~nomed/docs/pecoff.html#_Toc83091247.
--------------------
Wanderley Caloni Jr.
|
|
|
|
|
You are correct in your conclusion. The PE file format contains multiple timestamps. The resources are also timestamped. To "prove" that two exe's are identical, you will have to parse the PE file format, crc'ing only the code sections.
Have you considered crc'ing the source code instead?
HPS HwndSpy - GUI developer's aid to visually
locate and inspect windows. For the month of August
only, use coupon code CP-81239 for 30% off.
|
|
|
|
|
Does your hwndspy app work on full-screen direct3d apps?
------- signature starts
"...the staggering layers of obscenity in your statement make it a work of art on so many levels." - Jason Jystad, 10/26/2001
"You won't like me when I'm angry..." - Dr. Bruce Banner
Please review the Legal Disclaimer in my bio.
------- signature ends
|
|
|
|
|
Yes, it is in the IMAGE_FILE_HEADER.
Look in your WinNT.h file and find IMAGE_FILE_HEADER. Look at the TimeDateStamp field.
The timestamp is completely ignored by windows and it is safe to null it out.
To find the file header in your exe, read the IMAGE_DOS_HEADER located at the beginning of the file.
The e_lfanew field tells you the file offset of the 4-byte PE signature ("PE\0\0"); the IMAGE_FILE_HEADER follows it immediately, at e_lfanew + 4. Seek there and read the IMAGE_FILE_HEADER. Zero out the TimeDateStamp field, seek back to e_lfanew + 4, and write the IMAGE_FILE_HEADER.
Totally safe.
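A minimal sketch of those steps, working on a copy of the file held in memory so it does not depend on WinNT.h. It assumes the standard layout: e_lfanew sits at offset 0x3C of the DOS header and points at the "PE\0\0" signature, and TimeDateStamp comes after the signature (4 bytes), Machine (2 bytes) and NumberOfSections (2 bytes). The function name zeroPeTimestamp is hypothetical. It only clears the file header stamp; other timestamps mentioned in this thread (resources, for example) are not touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a 32-bit little-endian value from a byte buffer.
static std::uint32_t readLe32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return  static_cast<std::uint32_t>(b[off])
         | (static_cast<std::uint32_t>(b[off + 1]) << 8)
         | (static_cast<std::uint32_t>(b[off + 2]) << 16)
         | (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Hypothetical helper: zeroes IMAGE_FILE_HEADER.TimeDateStamp in an
// in-memory PE image. Returns false if the buffer does not look like a PE file.
bool zeroPeTimestamp(std::vector<std::uint8_t>& image) {
    const std::size_t kLfanewOffset = 0x3C;  // position of e_lfanew in IMAGE_DOS_HEADER
    if (image.size() < kLfanewOffset + 4) return false;
    if (image[0] != 'M' || image[1] != 'Z') return false;

    const std::size_t pe = readLe32(image, kLfanewOffset);
    // Need room for signature (4) + Machine (2) + NumberOfSections (2) + TimeDateStamp (4).
    if (pe > image.size() || image.size() - pe < 12) return false;
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0) return false;

    const std::size_t stamp = pe + 8;        // offset of TimeDateStamp
    for (std::size_t i = 0; i < 4; ++i)
        image[stamp + i] = 0;
    return true;
}
```

Running a CRC over the buffer after this call ignores the file header timestamp; whether that is enough to make two builds compare equal depends on what else the linker embeds.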
|
|
|
|
|
"{03544472-641E-4B7B-8AEF-214C8DB9037E}",
"{FB4A21E3-48F0-46FF-98BB-410914CE9C6A}"
And many other pairs of very different GUID strings have the same CRC32 value.
|
|
|
|
|
As I said in my article it's possible for spurious hits to happen. CRCs do not guarantee uniqueness, but are a pretty good indication of it.
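To put a rough number on "pretty good indication": if CRC32 values are treated as uniformly random 32-bit numbers (an assumption; real inputs such as GUID strings may behave differently), the birthday approximation estimates how likely at least one collision is among n items. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of n uniformly random
// 32-bit values are equal (birthday approximation).
double crc32CollisionProbability(double n) {
    const double space = 4294967296.0;  // 2^32 possible CRC values
    return 1.0 - std::exp(-n * (n - 1.0) / (2.0 * space));
}
```

Under that assumption the chance of at least one collision reaches about 50% at roughly 77,000 items, so finding colliding pairs in a large set of strings is expected rather than surprising.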
|
|
|
|
|
The old iostream library has now been removed from VS 2003 (it was flagged as deprecated in vs 2002).
This means that the FileCrc32Streams function will not compile, however if the FileCrc32Streams function and header definition are commented out, the rest will compile fine so no problem if you don't use FileCrc32Streams.
I really like this article and found it very useful, I encourage you to take a look at re-writing FileCrc32Streams to use the new Standard C++ iostream library.
See here for the changes:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html/_core_differences_in_iostream_implementation.asp[^]
|
|
|
|
|
The code I provided was never meant to be used as an SDK if you will. The article is meant to teach people about CRCs and give them code showing them how it's done. It is up to the other developers to use my code as a guide, or copy and paste the necessary sections into their projects. Thus I doubt I'll take the time to update the code to the new compiler.
|
|
|
|
|
As #include <fstream.h> is deprecated in VC7, the code compiles, but if you use it in an MFC application it fails to link, because the .lib containing the file stream functions redefines the "delete" function ...
The solution for using it in VC7 with MFC is to comment out
//#include <fstream.h>
and to comment out the FileCrc32Streams function in both the .cpp and .h files...
|
|
|
|
|
Another solution of this problem (in case you don't know):
In the new Standard C++ iostream library
"open" functions do not take a third parameter (the protection parameter),
and some elements of the old iostream library are not elements
of the new iostream library, among them is "nocreate".
First of all, you must change the
#include <fstream.h>
to:
#include <fstream>
Then add the string:
using namespace std;
You should also make some changes in the function "open" to
comply with the new stream library implementation of VC7,
and it will just work fine.
Here is the original function in the sample code
(files Crc32Static.cpp and Crc32Dynamic.cpp), which caused compile errors:
file.open(szFilename, ios::in | ios::nocreate | ios::binary, filebuf::sh_read);
and here is the "modernized" one:
file.open(szFilename, ios::in | ios::binary);
Now it should compile without errors.
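For anyone who would rather rewrite the stream version than patch it, here is a rough sketch of a file CRC built only on the Standard C++ iostream library. It is not the article's FileCrc32Streams; the name fileCrc32 and its signature are made up, and it uses a bit-at-a-time CRC instead of the article's lookup table just to stay short.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical stand-in for a stream-based file CRC.
// Returns false if the file cannot be opened; otherwise stores the
// CRC-32 (reflected polynomial 0xEDB88320) of its contents in crcOut.
bool fileCrc32(const std::string& filename, std::uint32_t& crcOut) {
    // No ios::nocreate needed: opening for input never creates the file.
    std::ifstream file(filename.c_str(), std::ios::in | std::ios::binary);
    if (!file.is_open()) return false;

    std::uint32_t crc = 0xFFFFFFFFu;
    char buffer[4096];
    // read() fails on the final short block, so also check gcount().
    while (file.read(buffer, sizeof buffer) || file.gcount() > 0) {
        const std::streamsize got = file.gcount();
        for (std::streamsize i = 0; i < got; ++i) {
            crc ^= static_cast<unsigned char>(buffer[i]);
            for (int bit = 0; bit < 8; ++bit)
                crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
    }
    crcOut = crc ^ 0xFFFFFFFFu;
    return true;
}
```

Swapping the inner bit loop for a table lookup should not change the result, only the speed.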
|
|
|
|
|
are you planning to port this to c#?
|
|
|
|
|
Sorry, but I have no plans to convert the code to VB, Java, J#, nor C#. I've provided the code and the techniques, I'll let someone else do the conversion.
|
|
|
|
|
You might find this interesting for the assembly optimizations
Asm optimized crc32
//Ante, ante.c@runbox.com
|
|
|
|
|
Hey. Nice stuff, found it useful.
I decided to try my hand at the assembler, poked around the link listed by Ante above. My result is below. In my case, I "init" the CRC in a separate call, this routine works a series of buffers as they're received until I reach the end. My table is a global variable, not a static member of the object, but otherwise it's like yours. I noted that on the site Ante links to, they claimed 158 Mbytes per second on a T'bird 1.4 Ghz. On my 1.5Ghz AMD XP I get 275 Mbytes per second, not sure if the T'bird should be that far behind or not. For me, the punch came from loading a quad word at a time, not a byte at a time.
__asm
{
mov eax, this // Load 'this'
mov ecx, [eax]this.CurrentCRC // Load running CRC from 'this'
mov edi, offset Crc32Table // Load the CRC32 table
mov esi, buf // Load buffer
xor ebx, ebx // zero ebx, used to process bytes, forming
// index into crc table
mov eax, len // get length
mov edx, eax
and eax, 3 // calc remainder after division by 4
push eax // preserve the remainder for later
shr edx, 2 // div by 4, calculating total quadwords
jz crc32tail // if zero, prepare for a tiny bit of work
crc32loop:
mov eax, [esi] // grab a quadword from buf
mov bl, al // form index entry, starting with a byte from buf
// part one of 4 in the quadword
xor bl, cl // xor against current CRC
shr ecx, 8 // shift CRC
xor ecx, [edi + ebx * 4] // xor CRC with the table's entry
// part two of 4 in the quadword
mov bl, ah // grab another byte of buf
xor bl, cl // xor against current CRC
shr ecx, 8 // shift CRC
xor ecx, [edi + ebx * 4] // xor CRC with table's entry
shr eax, 16 // shift the buf data two bytes down
// part three of 4 in the quadword
mov bl, al // grab another byte of buf
xor bl, cl // xor against current CRC
shr ecx, 8 // shift CRC
xor ecx, [edi + ebx * 4] // xor CRC with table's entry
// part four of 4 in the quadword
mov bl, ah // grab another byte of buf
xor bl, cl // xor against current CRC
shr ecx, 8 // shift CRC
xor ecx, [edi + ebx * 4] // xor CRC with table's entry
add esi, 4 // Advance the source pointer one quadword
dec edx // counting quadwords
jnz crc32loop // if more quadwords, loop
crc32tail:
pop edx // retrieve the remainder (leftover bytes)
cmp edx, 0 // check to see if it's zero
je crc32end
crc32tinyloop:
mov bl, byte ptr [esi] // grab one byte from buf
xor bl, cl // xor against current crc
shr ecx, 8 // shift crc
xor ecx, [edi + ebx * 4] // xor crc with table's entry
inc esi // increment buf pointer
dec edx // dec count
jnz crc32tinyloop // loop if not zero
crc32end:
mov eax, this
mov [eax]this.CurrentCRC, ecx // write to CurrentCRC
}
|
|
|
|
|
Hahaha
I've been reading way too much on the Opteron!
Sorry guys.
My post kept referring to quadwords where it should have read dwords.
Same theory, just half the size.
Now, on the Opteron - once I get one - this should really zing right through it.
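For anyone who wants to try the same idea without inline assembly, here is a rough C++ sketch: load one dword, then consume its four bytes through the table, with a byte loop for the leftover tail. Like the assembly, it continues a running CRC across several buffers and leaves the initial value and final inversion to the caller. It assumes a little-endian machine such as x86, and the function names are invented for this sketch. No speed claims are made for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill the 256-entry table for the reflected polynomial 0xEDB88320.
void buildCrc32Table(std::uint32_t table[256]) {
    for (std::uint32_t n = 0; n < 256; ++n) {
        std::uint32_t c = n;
        for (int bit = 0; bit < 8; ++bit)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        table[n] = c;
    }
}

// Continue a running CRC over buf, loading 4 bytes at a time.
// Assumption: little-endian host, so the low byte of the loaded
// dword is the first byte in memory.
std::uint32_t crc32UpdateDword(const std::uint32_t table[256], std::uint32_t crc,
                               const unsigned char* buf, std::size_t len) {
    while (len >= 4) {
        std::uint32_t word;
        std::memcpy(&word, buf, 4);              // one 32-bit load
        for (int i = 0; i < 4; ++i) {            // four table steps per dword
            crc = table[(crc ^ word) & 0xFFu] ^ (crc >> 8);
            word >>= 8;
        }
        buf += 4;
        len -= 4;
    }
    while (len-- > 0)                            // leftover 0..3 bytes
        crc = table[(crc ^ *buf++) & 0xFFu] ^ (crc >> 8);
    return crc;
}
```

Start with 0xFFFFFFFF, call crc32UpdateDword once per received buffer, and XOR the final value with 0xFFFFFFFF to get the usual CRC-32.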
|
|
|
|
|
#pragma once
#include <iostream>
#include <stdio.h>
#include <tchar.h>
#include <windows.h>
----------------------------------------------
<pre>
#include "stdafx.h"
#define POLY 0xEDB88320
#define SEED 0xFFFFFFFF
DWORD CRCTable[256];
int _tmain(int argc, _TCHAR* argv[])
{ DWORD crc32;
__asm ; CRCTable filling.
{ xor ebx, ebx ; bl - CRCTable entry index. Init with 0.
mov ecx, POLY ; Load polynom into ecx for speed
CRCTableLoop: ; Head of CRC table calculation main loop.
mov eax, ebx ; Load index into eax.
xor edx, edx ; edx = 0
shr eax, 1 ; Carry Flag = LSB of eax.
cmovc edx, ecx ; edx = (eax & 1) ? POLY : 0
xor eax, edx ; eax = (eax & 1) ? eax ^ POLY : eax
; The same with other bits
xor edx, edx
shr eax, 1
cmovc edx, ecx
xor eax, edx
xor edx, edx
shr eax, 1
cmovc edx, ecx
xor eax, edx
xor edx, edx
shr eax, 1
cmovc edx, ecx
xor eax, edx
xor edx, edx
shr eax, 1
cmovc edx, ecx
xor eax, edx
xor edx, edx
shr eax, 1
cmovc edx, ecx
xor eax, edx
xor edx, edx
shr eax, 1
cmovc edx, ecx
xor eax, edx
xor edx, edx
shr eax, 1
cmovc edx, ecx
xor eax, edx
mov CRCTable[4*ebx], eax ; Fill the current table entry.
inc bl ; Move to the next table entry.
jnz CRCTableLoop ; If index < 256 Then continue CRCTable values
; Else Table is full.
}
printf("CRC32\t\tFileSize\tFileName\n");
for(int i = 1; i < argc; ++i)
{ HANDLE hFile = CreateFile(argv[i], GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
if(hFile == INVALID_HANDLE_VALUE)
{ printf("Error opening\t%s\n", argv[i]);
continue;
}
HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
if(hMap == NULL)
{ CloseHandle(hFile);
printf("Error creating map\t%s\n", argv[i]);
continue;
}
LPVOID pBuffer = MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
if(pBuffer == NULL)
{ CloseHandle(hMap);
CloseHandle(hFile);
printf("Error MapViewOfFile\t%s\n", argv[i]);
continue;
}
DWORD dwSize = GetFileSize(hFile, NULL);
__asm
{ mov esi, pBuffer ; esi = buffer pointer.
mov ecx, dwSize ; ecx = buffer size.
mov eax, SEED ; Init CRC.
CalcCRC: ; Head of buffer CRC calculation main loop.
movzx ebx, byte ptr [esi] ; bl = next char, other ebx bits = 0
xor bl, al ; /Calculate
shr eax, 8 ; | current
mov ebx, CRCTable[4*ebx] ; \CRC value.
xor eax, ebx ; eax = current CRC value.
inc esi ; Move to the next char.
dec ecx ; Decrement of remaining bytes counter.
jnz CalcCRC ; IF counter > 0 THEN continue buffer CRC calculation.
not eax ; eax ^= 0xFFFFFFFF
mov crc32, eax ; save CRC
}
printf("0x%08lX\t%-8lu\t%s\n", crc32, dwSize, argv[i]);
UnmapViewOfFile(pBuffer);
CloseHandle(hMap);
CloseHandle(hFile);
}
return 0;
}
|
|
|
|
|
Here's a loop I came up with:
#pragma warning(push)
#pragma warning(disable : 4035)
unsigned long Crc32_Asm(char *pBuf, unsigned nBytes,
unsigned *pCrcTbl, unsigned nCrc)
{
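__asm {
mov edi,pBuf // edi = start of buffer
mov ecx,nBytes // ecx = byte count
mov eax,nCrc // eax = running CRC (also the return value)
test ecx,ecx // empty buffer: return nCrc unchanged
jz done
mov ebx,pCrcTbl // ebx = lookup table
add edi,ecx // edi = one past the end of the buffer
neg ecx // ecx counts up from -nBytes to 0
mov edx,eax
again:
and edx,0x000000ff // edx = low byte of the CRC
shr eax,8
movzx esi,byte ptr [edi+ecx] // next input byte
xor edx,esi // table index = low CRC byte ^ input byte
mov esi,[ebx+edx*4]
xor eax,esi
inc ecx
mov edx,eax
jnz again // flags still come from inc ecx
done:
}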
}
#pragma warning(pop)
|
|
|
|
|
Hey there,
I'm using your awesome crc generator. I have a question though: is it possible that a crc would be returned as 0, and how probable is this? I am assuming that I don't have a crc if it is equal to 0, so I'm wondering how safe an assumption this is.
Cheers,
swinefeaster
Check out Aephid Photokeeper, the powerful digital
photo album solution at www.aephid.com.
|
|
|
|
|