Hi,
In my application, I am computing hash values of files. As it stands, MD5 in VB.NET computes hash values that also include the files' timestamps. How can I compute the hash value of a file from its contents only?

I'd appreciate your feedback.
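A minimal sketch of hashing a file from its bytes alone, written in Python rather than VB.NET because no VB.NET code was posted in this thread (in .NET the same idea is `MD5.Create()` with `ComputeHash` on a `FileStream`). The function name `md5_of_file` and the chunk size are illustrative choices. Only the bytes returned by `read()` reach the hash; the file name, timestamps and other metadata are never looked at.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file's contents only.

    The file is opened in binary mode, so no newline or encoding
    translation happens; nothing but the raw bytes is hashed.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Feed the file in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("a.txt")` returns a 32-character hex string that depends only on what is inside `a.txt`.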
Comments
deepak_2012 15-Aug-12 15:37pm    
I tried the code in the link above, and it gives me a different hash value if I change a doc or txt file and then revert the changes. So in the code above, MD5 takes the timestamp into account when creating the hash key.
barneyman 15-Aug-12 18:19pm    
?? You've read the code; it makes no reference to the file timestamp, it just takes a byte stream. I think you'll find that the 'revert' isn't as complete as you think. Take a file (a.txt), make a copy (copy of a.txt), and take the MD5 of both. Modify a.txt and take its MD5, then 'revert' a.txt and take its MD5 again. If the MD5s are different, compare a.txt and copy of a.txt with WinMerge/WinDiff: you'll find the contents are not the same.
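barneyman's experiment can be scripted. This is a hedged Python sketch; the helper name `revert_experiment` and the sample contents are hypothetical. The failing case assumes, purely for illustration, that the 'revert' changed the line ending from CRLF to LF, which is one way an editor can leave a file that looks identical but differs byte for byte.

```python
import filecmp
import hashlib
import os
import shutil
import tempfile


def md5_bytes(path):
    # Hash the file's raw bytes only.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def revert_experiment(original, edited, reverted):
    """Copy a file, edit it, 'revert' it, then report whether the reverted
    file matches the untouched copy byte for byte and by MD5."""
    workdir = tempfile.mkdtemp()
    try:
        a = os.path.join(workdir, "a.txt")
        copy = os.path.join(workdir, "copy of a.txt")
        with open(a, "wb") as f:
            f.write(original)
        shutil.copyfile(a, copy)  # contents only; the copy gets its own timestamps
        with open(a, "wb") as f:
            f.write(edited)       # the modification
        with open(a, "wb") as f:
            f.write(reverted)     # the 'revert'
        same_bytes = filecmp.cmp(a, copy, shallow=False)  # compare actual contents
        same_hash = md5_bytes(a) == md5_bytes(copy)
        return same_bytes, same_hash
    finally:
        shutil.rmtree(workdir)
```

`revert_experiment(b"hello\r\n", b"hello world\r\n", b"hello\r\n")` gives `(True, True)`, while reverting to `b"hello\n"` gives `(False, False)`: the hash changes exactly when the bytes do.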
as of now MD5 in vb.net computes hash values by including timestamp of files as well.

Uhhh, no it doesn't. It will only do that if you include that timestamp data in the data you're feeding the hash algorithm.
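To make the point concrete, here is a Python sketch (function names hypothetical) that changes only a file's modification time. Hashing the contents alone gives the same digest before and after; the digest changes only when the code deliberately mixes the timestamp into the hash input.

```python
import hashlib
import os
import tempfile


def md5_contents(path):
    # Hashes the file's bytes only.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def md5_contents_and_mtime(path):
    # Deliberately feeds the modification time into the hash as well.
    # This is the only way a timestamp ends up affecting the digest.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        digest.update(f.read())
    digest.update(str(os.stat(path).st_mtime_ns).encode("ascii"))
    return digest.hexdigest()


def timestamp_demo():
    """Return (contents-only hash unchanged?, contents+mtime hash unchanged?)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"same bytes")
        t1 = 1_600_000_000 * 10**9
        os.utime(path, ns=(t1, t1))
        before = (md5_contents(path), md5_contents_and_mtime(path))
        t2 = 1_700_000_000 * 10**9
        os.utime(path, ns=(t2, t2))  # touch only the timestamp, not the data
        after = (md5_contents(path), md5_contents_and_mtime(path))
        return before[0] == after[0], before[1] == after[1]
    finally:
        os.remove(path)
```

`timestamp_demo()` returns `(True, False)`.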
Comments
deepak_2012 15-Aug-12 3:11am    
No, we are not modifying the MD5 component in VB.NET. We are simply reading the file and feeding it to MD5. Do you have any code?
The problem is that when it reads a text file and a doc file with the same contents, it generates different hash values.
Dave Kreskowiak 15-Aug-12 7:58am    
Really? You've got a LOT to learn.

.DOC and .TXT files do NOT have the same content, even if the text in them is the same! .TXT files are just plain text, whereas .DOC files (usually Word) contain lots and lots of Word-specific formatting and data as well as the text. Did you even look at the sizes of the two files?? That alone should have told you the two files are not the same.
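A real .doc file is hard to build in a few lines, but the same effect is easy to see with plain text alone. In this Python sketch the visible text is identical in every variant, yet each encoding or line-ending choice (picked here only for illustration) produces different bytes, a different size and therefore a different MD5.

```python
import hashlib

TEXT = "Hello, world\n"

# Same visible text, different bytes on disk.
variants = {
    "UTF-8": TEXT.encode("utf-8"),
    "UTF-8 with BOM": TEXT.encode("utf-8-sig"),
    "UTF-16 (with BOM)": TEXT.encode("utf-16"),
    "UTF-8, CRLF": TEXT.replace("\n", "\r\n").encode("utf-8"),
}

for label, data in variants.items():
    print(f"{label:18} {len(data):3} bytes  {hashlib.md5(data).hexdigest()}")
```

The sizes come out as 13, 16, 28 and 14 bytes, and all four digests differ.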
deepak_2012 15-Aug-12 15:34pm    
Yes, I already know that, but I'm looking for a solution here, not a debate. I need an algorithm that will give me the same hash/unique value for files with the same contents but different extensions.
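If the goal is "same text, same value" regardless of file format, one option (not something MD5 does by itself) is to hash a normalized form of the text instead of the file bytes. This Python sketch assumes the text has already been extracted from the .doc file by some other tool, which is not shown. The normalization rules (Unicode NFC, unified line endings, trailing whitespace and trailing blank lines dropped) and the function names `content_fingerprint` and `fingerprint_text_file` are illustrative choices, not a standard.

```python
import hashlib
import unicodedata


def content_fingerprint(text):
    """MD5 of a normalized form of the text, not of the file bytes.

    The normalization rules below are illustrative assumptions.
    """
    text = unicodedata.normalize("NFC", text)              # e.g. 'e' + accent -> 'é'
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    lines = [line.rstrip() for line in text.split("\n")]   # drop trailing spaces
    while lines and lines[-1] == "":
        lines.pop()                                        # drop trailing blank lines
    canonical = "\n".join(lines)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def fingerprint_text_file(path, encoding="utf-8-sig"):
    # utf-8-sig strips a leading BOM if present; newline="" keeps line endings
    # as-is so content_fingerprint can normalize them itself.
    with open(path, "r", encoding=encoding, newline="") as f:
        return content_fingerprint(f.read())
```

With these rules, `"Hello\r\nworld\r\n"` and `"Hello\nworld"` get the same fingerprint, while `"Hello world"` does not.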
Dave Kreskowiak 15-Aug-12 17:53pm    
MD5, and every other hash algorithm out there, does NOT use anything other than the contents of the file. They do NOT use the filenames or any date/time stamps; they hash only the source data you give them.

If the file contents are identical, they WILL generate the same hash. The problem is not the algorithm, but your own code supplying the data to hash.
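Dave's point can be checked directly. In this Python sketch (names hypothetical), the same bytes are written to files with different names, extensions and modification times; every file hashes to the same MD5, because only the contents are fed to the algorithm.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of_file(path):
    # Only the file's bytes are hashed.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def digests_for_copies(data, names=("report.txt", "report.doc", "backup.bin")):
    """Write `data` under several names with different mtimes and
    return the set of distinct MD5 digests."""
    workdir = tempfile.mkdtemp()
    try:
        digests = set()
        for i, name in enumerate(names):
            path = os.path.join(workdir, name)
            with open(path, "wb") as f:
                f.write(data)
            # Give each copy a different modification time (one day apart).
            mtime_ns = (1_500_000_000 + i * 86_400) * 10**9
            os.utime(path, ns=(mtime_ns, mtime_ns))
            digests.add(md5_of_file(path))
        return digests
    finally:
        shutil.rmtree(workdir)
```

`digests_for_copies(b"abc")` returns a set with a single digest, `"900150983cd24fb0d6963f7d28e17f72"`.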

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)