Find files that have the same contents
Find the files that have the same contents under a specified folder
Introduction
This article shows how to find files that have the same contents.
Background
I found that many files on my computer have the same contents but different file names. I wanted to clear them out and keep just one copy, so I wrote this small tool to find the identical files.
Using the code
I use a dictionary to store each file's mark (calculated with the MD5 algorithm) together with the files that share that mark:
Dictionary<string, List<string>> dtFiles = new Dictionary<string, List<string>>();
Then I visit each file under the folder and compute its mark. A byte[] key would be compared by reference rather than by contents, so in order to use the Dictionary.ContainsKey method, I convert the byte[] to a base64 string.
    // Body of FindSameFile(strFolder_, delStart_, dtFiles_); the method signature is not shown here.
    string[] strFiles = Directory.GetFiles(strFolder_);
    foreach (string strFullFile in strFiles)
    {
        if (_bToStop)
            return;

        try
        {
            // Xugd.Hash.XMd5 and XConvert are helper classes that are not part of the .NET Framework.
            byte[] byMd5 = Xugd.Hash.XMd5.CalcFile(strFullFile);
            string strMd5 = XConvert.BytesToString(byMd5);

            if (dtFiles_.ContainsKey(strMd5))
            {
                // Same mark seen before: append to the existing list.
                dtFiles_[strMd5].Add(strFullFile);
            }
            else
            {
                List<string> lstFiles = new List<string>(2);
                lstFiles.Add(strFullFile);
                dtFiles_.Add(strMd5, lstFiles);
            }
        }
        catch
        {
            // Skip any file that cannot be processed (for example, one that cannot be opened).
        }
    }

    // Find in sub dir
    string[] strDirs = Directory.GetDirectories(strFolder_);
    foreach (string strSub in strDirs)
        FindSameFile(strSub, delStart_, dtFiles_);
}
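The mark calculation above relies on helper classes that are not shown in this article. As a minimal sketch of the same idea using only the standard .NET classes, the mark could be computed roughly as follows. The class name FileMark and the method name Md5Base64 are hypothetical and chosen only for illustration; they are not the helpers used in the tool.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileMark
{
    // Hypothetical helper: returns the MD5 digest of a file as a base64 string,
    // which can then serve as a dictionary key.
    public static string Md5Base64(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            // ComputeHash reads from the stream, so the file is not loaded
            // into a single in-memory array first.
            byte[] hash = md5.ComputeHash(stream);
            return Convert.ToBase64String(hash);
        }
    }
}
```

Two files with identical contents produce the same string, so the second file lands in the list that already exists for that key.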
After processing all files, we can check each item in the dictionary and pick out the files that have the same contents:
foreach (KeyValuePair<string, List<string>> kvFile in dtFiles)
{
    if (kvFile.Value.Count > 1)
    {
        // found the files
    }
}
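The loop above leaves the handling of each match open. One possible way to collect the matches, sketched here under the assumption that the caller simply wants every group of identical files, is shown below. The names DuplicateReport and GetDuplicateGroups are hypothetical and are not part of the original tool.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateReport
{
    // Hypothetical helper: returns only the groups that hold more than one path,
    // that is, the files whose marks matched at least one other file.
    public static List<List<string>> GetDuplicateGroups(
        Dictionary<string, List<string>> filesByMark)
    {
        List<List<string>> groups = new List<List<string>>();
        foreach (KeyValuePair<string, List<string>> entry in filesByMark)
        {
            if (entry.Value.Count > 1)
                groups.Add(entry.Value);
        }
        return groups;
    }
}
```

Each returned list could then be shown to the user, who decides which copy to keep.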
Points of Interest
This short article is written for developers who want to find files that have the same contents.
History
31 August 2012: First version