Find files that have the same contents

2.67/5 (2 votes)

Aug 31, 2012

CPOL
15,284 views

Find the files that have the same contents under a specified folder.

Introduction 

This article shows how to find files that have the same contents.

Background 

I found that many files on my computer have the same contents but different file names. I wanted to clean them up and keep only one copy of each, so I wrote this small tool to find the duplicate files.

Using the code 

I use a dictionary that maps each file's mark (its MD5 hash) to the list of files that share that mark:

                Dictionary<string, List<string>> dtFiles = new Dictionary<string, List<string>>();

Then I visit each file under the folder and compute its mark. A byte[] key would be compared by reference rather than by content, so I convert the hash bytes to a Base64 string before using it as the dictionary key.

                // Hash every file directly inside the folder.
                string[] strFiles = Directory.GetFiles(strFolder_);
                foreach (string strFullFile in strFiles)
                {
                    if (_bToStop)
                        return;

                    try
                    {
                        byte[] byMd5 = Xugd.Hash.XMd5.CalcFile(strFullFile);
                        string strMd5 = XConvert.BytesToString(byMd5);

                        // Group files by mark: files with equal marks share one list.
                        List<string> lstFiles;
                        if (!dtFiles_.TryGetValue(strMd5, out lstFiles))
                        {
                            lstFiles = new List<string>(2);
                            dtFiles_.Add(strMd5, lstFiles);
                        }
                        lstFiles.Add(strFullFile);
                    }
                    catch { } // Skip files that cannot be read (for example, locked files).
                }

                // Recurse into each subdirectory.
                string[] strDirs = Directory.GetDirectories(strFolder_);
                foreach (string strSub in strDirs)
                    FindSameFile(strSub, delStart_, dtFiles_);
            }
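
The helpers `Xugd.Hash.XMd5.CalcFile` and `XConvert.BytesToString` are not shown in this article. As a rough sketch of equivalent functionality, assuming only the standard .NET classes `System.Security.Cryptography.MD5` and `Convert.ToBase64String`, something like the following could be used. The names `FileMark` and `Md5Base64` are hypothetical and are not the author's implementation.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical stand-in for Xugd.Hash.XMd5.CalcFile + XConvert.BytesToString.
public static class FileMark
{
    // Returns the MD5 hash of the file's contents as a Base64 string,
    // which can be used directly as a Dictionary<string, ...> key.
    public static string Md5Base64(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream); // 16 bytes for MD5
            return Convert.ToBase64String(hash);
        }
    }
}
```

Hashing from a stream avoids loading the whole file into memory, which matters when the folder contains large files.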

After all files have been processed, we check each entry in the dictionary. Any entry whose list holds more than one file identifies a set of files that have the same contents.

                foreach (KeyValuePair<string, List<string>> kvFile in dtFiles)
                {
                    if (kvFile.Value.Count > 1)
                    {
                        // kvFile.Value holds the paths of files that have the same contents
                    }
                } 
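
The article leaves the body of that `if` open. One possible way to fill it in, shown here only as a sketch, is to collect the groups first and then print them. The class `DuplicateReport` and its methods are hypothetical names, and treating the first path in each group as the copy to keep is an assumption, not something the original tool is known to do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: not part of the original tool.
public static class DuplicateReport
{
    // Returns only the groups that contain more than one file path.
    public static List<List<string>> GetDuplicateGroups(
        Dictionary<string, List<string>> dtFiles)
    {
        List<List<string>> groups = new List<List<string>>();
        foreach (KeyValuePair<string, List<string>> kvFile in dtFiles)
        {
            if (kvFile.Value.Count > 1)
                groups.Add(kvFile.Value);
        }
        return groups;
    }

    // Prints each group; the first path is assumed to be the copy to keep.
    public static void Print(List<List<string>> groups)
    {
        foreach (List<string> group in groups)
        {
            Console.WriteLine("Keep:      " + group[0]);
            for (int i = 1; i < group.Count; i++)
                Console.WriteLine("Duplicate: " + group[i]);
        }
    }
}
```

Separating the collection step from the printing step keeps the decision about which copy to delete out of the scanning code.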

Points of Interest

This short article is written for developers who want to find files that have the same contents.

History

31 August 2012: First version