
Find files that have the same contents

1 Sep 2012, CPOL
Find the files under a specified folder that have the same contents

Introduction 

This tip shows how to find files that have the same contents, even when their names differ.

Background 

I found that many files on my computer have the same contents but different file names. I wanted to clean them up and keep just one copy of each, so I wrote this small tool to find the duplicate files.

Using the code 

I used a dictionary to store each file's mark (calculated with the MD5 algorithm) and the list of files that share that mark:

Dictionary<string, List<string>> dtFiles = new Dictionary<string, List<string>>();

The tool then visits each file under the folder and computes its mark. A byte[] key would be compared by reference, so in order to use the Dictionary.ContainsKey method, I convert the byte[] to a Base64 string.

    // Body of FindSameFile(strFolder_, delStart_, dtFiles_)
    string[] strFiles = Directory.GetFiles(strFolder_);
    foreach (string strFullFile in strFiles)
    {
        if (_bToStop)
            return;

        try
        {
            byte[] byMd5 = Xugd.Hash.XMd5.CalcFile(strFullFile);
            string strMd5 = XConvert.BytesToString(byMd5);
            if (dtFiles_.ContainsKey(strMd5))
            {
                List<string> lstFiles = dtFiles_[strMd5];
                lstFiles.Add(strFullFile);
                dtFiles_[strMd5] = lstFiles;
            }
            else
            {
                List<string> lstFiles = new List<string>(2);
                lstFiles.Add(strFullFile);
                dtFiles_.Add(strMd5, lstFiles);
            }
        }
        catch { } // Files that cannot be read or hashed are silently skipped
    }

    // Find in sub dir
    string[] strDirs = Directory.GetDirectories(strFolder_);
    foreach (string strSub in strDirs)
        FindSameFile(strSub, delStart_, dtFiles_);
}
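
The XMd5.CalcFile and XConvert.BytesToString helpers are not shown in this tip. As a minimal sketch of what they might do, assuming the mark is the Base64 form of the MD5 hash as described above, the standard .NET classes can produce an equivalent string key. The names FileMark and Md5Base64 are hypothetical and not part of the original tool.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileMark
{
    // Hypothetical stand-in for XMd5.CalcFile followed by XConvert.BytesToString.
    // Returns the Base64 form of the MD5 hash of the file's contents.
    public static string Md5Base64(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream); // MD5 always yields 16 bytes
            return Convert.ToBase64String(hash);   // 16 bytes encode to 24 characters
        }
    }
}
```

Two files with identical bytes yield equal strings, so the result can be used directly as a Dictionary<string, List<string>> key.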

After processing all files, we can check each item in the dictionary. Any entry whose list holds more than one file identifies files that have the same contents.

foreach (KeyValuePair<string, List<string>> kvFile in dtFiles)
{
    if (kvFile.Value.Count > 1)
    {
        // found the files
    }
}
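
As a rough sketch of what the "found the files" step might do, the helper below collects only the groups that contain more than one file. The class and method names (DuplicateReport, DuplicateGroups) are hypothetical, and the dictionary is assumed to have the same shape as dtFiles above.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateReport
{
    // Returns the file lists that hold two or more paths,
    // i.e. the sets of files whose contents produced the same mark.
    public static List<List<string>> DuplicateGroups(Dictionary<string, List<string>> dtFiles)
    {
        var groups = new List<List<string>>();
        foreach (KeyValuePair<string, List<string>> kvFile in dtFiles)
        {
            if (kvFile.Value.Count > 1)
                groups.Add(kvFile.Value);
        }
        return groups;
    }
}
```

A caller could then print each group, for example with Console.WriteLine(string.Join(", ", group)), and decide which copy to keep.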

 Points of Interest  

This small article is written for developers who want to find files that have the same contents.

History

31 August 2012: First version  

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
China

Comments and Discussions

 
General: My vote of 2
Andreas Gieriet, 2-Sep-12 21:16

Question: Where does that 3rd party KPSoft.FScan come from? Source available?
Andreas Gieriet, 1-Sep-12 9:52

Answer: Re: Where does that 3rd party KPSoft.FScan come from? Source available?
alwaysrun, 2-Sep-12 20:53

General: Re: Where does that 3rd party KPSoft.FScan come from? Source available?
Andreas Gieriet, 2-Sep-12 21:15

General: Thoughts
PIEBALDconsult, 31-Aug-12 5:21
I'd want to store the length of the file as well to further protect against the shortcomings of MD5 (and more likely I'd use a better hash algorithm).

Also,

List<string> lstFiles = dtFiles_[strMd5];
lstFiles.Add(strFullFile);
dtFiles_[strMd5] = lstFiles;


can be reduced to

dtFiles_[strMd5].Add(strFullFile);


and that whole if/else can be reduced to

if (!dtFiles_.ContainsKey(strMd5)) // Note the !
{
  dtFiles_[strMd5] = new List<string>(2);
}

dtFiles_[strMd5].Add(strFullFile);

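
The suggestion above about storing the file length and using a stronger hash could look roughly like this. It is only a sketch under assumptions: SHA-256 stands in for "a better hash algorithm", the key format "length:hash" is an arbitrary choice, and the names FileKey and LengthAndSha256 are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileKey
{
    // Builds a dictionary key of the form "<length in bytes>:<Base64 SHA-256>".
    // Files of different sizes can never share a key, whatever their hashes are.
    public static string LengthAndSha256(string path)
    {
        long length = new FileInfo(path).Length;
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = sha.ComputeHash(stream);
            return length + ":" + Convert.ToBase64String(hash);
        }
    }
}
```

The returned string can replace strMd5 as the dictionary key without changing the rest of the loop.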

General: Re: Thoughts
alwaysrun, 31-Aug-12 23:10

General: Re: Thoughts
PIEBALDconsult, 1-Sep-12 6:59

General: Re: Thoughts
alwaysrun, 2-Sep-12 20:57
