
Duplicate Files Finder

eRRaTuM, 15 Dec 2008
A utility to find any duplicate file in your hard drives using MD5 hashing.

[Image: DuplicateFinder.old.jpg, search results]

[Image: DuplicateFinder_deleted.JPG, file deleted]

Introduction

Once a year, I take on the tedious job of cleaning up the files I have created or downloaded on my drives. The last time I tried, it was so painstaking that I decided to do it semi-automatically. I looked for a free utility that could find duplicate files, but found none that met my needs, so I decided to write one.

Background

The CRC calculation method is available here. I use the MD5 hashing provided by the standard libraries. To report hashing progress, I added an event to the MD5 computing method: a separate thread reads the stream position while the MD5 computing method is reading the same stream.
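The progress idea described above can be sketched roughly as follows. This is not the article's actual class: ProgressMd5, ProgressChanged, and the pollMilliseconds parameter are hypothetical names chosen for illustration. It assumes a seekable stream (so that Length is available) and that reading Position from a second thread is acceptable for a progress display, even though Stream members are not guaranteed to be thread-safe.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;

// Hypothetical helper: hashes a stream with MD5 while a watcher thread
// polls stream.Position and raises a progress event.
public sealed class ProgressMd5
{
    // Raised with a percentage between 0 and 100.
    public event Action<int> ProgressChanged;

    public string ComputeHash(Stream stream, int pollMilliseconds)
    {
        long length = stream.Length;
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            Thread watcher = new Thread(() =>
            {
                // WaitOne returns false on timeout, so this loops until hashing ends.
                while (!done.WaitOne(pollMilliseconds))
                    Report(stream.Position, length);
            });
            watcher.IsBackground = true;
            watcher.Start();

            byte[] hash;
            try
            {
                using (MD5 md5 = MD5.Create())
                    hash = md5.ComputeHash(stream);
            }
            finally
            {
                // Stop the watcher before the caller can dispose the stream.
                done.Set();
                watcher.Join();
            }

            Report(length, length); // always finish at 100%
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    private void Report(long position, long length)
    {
        Action<int> handler = ProgressChanged;
        if (handler != null)
            handler(length == 0 ? 100 : (int)(position * 100 / length));
    }
}
```

A caller would subscribe to ProgressChanged (for example, to update a progress bar) and then call ComputeHash on a FileStream. The event is raised on the watcher thread, so a WinForms handler would need to marshal back to the UI thread.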

Using the code

The utility uses two main classes, DirectoryCrawler and Comparers. Their use is straightforward :) Note that instead of comparing every file against every other file (list.Count × list.Count iterations), DuplicateFinder uses a Hashtable that holds <size, count> pairs. Once it is populated, every file whose size has a count of 1 is removed, because a file with a unique size cannot have a duplicate. This is much faster:

int len = filesToCompare.Length;
// Sizes that occur exactly once. A HashSet (System.Collections.Generic)
// keeps the Contains check in the last loop constant-time.
HashSet<long> alIdx = new HashSet<long>();
// Maps file size -> number of files with that size.
System.Collections.Hashtable HLengths = new System.Collections.Hashtable();
foreach (FileInfo fileInfo in filesToCompare)
{
    if (!HLengths.Contains(fileInfo.Length))
        HLengths.Add(fileInfo.Length, 1);
    else
        HLengths[fileInfo.Length] = (int)HLengths[fileInfo.Length] + 1;
}
// Collect the sizes that belong to a single file only.
foreach (DictionaryEntry hash in HLengths)
{
    if ((int)hash.Value == 1)
    {
        alIdx.Add((long)hash.Key);
        setText(stsMain, string.Format("Will remove File with size {0}", hash.Key));
    }
}
// Keep only the files whose size is shared with at least one other file.
FileInfo[] fiZ = new FileInfo[len - alIdx.Count];
int j = 0;
for (int i = 0; i < len; i++)
{
    if (!alIdx.Contains(filesToCompare[i].Length))
        fiZ[j++] = filesToCompare[i];
}
return fiZ;
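The size filter only narrows the candidates; files of equal size still have to be compared by hash. The following is a minimal sketch of that two-stage idea, not the code shipped with the utility. DuplicateSketch, GroupsOfTwoOrMore, Md5Of, and FindDuplicates are hypothetical names, and the sketch uses LINQ grouping in place of the Hashtable above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical sketch: group by size first (cheap), then hash only
// the files that share their size with another file (expensive).
public static class DuplicateSketch
{
    // Groups items by key and keeps only the groups with two or more members.
    public static List<List<T>> GroupsOfTwoOrMore<T, TKey>(
        IEnumerable<T> items, Func<T, TKey> key)
    {
        return items.GroupBy(key)
                    .Where(g => g.Count() > 1)
                    .Select(g => g.ToList())
                    .ToList();
    }

    // MD5 of a file's content, as a hex string with dashes.
    public static string Md5Of(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
            return BitConverter.ToString(md5.ComputeHash(stream));
    }

    // Returns groups of paths whose files have the same size and the same MD5.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        List<List<string>> result = new List<List<string>>();
        foreach (List<string> sameSize in
                 GroupsOfTwoOrMore(paths, p => new FileInfo(p).Length))
        {
            result.AddRange(GroupsOfTwoOrMore(sameSize, p => Md5Of(p)));
        }
        return result;
    }
}
```

Calling FindDuplicates with the paths collected by a directory scan returns one list per set of identical files. Files with a unique size are never opened, which is where most of the time is saved.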

Points of interest

  • (Done) Optimize file moving; the UI may be unresponsive while moving big files :(
  • (Useless, my MD5 is better ^_^) Add options to choose between CRC32 and MD5 hashing.
  • Maybe use an XML configuration file. For now, moving duplicate files to D:\DuplicateFiles (which is hard coded, viva Microsoft!) and skipping that folder during scanning is sufficient for me.
  • Don't forget that your posts become points of interest.
  • (Done) Code an event-enabled MD5 hashing class that reports hashing progress; imagine hashing a 10 GB file!
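The XML configuration idea from the list above could look something like the sketch below. Everything in it is an assumption made for illustration: the article defines no configuration format, so the <settings>/<duplicatesFolder> element names, the ScanSettingsSketch class, and both method names are hypothetical. The case-insensitive comparison assumes Windows-style paths.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical sketch: read the duplicates folder from XML instead of
// hard coding it, and test whether a scanned path lies inside that folder.
public static class ScanSettingsSketch
{
    // Expects XML such as:
    //   <settings><duplicatesFolder>D:\DuplicateFiles</duplicatesFolder></settings>
    // Returns the fallback when the element is missing or empty.
    public static string ReadDuplicatesFolder(string xml, string fallback)
    {
        XElement node = XElement.Parse(xml).Element("duplicatesFolder");
        if (node == null)
            return fallback;
        string value = node.Value.Trim();
        return value.Length == 0 ? fallback : value;
    }

    // True when path is the folder itself or anything below it.
    public static bool IsInside(string path, string folder)
    {
        char sep = Path.DirectorySeparatorChar;
        string root = Path.GetFullPath(folder)
                          .TrimEnd(sep, Path.AltDirectorySeparatorChar) + sep;
        string full = Path.GetFullPath(path) + sep;
        // The trailing separator keeps "D:\DuplicateFilesOld" from matching.
        return full.StartsWith(root, StringComparison.OrdinalIgnoreCase);
    }
}
```

During scanning, any file for which IsInside(file, duplicatesFolder) is true would simply be skipped, which mirrors the skip-that-folder behavior mentioned above without tying it to one drive letter.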

History

  • v0.2
    • Optimized duplicate retrieval (duplicate sizes and duplicate hashes).
    • Added Move to Recycle Bin.
    • Added file size criteria.
    • Files to delete info updated for every check/uncheck in listview.
    • Added colors and fonts to UI.
    • Debug enabled sources (#if DEBUG synchronous #else threaded).
    • Replaced array lists with List<FileInfo> and List<string[]>.
    • MD5 hashing is used instead of CRC32 (supercat9).
    • Added Skip Source Folder option.
    • Added Drop SubFolder.
    • Some optimizations...
  • v0.1
    • First publication. Waiting for bug reports :)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

eRRaTuM
Chief Technology Officer
Morocco
During his studies, eRRaTuM discovered C/C++ and liked it.
When he met Oracle products in his job, he fell in love.
He uses C# .NET and MS SQL.
 
He created a "F.R.I.E.N.D.S"-like soap movie, melting all of the above.
He then went back to university.
After taking courses in artificial vision and imagery, he finished his studies with a successful license plate recognition project.

Comments and Discussions

 
Question: Crashes when using large number of files? (markus folius, 10-Jan-11 12:29)
Hi,
 
This tool is great - more practical than any of the commercial ones I've tried! I do find though that it crashes when you add too many files. I added two drives to it, probably 1.5TB total, with *.* files but a filter so that only files 20MB+ would be scanned. Each time I've left it be while it's been obtaining the MD5's of the files, and when I've returned there has been a 'hard' crash (black screen, hard reset required).
 
I would have thought with managed code like C# there would be an exception dialog and not a hard crash, so perhaps another issue is contributing. The computer has not crashed in this way previously though. [Note: Trying a similar thing on another computer had the app stop responding, but it didn't take down the PC]
 
Has anyone else experienced this? If so I'll do some more testing of exactly when the error occurs. Is there a log file I could post?
 
Thanks!

Last Updated 15 Dec 2008
Article Copyright 2008 by eRRaTuM