Duplicate Files Finder

By eRRaTuM

A utility that finds duplicate files on your hard drives using MD5 hashing.
C# (C# 1.0, C# 2.0, C# 3.0), .NET, Dev
Posted: 11 Aug 2008
Updated: 15 Dec 2008

[Figure: DuplicateFinder.old.jpg. Search results]

[Figure: DuplicateFinder_deleted.JPG. File deleted]

Introduction

Once a year, I take on the dreaded job of cleaning up the files I have created or downloaded on my drives. The last time I tried, it was so tedious that I decided to do it semi-automatically. I needed a free utility that could find duplicate files, but I found none that met my needs, so I decided to write one.

Background

The CRC calculation method is available here. I use the MD5 hashing provided by the standard libraries. I added an event to the MD5 computing method to report hashing progress: a separate thread reads the stream position while the MD5 computing method reads the same stream.
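
The idea of reporting progress while hashing can be illustrated with a small sketch. Note that this is not the article's implementation: instead of a watcher thread polling the stream position, it uses a single-threaded alternative that feeds the file to MD5 in chunks through HashAlgorithm.TransformBlock and invokes a callback after each chunk. The class name ProgressMd5, the method name, and the 64 KB buffer size are assumptions made for illustration, and the stream is assumed to be seekable so that its Length is known.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of the article's source code.
public static class ProgressMd5
{
    // Hashes a seekable stream with MD5 and reports progress as a percentage.
    public static string ComputeHash(Stream stream, Action<int> onProgress)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] buffer = new byte[64 * 1024]; // chunk size is an arbitrary choice
            long total = stream.Length;
            long done = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed one chunk to the hash; no output buffer is needed.
                md5.TransformBlock(buffer, 0, read, null, 0);
                done += read;
                if (onProgress != null && total > 0)
                    onProgress((int)(done * 100 / total));
            }
            // Finish the hash with an empty final block.
            md5.TransformFinalBlock(buffer, 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

A caller would pass a FileStream and a callback that updates a progress bar. For a 10 GB file, the callback fires once per chunk, so it may be worth throttling UI updates.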

Using the code

The utility uses two main classes, DirectoryCrawler and Comparers, and their use is straightforward. Note that instead of comparing every file in the list against every other file (list.Count × list.Count comparisons), DuplicateFinder fills a Hashtable with <size, count> pairs. Once the table is populated, every file whose size has a count of 1 is removed, because a file with a unique size cannot have a duplicate. This is much faster:

int len = filesToCompare.Length;
List<long> alIdx = new List<long>();   // sizes that occur exactly once
System.Collections.Hashtable HLengths = new System.Collections.Hashtable();

// Count how many files share each size.
foreach (FileInfo fileInfo in filesToCompare)
{
    if (!HLengths.Contains(fileInfo.Length))
        HLengths.Add(fileInfo.Length, 1);
    else
        HLengths[fileInfo.Length] = (int)HLengths[fileInfo.Length] + 1;
}

// Collect the sizes that belong to a single file; those files cannot have duplicates.
foreach (DictionaryEntry hash in HLengths)
    if ((int)hash.Value == 1)
    {
        alIdx.Add((long)hash.Key);
        setText(stsMain, string.Format("Will remove File with size {0}", hash.Key));
    }

// Keep only the files whose size is shared with at least one other file.
FileInfo[] fiZ = new FileInfo[len - alIdx.Count];
int j = 0;
for (int i = 0; i < len; i++)
{
    if (!alIdx.Contains(filesToCompare[i].Length))
        fiZ[j++] = filesToCompare[i];
}
return fiZ;
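
Files that survive the size filter are only candidates: two files of equal size can still differ in content. The history below mentions retrieving duplicate sizes and then duplicate hashes, so the following is a minimal sketch of that second step, assuming the candidates are given as a list of paths. The class HashGrouping and the method FindDuplicates are hypothetical names and do not come from the article's source.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of the article's source code.
public static class HashGrouping
{
    // Returns the groups of paths whose contents have the same MD5 hash.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        Dictionary<string, List<string>> byHash = new Dictionary<string, List<string>>();
        using (MD5 md5 = MD5.Create())
        {
            foreach (string path in paths)
            {
                string key;
                using (FileStream fs = File.OpenRead(path))
                    key = BitConverter.ToString(md5.ComputeHash(fs));

                List<string> group;
                if (!byHash.TryGetValue(key, out group))
                {
                    group = new List<string>();
                    byHash.Add(key, group);
                }
                group.Add(path);
            }
        }

        // Only hashes shared by two or more files are duplicates.
        List<List<string>> result = new List<List<string>>();
        foreach (List<string> group in byHash.Values)
            if (group.Count > 1)
                result.Add(group);
        return result;
    }
}
```

Running this only on the output of the size filter keeps the expensive hashing limited to files that could actually be duplicates.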

Points of interest

  • (Done) Optimize file moving; the UI could become unresponsive while moving big files :(.
  • (Useless, my MD5 is better ^_^) Add an option to choose between CRC32 and MD5 hashing.
  • Maybe use an XML configuration file. For now, moving duplicate files to D:\DuplicateFiles (which is hard-coded, viva Microsoft!) and skipping that folder during scanning is sufficient for me.
  • Don't forget that your posts make points of interest.
  • (Done) Code an event-enabled MD5 hashing class that reports hashing progress. Imagine hashing a 10 GB file!
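
Skipping the destination folder during scanning, as mentioned in the third point, could look roughly like the sketch below. The article does not show how DirectoryCrawler does this, so the class CrawlSketch, its methods, and the case-insensitive path comparison (a reasonable default on Windows) are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not the article's DirectoryCrawler.
public static class CrawlSketch
{
    // Lists all files under root, ignoring skipFolder and everything below it.
    public static List<string> ListFiles(string root, string skipFolder)
    {
        List<string> files = new List<string>();
        string skip = Path.GetFullPath(skipFolder).TrimEnd(Path.DirectorySeparatorChar);
        Collect(Path.GetFullPath(root), skip, files);
        return files;
    }

    private static void Collect(string dir, string skip, List<string> files)
    {
        // Do not descend into the folder that receives the duplicates.
        if (string.Equals(dir.TrimEnd(Path.DirectorySeparatorChar), skip,
                          StringComparison.OrdinalIgnoreCase))
            return;

        files.AddRange(Directory.GetFiles(dir));
        foreach (string sub in Directory.GetDirectories(dir))
            Collect(sub, skip, files);
    }
}
```

Passing the skip folder as a parameter, rather than hard-coding it, would also be a first step toward the configuration file mentioned above.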

History

  • v0.2
    • Optimized duplicate retrieval (duplicate sizes and duplicate hashes).
    • Added Move to Recycle Bin.
    • Added a file size criterion.
    • Files-to-delete info is updated on every check/uncheck in the list view.
    • Added colors and fonts to the UI.
    • Debug-enabled sources (#if DEBUG synchronous, #else threaded).
    • Used List<FileInfo> and List<string[]> instead of ArrayLists.
    • MD5 hashing is used instead of CRC32 (supercat9).
    • Added Skip Source Folder option.
    • Added Drop SubFolder.
    • Some optimizations...
  • v0.1
    • First publication. Waiting for bug reports :)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

eRRaTuM


Member
During his studies, eRRaTuM discovered C/C++ and appreciated it.
When he met Oracle products in his job, he fell in love.
He uses C# .NET and MS SQL.

He created a "F.R.I.E.N.D.S"-like soap movie, melting all of the above.
He went back to university.
After taking courses in Artificial Vision & Imagery, he finished his studies with a successful License Plate Recognition project.
Occupation: Architect
Location: Morocco

Last Updated: 15 Dec 2008
Editor: Smitha Vijayan
Copyright 2008 by eRRaTuM
Everything else Copyright © CodeProject, 1999-2009