Posted 11 Aug 2008

Duplicate Files Finder

eRRaTuM, 15 Dec 2008
A utility to find any duplicate file in your hard drives using MD5 hashing.


[Screenshot: Search results]

[Screenshot: File deleted]


Once a year, I take on the tedious job of cleaning up the files I have created or downloaded on my drives. The last time I tried, it was so painstaking that I decided to make it semi-automatic. I needed a free utility that could find duplicate files, but I found none that met my needs, so I decided to write one.


The CRC calculation method is available here. I now use the MD5 hashing provided by the standard libraries instead. I added an event to the MD5 computing method to report hashing progress: a separate thread reads the stream position while the MD5 computing method is reading the same stream.
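To illustrate progress-reporting MD5, here is a minimal sketch. It is not the code shipped with the utility: instead of a polling thread, it reads the stream in chunks and invokes a callback after each chunk, which is a simpler way to get the same kind of progress information. The names ProgressMd5, ComputeHash, and onProgress are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ProgressMd5
{
    // Hypothetical helper: hashes a stream chunk by chunk and reports the
    // fraction of bytes processed (0.0 to 1.0) after every chunk.
    public static string ComputeHash(Stream stream, Action<double> onProgress, int bufferSize = 64 * 1024)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] buffer = new byte[bufferSize];
            // Progress is only reported when the stream length is known.
            long total = stream.CanSeek ? stream.Length - stream.Position : -1;
            long done = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed this chunk into the running MD5 state.
                md5.TransformBlock(buffer, 0, read, null, 0);
                done += read;
                if (onProgress != null && total > 0)
                    onProgress((double)done / total);
            }
            // Finish the hash; an empty final block is allowed.
            md5.TransformFinalBlock(buffer, 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

A caller would typically pass a FileStream and forward the fraction to a progress bar. In a WinForms application the callback has to marshal to the UI thread when hashing runs on a worker thread.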

Using the code

The utility uses two main classes, DirectoryCrawler and Comparers. Their use is straightforward :) Note that instead of comparing every file with every other file (list.Count × list.Count comparisons), DuplicateFinder uses a Hashtable that holds <size, count> pairs. Once it is populated, every file whose size has a count of 1 is dropped from the candidates, because a file with a unique size cannot have a duplicate (much faster!):

// Count how many files share each size.
int len = filesToCompare.Length;
List<long> alIdx = new List<long>();
System.Collections.Hashtable HLengths = new System.Collections.Hashtable();
foreach (FileInfo fileInfo in filesToCompare)
{
    if (!HLengths.Contains(fileInfo.Length))
        HLengths.Add(fileInfo.Length, 1);
    else
        HLengths[fileInfo.Length] = (int)HLengths[fileInfo.Length] + 1;
}
// Collect the sizes that occur exactly once.
foreach (DictionaryEntry hash in HLengths)
{
    if ((int)hash.Value == 1)
    {
        alIdx.Add((long)hash.Key);
        setText(stsMain, string.Format("Will remove File with size {0}", hash.Key));
    }
}
// Each unique size belongs to exactly one file, so alIdx.Count files are dropped.
FileInfo[] fiZ = new FileInfo[len - alIdx.Count];
int j = 0;
for (int i = 0; i < len; i++)
    if (!alIdx.Contains(filesToCompare[i].Length))
        fiZ[j++] = filesToCompare[i];
return fiZ;
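The same size pre-filter can also be sketched with a generic Dictionary<long, int>, which avoids boxing the long keys and int counts that a Hashtable stores as objects. This is an illustrative variant, not the utility's actual code; SizeFilter, KeepSharedSizes, and KeepSharedSizeFiles are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SizeFilter
{
    // Keeps only the items whose size is shared with at least one other item.
    // The original order of the items is preserved.
    public static List<T> KeepSharedSizes<T>(IEnumerable<T> items, Func<T, long> sizeOf)
    {
        var list = new List<T>(items);
        var counts = new Dictionary<long, int>();
        foreach (T item in list)
        {
            long size = sizeOf(item);
            int n;
            counts.TryGetValue(size, out n);   // n stays 0 when the size is new
            counts[size] = n + 1;
        }
        var result = new List<T>();
        foreach (T item in list)
            if (counts[sizeOf(item)] > 1)
                result.Add(item);
        return result;
    }

    // Convenience wrapper for files: the size is FileInfo.Length.
    public static List<FileInfo> KeepSharedSizeFiles(IEnumerable<FileInfo> files)
    {
        return KeepSharedSizes(files, f => f.Length);
    }
}
```

Only the files that survive this filter need to be hashed, so the expensive MD5 step is limited to files that could actually be duplicates.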

Points of interest

  • (Done) Optimize file moving; the UI may be unresponsive while moving big files :(
  • (Useless, my MD5 is better ^_^) Add options to choose between CRC32 and MD5 hashing.
  • Maybe use an XML configuration file. For now, moving duplicate files to D:\DuplicateFiles (which is hard-coded, viva Microsoft!) and skipping that folder during scanning is sufficient for me.
  • Don't forget that your posts become points of interest too.
  • (Done) Code an event-enabled MD5 hashing class that reports hashing progress; imagine hashing a 10 GB file!
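Skipping the duplicates folder during scanning, as mentioned in the list above, could look roughly like the sketch below. It is an assumption about how such a crawl might be written, not the DirectoryCrawler class itself; SkippingCrawler, IsUnder, and EnumerateFiles are hypothetical names. Paths are compared case-insensitively, which assumes a Windows-style file system, and access errors are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SkippingCrawler
{
    // Returns true when 'path' is 'folder' itself or lies beneath it.
    public static bool IsUnder(string path, string folder)
    {
        string p = Path.GetFullPath(path).TrimEnd(Path.DirectorySeparatorChar);
        string f = Path.GetFullPath(folder).TrimEnd(Path.DirectorySeparatorChar);
        return p.Equals(f, StringComparison.OrdinalIgnoreCase)
            || p.StartsWith(f + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase);
    }

    // Walks 'root' recursively and yields every file path,
    // except those located in 'skipFolder' or its subfolders.
    public static IEnumerable<string> EnumerateFiles(string root, string skipFolder)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            if (IsUnder(dir, skipFolder))
                continue;   // do not descend into the excluded folder
            foreach (string file in Directory.GetFiles(dir))
                yield return file;
            foreach (string sub in Directory.GetDirectories(dir))
                pending.Push(sub);
        }
    }
}
```

Reading the folder to skip from a configuration file rather than hard-coding it would only change where the skipFolder argument comes from.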


History

  • v0.2
    • Optimized duplicate retrieval (duplicate sizes and duplicate hashes).
    • Added Move to Recycle Bin.
    • Added file size criteria.
    • Files-to-delete info is updated on every check/uncheck in the list view.
    • Added colors and fonts to the UI.
    • Debug-enabled sources (#if DEBUG synchronous #else threaded).
    • Replaced array lists with List<FileInfo> and List<string[]>.
    • MD5 hashing is used instead of CRC32 (supercat9).
    • Added Skip Source Folder option.
    • Added Drop SubFolder.
    • Some optimizations...
  • v0.1
    • First time publishing. Waiting for bug reports :)
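The "#if DEBUG synchronous #else threaded" entry above refers to a pattern that can be sketched as follows. This is a guess at the general shape, not the article's source; ScanRunner and Run are hypothetical names. Running inline in DEBUG builds makes it easy to step through the scan in a debugger, while release builds keep the UI thread free.

```csharp
using System;
using System.Threading;

public static class ScanRunner
{
    // Runs 'work' inline in DEBUG builds and on a background thread otherwise.
    // Returns the started thread, or null when the work already ran inline.
    public static Thread Run(Action work)
    {
#if DEBUG
        work();
        return null;
#else
        Thread t = new Thread(() => work());
        t.IsBackground = true;   // do not keep the process alive for the scan
        t.Start();
        return t;
#endif
    }
}
```

A caller that needs the result can Join the returned thread when it is not null.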


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Chief Technology Officer
Morocco
During his studies, eRRaTuM discovered C/C++ and appreciated it.
When he met Oracle products in his job, he fell in love.
He uses C# .NET and MS SQL.

He created a "F.R.I.E.N.D.S"-like soap movie, melting together all of the above.
He went back to university.
After taking courses in artificial vision and imagery, he finished his studies with a successful license plate recognition project.


Comments and Discussions

Re: Nice, but...
leppie, 12-Aug-08 4:48
mrcozz wrote:
this will avoid boxing and unboxing (faster...)

Reference types are not boxed...


Article Copyright 2008 by eRRaTuM