Tags: WinXP, C#
Hi everybody! My program takes one argument, a directory. It finds all files with identical content, groups them, and prints the groups on screen. With a small number of files the program works well and fast. But when I point it at, for example, my flash card (or, more dramatically, the C:\ directory), its RAM usage gradually increases up to about 500 MB and then it throws an out-of-memory exception. I am posting my code and would like to hear criticism and get help.
 
Here is the comparing function:
        static void CompareFilesRec(List<string> array)
        {
            List<string> Trash = new List<string>();
            List<string> Output = new List<string>();
            List<KeyValuePair<long, string>> yeah = new List<KeyValuePair<long, string>>();
            List<string> outp = new List<string>();
            // Fill the list with file sizes (as keys) and file paths
            for (int j = 0; j <= array.Count - 1; j++)
            {
                FileInfo fii = new FileInfo(array[j]);
                yeah.Add(new KeyValuePair<long, string>(fii.Length, array[j]));
            }
            // Filter the previous list. Move files with other sizes to Trash (to be processed at the end)
            foreach (var el in yeah)
            {
                if (!Output.Contains(el.Value) && !Trash.Contains(el.Value))
                {
                    foreach (var ele in yeah)
                    {
                        if (el.Key == ele.Key && !Output.Contains(ele.Value))
                        {
                            Output.Add(ele.Value);
                        }
                        else
                        {
                            if (!Trash.Contains(ele.Value))
                            {
                                Trash.Add(ele.Value);
                            }
                        }
                    }
                }
            }
            // Compare the files (take the first element in the list and compare it with the other files)
            foreach (string f in Output)
            {
                int valueOne = 0, valueTwo = 0;
                StreamReader objReader = new StreamReader(f);
                valueOne = objReader.ReadToEnd().GetHashCode();
 
                StreamReader objReader1 = new StreamReader(Output[0]);
                valueTwo = objReader1.ReadToEnd().GetHashCode();
 
                if (f != Output[0] && valueOne == valueTwo)
                {
                    outp.Add(f);
                }
                if (valueOne != valueTwo)
                {
                    Trash.Add(f);
                }
 
            }
            outp.Add(Output[0]);
            //output of the list
            if (outp.Count > 1)
            {
                foreach (string fi in outp)
                {
                    Console.WriteLine(fi);
                }
                outp.Clear();
                Console.WriteLine();
            }
            //recursive
            if (Trash.Count >= 1)
                CompareFilesRec(Trash);
            Output.Clear();
            Trash.Clear();
            outp.Clear();
        }
Full code:
using System;
using System.Collections.Generic;
using System.IO;
 
namespace ConsoleApplication3
{
    class Comparer
    {
        static void Main(string[] args)
        {
            if (args.Length != 0) // check that there is at least 1 argument
            {
                if (args.Length == 1) // check that there is exactly 1 argument
                {
                    List<string> ListOfFiles = new List<string>(); // list for the files

                    ListOfFiles.AddRange(LookIn(args[0])); // get the list of files
                    Console.WriteLine("Found {0} files", ListOfFiles.Count); // print how many files were found

                    if(ListOfFiles.Count > 0)
                       CompareFilesRec(ListOfFiles);
                }
                else Console.WriteLine("So many arguments... o_O"); // message if there are too many arguments
            }
            else Console.WriteLine("Write some stuff here!"); // message if there are no arguments
        }
 
        static bool CheckFileLessThan2Gb(string file) // checks whether the file is smaller than 2 GB
        {
            FileInfo someFileInfo = new FileInfo(file); // put the file info into the variable someFileInfo
            if (someFileInfo.Length <= 2147483648) // the check
                return true; // if it fits
            else return false; // if it does not fit
        }
 
        static List<string> LookIn(string path)
        {
            /* Declare and initialize two lists: one for files and one for folders */
            List<string> files = new List<string>();
            List<string> dirs = new List<string>();
            
            /* Look for all accessible files */
            try
            {
                /* Add the found folders and files to the lists */
                files.AddRange(Directory.GetFiles(path));
                dirs.AddRange(Directory.GetDirectories(path));
                
                for (int i = 0; i <= files.Count; i++)
                {
                    if (!CheckFileLessThan2Gb(files[i]))
                        files.RemoveAt(i);
                    try
                    {
                        string FileValue = null;
                        StreamReader Value = new StreamReader(files[i]);
                        FileValue = Value.ReadToEnd();
                        if (FileValue.Length == 0)
                            files.RemoveAt(i);
                        Value.Dispose();
                        Value.Close();
                    }
                    catch (System.IO.IOException) { }
                }
            }
            catch (UnauthorizedAccessException) { }
            catch (DirectoryNotFoundException) { }
            catch (ArgumentOutOfRangeException) { }
 
            
            /* "Look" for files in each directory... */
            foreach (string dir in dirs)
            {
                files.AddRange(LookIn(dir)); // ...and add them to the list
            }
            return files; // return the full list of found files
        }
 
        static void CompareFilesRec(List<string> array)
        {
            List<string> Trash = new List<string>();
            List<string> Output = new List<string>();
            List<KeyValuePair<long, string>> yeah = new List<KeyValuePair<long, string>>();
            List<string> outp = new List<string>();
 
            for (int j = 0; j <= array.Count - 1; j++)
            {
                FileInfo fii = new FileInfo(array[j]);
                yeah.Add(new KeyValuePair<long, string>(fii.Length, array[j]));
            }
 
            foreach (var el in yeah)
            {
                if (!Output.Contains(el.Value) && !Trash.Contains(el.Value))
                {
                    foreach (var ele in yeah)
                    {
                        if (el.Key == ele.Key && !Output.Contains(ele.Value))
                        {
                            Output.Add(ele.Value);
                        }
                        else
                        {
                            if (!Trash.Contains(ele.Value))
                            {
                                Trash.Add(ele.Value);
                            }
                        }
                    }
                }
            }
 
            foreach (string f in Output)
            {
                int valueOne = 0, valueTwo = 0;
                StreamReader objReader = new StreamReader(f);
                valueOne = objReader.ReadToEnd().GetHashCode();
 
                StreamReader objReader1 = new StreamReader(Output[0]);
                valueTwo = objReader1.ReadToEnd().GetHashCode();
 
                if (f != Output[0] && valueOne == valueTwo)
                {
                    outp.Add(f);
                }
                if (valueOne != valueTwo)
                {
                    Trash.Add(f);
                }
 
            }
            outp.Add(Output[0]);
            if (outp.Count > 1)
            {
                foreach (string fi in outp)
                {
                    Console.WriteLine(fi);
                }
                outp.Clear();
                Console.WriteLine();
            }
 
            if (Trash.Count >= 1)
                CompareFilesRec(Trash);
            Output.Clear();
            Trash.Clear();
            outp.Clear();
        }
    }
}
Posted 3-Nov-12 22:59pm
Je7450
Edited 11-Nov-12 20:42pm
v6

Solution 2

Divide CompareFilesRec into two sections: one for finding candidates and one for comparing.
 
Create a Dictionary<long, List<string>> that maps each file size to the paths of the files with that size. Then, in the second pass, you only need to compare files of the same size, so unless you have a million files of the same size, the chance of an out-of-memory exception in CompareFilesRec is small.
 
You expect the hash code to be enough to judge whether two files are the same. To be safe, I would compare the contents of the files (you are bringing the files into memory anyway to compute the hash code). I would probably also read them as byte[] instead of strings.
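
A minimal sketch of the two-pass idea described above, not the poster's actual code. The names SizeGrouping, GroupBySize, Candidates and SameBytes are hypothetical and chosen only for illustration. It assumes every path in the input exists and is readable, and SameBytes still loads both files fully into memory, so it only suits files that fit in RAM.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class, introduced only for this illustration.
static class SizeGrouping
{
    // Pass 1: map each file size (in bytes) to the paths that have that size.
    public static Dictionary<long, List<string>> GroupBySize(IEnumerable<string> paths)
    {
        var bySize = new Dictionary<long, List<string>>();
        foreach (string path in paths)
        {
            // FileInfo.Length is the size on disk in bytes,
            // unlike path.Length, which is the length of the path string.
            long size = new FileInfo(path).Length;
            List<string> group;
            if (!bySize.TryGetValue(size, out group))
            {
                group = new List<string>();
                bySize.Add(size, group);
            }
            group.Add(path);
        }
        return bySize;
    }

    // Pass 2 input: only groups with at least two files can contain duplicates.
    public static IEnumerable<List<string>> Candidates(Dictionary<long, List<string>> bySize)
    {
        foreach (List<string> group in bySize.Values)
        {
            if (group.Count > 1)
                yield return group;
        }
    }

    // Compares contents as raw bytes instead of strings or hash codes.
    public static bool SameBytes(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

Because several paths can share one key through the inner list, the "duplicate key" limitation of a plain Dictionary<long, string> does not apply here.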
Comments
Je7 at 4-Nov-12 5:50am
   
Ok, thank you. I'll try.
Je7 at 4-Nov-12 9:35am
   
I don't know why, but similar files with identical content have different sizes. I add values this way: Sizes.Add(files[i].Length, files[i]); as a result, the keys differ between files. But Windows shows that the size of these files is identical: 9 bytes.
JesperMadsen123 at 4-Nov-12 10:49am
   
If your code is identical to the posted code, files[] does not contain file contents but file names. So files[x].Length returns the length of the file name, not the length of the contents.
Je7 at 4-Nov-12 10:57am
   
I posted a second version of the code. There are some differences. And to get the real file size, do I need to use FileInfo?
JesperMadsen123 at 4-Nov-12 11:37am
   
Reading the entire file into a byte array and asking FileInfo for the length should give the same result. If they don't, you are probably doing something wrong.
JesperMadsen123 at 4-Nov-12 11:38am
   
Is it possible that you read a text file (UTF-8 or UTF-16) into a string and call .Length? Then the length is not the number of bytes but the number of chars, and the two can differ.
Je7 at 4-Nov-12 14:27pm
   
Thanks a lot. I'll try to implement that.
Je7 at 8-Nov-12 13:41pm
   
A Dictionary can't store entries with duplicate keys, so I use a List for this: List<KeyValuePair<long, string>> Example = new List<KeyValuePair<long, string>>();
JesperMadsen123 at 9-Nov-12 13:30pm
   
Why don't you use a Dictionary<long, List<string>> then? It seems that you have hash collisions or identical files. If you have hash collisions, consider using MD5 or SHA-xxx for the checksum.
Je7 at 12-Nov-12 2:28am
   
I have collisions in the keys, not in the hashes. I have also implemented the search and compare, and everything works. But I'm not really sure that it is optimized. P.S.: Are MD5 and SHA costly in RAM?
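
On the RAM question above, a hedged sketch: a cryptographic hash can be computed over a stream, so the file does not have to be loaded into one big string or array first. The class and method names (FileHasher, Sha256OfFile) are hypothetical; SHA256 is picked here only as one example of the "SHA-xxx" family mentioned in the thread. As far as I know, ComputeHash(Stream) reads the stream in small blocks, so memory use should stay roughly constant regardless of file size.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper class, introduced only for this illustration.
static class FileHasher
{
    // Returns the SHA-256 digest of a file as an uppercase hex string.
    public static string Sha256OfFile(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            // The hash is computed from the stream; the whole file is
            // never materialized as a single string or byte array here.
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

Two files with equal digests are very likely identical, but a final byte-by-byte comparison is still the only way to be certain.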

Solution 3

You could use another thread to search for and compare files of the same size, and also to show progress.
Comments
Je7 at 12-Nov-12 2:33am
   
Threading is a very good idea. But can you help me decide where to insert it? I mean: in the loops, in the comparing, in the searching, etc.
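
One possible placement, sketched under assumptions: run the comparison of the same-size groups on a worker thread and report progress after each group, since each group can be processed independently of the others. BackgroundCompare, Run, processGroup and reportProgress are hypothetical names, and this is not necessarily what the answer above had in mind. It assumes .NET 4 or later for Task.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper class, introduced only for this illustration.
static class BackgroundCompare
{
    // Processes every group on a worker thread.
    // reportProgress receives (groups done so far, total groups).
    public static Task Run(
        IList<List<string>> groups,
        Action<List<string>> processGroup,
        Action<int, int> reportProgress)
    {
        return Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < groups.Count; i++)
            {
                processGroup(groups[i]);
                reportProgress(i + 1, groups.Count);
            }
        });
    }
}
```

A caller could pass a lambda that compares the files in one group and another that prints "done/total" to the console, then call Wait() on the returned task before exiting. Note that reportProgress is invoked on the worker thread. Since the work is mostly disk reads, a background thread mainly keeps the program responsive; it may not make the comparison itself faster.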

Solution 1

You could probably make it faster by checking whether files are the same size before comparing their contents.
 
If the files are of equal size, then instead of reading the entire files into memory at once, allocate two blocks of memory and reuse them.
 
If you reuse the allocated memory, you should not have problems with "out of memory" exceptions.
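
The approach above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: ChunkedComparer, FilesAreEqual and FillBuffer are hypothetical names, and the 64 KB block size is an arbitrary choice. The two buffers are allocated once and reused for every comparison, so memory use does not grow with file size or file count.

```csharp
using System;
using System.IO;

// Hypothetical helper class, introduced only for this illustration.
static class ChunkedComparer
{
    const int BufferSize = 64 * 1024; // arbitrary block size

    // Allocated once and reused for every call (not thread-safe).
    static readonly byte[] bufferA = new byte[BufferSize];
    static readonly byte[] bufferB = new byte[BufferSize];

    public static bool FilesAreEqual(string pathA, string pathB)
    {
        // Cheap check first: files of different size cannot be equal.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            while (true)
            {
                int readA = FillBuffer(a, bufferA);
                int readB = FillBuffer(b, bufferB);
                if (readA != readB) return false;
                if (readA == 0) return true; // both streams ended, no difference found

                for (int i = 0; i < readA; i++)
                {
                    if (bufferA[i] != bufferB[i]) return false;
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested,
    // so keep reading until the buffer is full or the stream ends.
    static int FillBuffer(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Because the buffers are shared static fields, this sketch must not be called from several threads at once; each thread would need its own pair of buffers.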

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Web04 | 2.8.140415.2 | Last Updated 12 Nov 2012
Copyright © CodeProject, 1999-2014