Hi everybody! My program takes one argument, a directory. It finds all files with the same content, groups them, and prints the groups on screen. With a small number of files the program works well and fast. But if I point it at, for example, my flash card (or, more dramatically, the C:\ directory), its RAM usage gradually grows to about 500 MB and then it throws an out-of-memory exception. I am posting my code and would like to hear criticism and get help.

Here is the comparing function:
C#
static void CompareFilesRec(List<string> array)
{
    List<string> Trash = new List<string>();
    List<string> Output = new List<string>();
    List<KeyValuePair<long, string>> yeah = new List<KeyValuePair<long, string>>();
    List<string> outp = new List<string>();
    //fill the list with file sizes (as keys) and file paths
    for (int j = 0; j <= array.Count - 1; j++)
    {
        FileInfo fii = new FileInfo(array[j]);
        yeah.Add(new KeyValuePair<long, string>(fii.Length, array[j]));
    }
    //filter the previous list: files with a different size go to Trash (processed at the end)
    foreach (var el in yeah)
    {
        if (!Output.Contains(el.Value) && !Trash.Contains(el.Value))
        {
            foreach (var ele in yeah)
            {
                if (el.Key == ele.Key && !Output.Contains(ele.Value))
                {
                    Output.Add(ele.Value);
                }
                else
                {
                    if (!Trash.Contains(ele.Value))
                    {
                        Trash.Add(ele.Value);
                    }
                }
            }
        }
    }
    //compare the files (take the first element in the list and compare it with the other files)
    foreach (string f in Output)
    {
        int valueOne = 0, valueTwo = 0;
        StreamReader objReader = new StreamReader(f);
        valueOne = objReader.ReadToEnd().GetHashCode();

        StreamReader objReader1 = new StreamReader(Output[0]);
        valueTwo = objReader1.ReadToEnd().GetHashCode();

        if (f != Output[0] && valueOne == valueTwo)
        {
            outp.Add(f);
        }
        if (valueOne != valueTwo)
        {
            Trash.Add(f);
        }

    }
    outp.Add(Output[0]);
    //output of the list
    if (outp.Count > 1)
    {
        foreach (string fi in outp)
        {
            Console.WriteLine(fi);
        }
        outp.Clear();
        Console.WriteLine();
    }
    //recursive
    if (Trash.Count >= 1)
        CompareFilesRec(Trash);
    Output.Clear();
    Trash.Clear();
    outp.Clear();
}

Code:
C#
using System;
using System.Collections.Generic;
using System.IO;

namespace ConsoleApplication3
{
    class Comparer
    {
        static void Main(string[] args)
        {
            if (args.Length != 0)//check that there is at least 1 argument
            {
                if (args.Length == 1)//check that there is exactly 1 argument
                {
                    List<string> ListOfFiles = new List<string>();//list for the files

                    ListOfFiles.AddRange(LookIn(args[0]));//get the list of files
                    Console.WriteLine("Found {0} files", ListOfFiles.Count);//print how many files were found

                    if(ListOfFiles.Count > 0)
                       CompareFilesRec(ListOfFiles);
                }
                else Console.WriteLine("So many arguments... o_O");//message if there are too many arguments
            }
            else Console.WriteLine("Write some stuff here!");//message if there are no arguments
        }

        static bool CheckFileLessThan2Gb(string file)//checks whether the file is smaller than 2 GB
        {
            FileInfo someFileInfo = new FileInfo(file);//store the file info in the variable someFileInfo
            if (someFileInfo.Length <= 2147483648)//the check
                return true;//if it fits
            else return false;//if it does not fit
        }

        static List<string> LookIn(string path)
        {
            /*Initialize and declare two lists: one for files and one for folders*/
            List<string> files = new List<string>();
            List<string> dirs = new List<string>();
            
            /*Look for all accessible files*/
            try
            {
                /*Add the found folders and files to the lists*/
                files.AddRange(Directory.GetFiles(path));
                dirs.AddRange(Directory.GetDirectories(path));
                
                for (int i = 0; i <= files.Count; i++)
                {
                    if (!CheckFileLessThan2Gb(files[i]))
                        files.RemoveAt(i);
                    try
                    {
                        string FileValue = null;
                        StreamReader Value = new StreamReader(files[i]);
                        FileValue = Value.ReadToEnd();
                        if (FileValue.Length == 0)
                            files.RemoveAt(i);
                        Value.Dispose();
                        Value.Close();
                    }
                    catch (System.IO.IOException) { }
                }
            }
            catch (UnauthorizedAccessException) { }
            catch (DirectoryNotFoundException) { }
            catch (ArgumentOutOfRangeException) { }

            
            /*"Look into" each directory for files...*/
            foreach (string dir in dirs)
            {
                files.AddRange(LookIn(dir));//...and add them to the list
            }
            return files;//return the full list of found files
        }

        static void CompareFilesRec(List<string> array)
        {
            List<string> Trash = new List<string>();
            List<string> Output = new List<string>();
            List<KeyValuePair<long, string>> yeah = new List<KeyValuePair<long, string>>();
            List<string> outp = new List<string>();

            for (int j = 0; j <= array.Count - 1; j++)
            {
                FileInfo fii = new FileInfo(array[j]);
                yeah.Add(new KeyValuePair<long, string>(fii.Length, array[j]));
            }

            foreach (var el in yeah)
            {
                if (!Output.Contains(el.Value) && !Trash.Contains(el.Value))
                {
                    foreach (var ele in yeah)
                    {
                        if (el.Key == ele.Key && !Output.Contains(ele.Value))
                        {
                            Output.Add(ele.Value);
                        }
                        else
                        {
                            if (!Trash.Contains(ele.Value))
                            {
                                Trash.Add(ele.Value);
                            }
                        }
                    }
                }
            }

            foreach (string f in Output)
            {
                int valueOne = 0, valueTwo = 0;
                StreamReader objReader = new StreamReader(f);
                valueOne = objReader.ReadToEnd().GetHashCode();

                StreamReader objReader1 = new StreamReader(Output[0]);
                valueTwo = objReader1.ReadToEnd().GetHashCode();

                if (f != Output[0] && valueOne == valueTwo)
                {
                    outp.Add(f);
                }
                if (valueOne != valueTwo)
                {
                    Trash.Add(f);
                }

            }
            outp.Add(Output[0]);
            if (outp.Count > 1)
            {
                foreach (string fi in outp)
                {
                    Console.WriteLine(fi);
                }
                outp.Clear();
                Console.WriteLine();
            }

            if (Trash.Count >= 1)
                CompareFilesRec(Trash);
            Output.Clear();
            Trash.Clear();
            outp.Clear();
        }
    }
}
Posted
Updated 11-Nov-12 20:42pm
v6

I would divide CompareFilesRec into two sections: one for finding candidates and one for comparing.

Create a Dictionary<long, List<string>> that maps each file size to the paths of the files with that size. Then you only need to compare files of the same size in the second pass, so unless you have a million files of the same size, the chance of a memory exception in your CompareFilesRec is small.
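
A minimal sketch of the candidate-finding pass described above, assuming the paths come from something like LookIn in the question. The class name SizeGrouping and the method name GroupBySize are made up for illustration and are not part of the posted code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original program.
static class SizeGrouping
{
    // Maps each file size to the paths that have that size and keeps
    // only sizes shared by two or more files, since a file with a
    // unique size cannot have a duplicate.
    public static Dictionary<long, List<string>> GroupBySize(IEnumerable<string> paths)
    {
        var bySize = new Dictionary<long, List<string>>();
        foreach (string path in paths)
        {
            long length = new FileInfo(path).Length; // size on disk, in bytes
            List<string> group;
            if (!bySize.TryGetValue(length, out group))
            {
                group = new List<string>();
                bySize.Add(length, group);
            }
            group.Add(path);
        }

        var candidates = new Dictionary<long, List<string>>();
        foreach (KeyValuePair<long, List<string>> pair in bySize)
        {
            if (pair.Value.Count > 1)
                candidates.Add(pair.Key, pair.Value);
        }
        return candidates;
    }
}
```

This pass only reads file metadata, so no file contents are held in memory. It also replaces the nested loops with List.Contains calls from the question by a single pass over the paths.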

You expect the hash code to be enough to judge whether two files are the same. To be safe, I would compare the contents of the files (you are bringing the files into memory anyway to make a hash code). I would probably also read them as byte[] instead of strings.
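
As a rough illustration of comparing contents instead of hash codes, the sketch below reads both files as byte[] and compares them byte by byte. The names ByteCompare and SameContent are hypothetical. It loads both files fully into memory, so it is only a reasonable fit for small files.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original program.
static class ByteCompare
{
    // Returns true only if both files contain exactly the same bytes.
    // Two different contents can share a hash code, so equal hash codes
    // alone do not prove that the files are equal.
    public static bool SameContent(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        if (a.Length != b.Length)
            return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
                return false;
        }
        return true;
    }
}
```

Reading bytes rather than text also avoids decoding binary files through StreamReader, which interprets the data as encoded characters.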
 
 
Comments
Je7 4-Nov-12 5:50am    
Ok, thank you. I'll try.
Je7 4-Nov-12 9:35am    
I don't know why, but files with identical content end up with different sizes. I add the value this way: Sizes.Add(files[i].Length, files[i]); and as a result the keys differ between the files. But Windows shows that these files are the same size: 9 bytes.
JesperMadsen123 4-Nov-12 10:49am    
If your code is identical to the posted code, files[] does not contain file contents but file names. So files[x].Length returns the length of the file name, not the length of the contents.
Je7 4-Nov-12 10:57am    
I posted a second version of the code; there are some differences. And to get the real file size, do I need to use FileInfo?
JesperMadsen123 4-Nov-12 11:37am    
Whether you read the entire file into a byte array or ask FileInfo for the length, you should get the same result. If you don't, you are probably doing something wrong.
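
To make the difference discussed in these comments concrete, here is a small sketch. The class name LengthDemo, the method FileSize and the temporary file name are made up for the example. It writes a 9-byte file and prints the three lengths involved.

```csharp
using System;
using System.IO;

// Hypothetical demo, not part of the original program.
static class LengthDemo
{
    // Size of the file on disk in bytes, as reported by the file system.
    public static long FileSize(string path)
    {
        return new FileInfo(path).Length;
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "length-demo.bin");
        File.WriteAllBytes(path, new byte[9]); // a file that is 9 bytes long

        // Number of characters in the path string; unrelated to the file size.
        Console.WriteLine(path.Length);

        // The next two values agree with each other: both are 9 here.
        Console.WriteLine(FileSize(path));
        Console.WriteLine(File.ReadAllBytes(path).Length);

        File.Delete(path);
    }
}
```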

You could use another thread to search for and compare files of the same size, and also to show progress.
 
 
Comments
Je7 12-Nov-12 2:33am    
Threading is a very good idea. But can you help me with where to insert the threading? I mean: in the loops, in the comparing, in the searching, etc.
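
One possible place for the threading, sketched under assumptions: once the files are grouped by size, each group can be compared independently of the others, so the groups can be handed to Parallel.ForEach. The names ParallelGroups, ProcessGroups, findDuplicates and reportProgress are hypothetical. Whether this is actually faster depends on the disk, since the work is mostly I/O.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original program.
static class ParallelGroups
{
    // Runs findDuplicates on every same-size group in parallel and
    // collects the duplicate sets it returns. reportProgress receives
    // (groups finished, total groups) and may be called from several
    // threads at once, so it must be thread-safe.
    public static List<List<string>> ProcessGroups(
        ICollection<List<string>> groups,
        Func<List<string>, IEnumerable<List<string>>> findDuplicates,
        Action<int, int> reportProgress)
    {
        var results = new ConcurrentBag<List<string>>();
        int done = 0;

        Parallel.ForEach(groups, group =>
        {
            foreach (List<string> duplicateSet in findDuplicates(group))
                results.Add(duplicateSet);

            int finished = Interlocked.Increment(ref done);
            reportProgress(finished, groups.Count);
        });

        // The order of the collected sets is not deterministic.
        return new List<List<string>>(results);
    }
}
```

Results are collected and returned rather than printed inside the loop, so the output of different groups does not interleave on the console.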

You could probably make it faster by checking whether the files are the same size before comparing their contents.

If the files are of equal size, then instead of reading the entire files into memory at once, allocate two blocks of memory and reuse them.

If you reuse the allocated memory, you should not have problems with "out of memory" exceptions.
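
The idea above could look roughly like the following sketch. The names ChunkedCompare, SameContent and FillBuffer and the 64 KB buffer size are assumptions chosen for the example. The two buffers are allocated once and reused for every pair of files, so memory use stays constant regardless of file size.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original program.
static class ChunkedCompare
{
    const int BufferSize = 64 * 1024; // arbitrary chunk size for this sketch

    // Allocated once and reused for every comparison.
    // Because they are shared, this class is not thread-safe.
    static readonly byte[] bufferA = new byte[BufferSize];
    static readonly byte[] bufferB = new byte[BufferSize];

    public static bool SameContent(string pathA, string pathB)
    {
        // Cheap check first: files of different sizes cannot be equal.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            while (true)
            {
                int readA = FillBuffer(a, bufferA);
                int readB = FillBuffer(b, bufferB);
                if (readA != readB)
                    return false;
                if (readA == 0)
                    return true; // both streams ended with no difference found
                for (int i = 0; i < readA; i++)
                {
                    if (bufferA[i] != bufferB[i])
                        return false;
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    static int FillBuffer(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                break;
            total += read;
        }
        return total;
    }
}
```

The comparison stops at the first differing chunk, so files that differ early are rejected without reading the rest of either file.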
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


