A Faster Directory Enumerator

27 Aug 2009 | CPOL | 3 min read
Describes how to create a significantly faster enumerator for the attributes of all the files in a directory.

Introduction

The .NET Framework's Directory class includes methods for querying the list of files in a directory. For each file, you can then query its attributes, such as the size and creation date. However, when querying files on a remote PC, this can be very inefficient: a potentially expensive network round-trip is needed to retrieve each file's attributes. This article describes a much more efficient implementation. In the tests below, it was roughly 8.5x faster on a local drive and roughly 830x faster over a UNC share.

Background

Let's assume you are writing an application that needs to find the most recently modified file in a directory. To implement this, you might have a function similar to the following:

C#
DateTime GetLastFileModifiedSlow(string dir)
{
    DateTime retval = DateTime.MinValue;
    
    string [] files = Directory.GetFiles(dir);
    for (int i=0; i<files.Length; i++)
    {
        DateTime lastWriteTime = File.GetLastWriteTime(files[i]);
        if (lastWriteTime > retval)
        {
            retval = lastWriteTime;
        }
    }
    
    return retval;
}

That function certainly works, but it suffers from some very poor performance characteristics:

  1. GetFiles must allocate a potentially very large array.
  2. GetFiles must wait for the entire directory's entries to be returned before returning.
  3. For each file, a potentially expensive query is sent to the file system. No attempt is made to perform any sort of batch query.

You might think that converting to DirectoryInfo.GetFiles would improve item #3:

C#
DateTime GetLastFileModifiedSlow2(string dir)
{
    DateTime retval = DateTime.MinValue;
    
    DirectoryInfo dirInfo = new DirectoryInfo(dir);

    FileInfo[] files = dirInfo.GetFiles();
    for (int i=0; i<files.Length; i++)
    {
        if (files[i].LastWriteTime > retval)
        {
            retval = files[i].LastWriteTime;
        }
    }
    
    return retval;
}

This doesn't change anything, however: the FileInfo objects returned by GetFiles() are not initialized with any data, and each one queries the file system the first time any of its properties is accessed.

Making it Faster

The attached test application includes the FastDirectoryEnumerator class in FastDirectoryEnumerator.cs. Using the GetFiles method, we can write the equivalent of our first slow method.

C#
DateTime GetLastFileModifiedFast(string dir)
{
    DateTime retval = DateTime.MinValue;
    
    FileData [] files = FastDirectoryEnumerator.GetFiles(dir);
    for (int i=0; i<files.Length; i++)
    {
        if (files[i].LastWriteTime > retval)
        {
            retval = files[i].LastWriteTime;
        }
    }
    
    return retval;
}

The FileData object provides all the standard attributes for a file that the FileInfo class provides.

Making it Even Faster

Use one of the overloads of the EnumerateFiles method to enumerate over all the files in a directory. Each iteration yields a FileData object.

Below is the same method written with FastDirectoryEnumerator.EnumerateFiles:

C#
DateTime GetLastFileModifiedFast(string dir)
{
    DateTime retval = DateTime.MinValue;

    foreach (FileData f in FastDirectoryEnumerator.EnumerateFiles(dir))
    {
        if (f.LastWriteTime > retval)
        {
            retval = f.LastWriteTime;
        }
    }

    return retval;
}
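
One reason the enumerator form helps is that a foreach over a lazy sequence never builds the full array and can stop as soon as it has what it needs. The following is a minimal, self-contained sketch of that difference. EnumerateNames, GetNames, FirstLazy, and the Produced counter are hypothetical stand-ins for illustration only; they are not part of FastDirectoryEnumerator and touch no real file system.

```csharp
using System;
using System.Collections.Generic;

static class LazyEnumerationSketch
{
    // Counts how many entries the hypothetical "directory" has produced so far.
    public static int Produced;

    // Lazy form: each entry is produced only when the caller asks for it.
    public static IEnumerable<string> EnumerateNames(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return "file" + i + ".txt";
        }
    }

    // Array form: every entry is produced and stored before the caller sees any.
    public static string[] GetNames(int count)
    {
        return new List<string>(EnumerateNames(count)).ToArray();
    }

    // Returns the first name from the lazy sequence and stops right away.
    public static string FirstLazy(int count)
    {
        foreach (string name in EnumerateNames(count))
        {
            return name;
        }
        return null;
    }
}
```

Calling FirstLazy(3000) produces a single entry, while GetNames(3000) has to produce all 3000 before the caller can look at the first one.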

Performance

The test application allows you to create a large number of files in a directory, then test the time it takes to enumerate them using all four methods. I used a directory with 3000 files and ran each test three times to give the best answer possible for each test.

Using a path on my local hard drive resulted in the following times:

  • Directory.GetFiles method: ~225ms
  • DirectoryInfo.GetFiles method: ~230ms
  • FastDirectoryEnumerator.GetFiles method: ~33ms
  • FastDirectoryEnumerator.EnumerateFiles method: ~27ms

That is roughly an 8.5x increase in performance between the fastest and the slowest methods. The difference is even more pronounced when the files are on a UNC path. For this test, I used the same directory as in the previous test. The only difference is that I referenced the directory by a UNC share name instead of the local path. At the time of the test, I was connected to my home wireless network.

  • Directory.GetFiles method: ~43,860ms
  • DirectoryInfo.GetFiles method: ~44,000ms
  • FastDirectoryEnumerator.GetFiles method: ~55ms
  • FastDirectoryEnumerator.EnumerateFiles method: ~53ms

That is roughly an 830x increase in performance, more than two orders of magnitude! The gap only widens as the latency to the PC containing the files increases.
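
The quoted ratios can be checked directly from the timings listed above by dividing the slowest time by the fastest. The helper below is only an illustrative sketch; the SpeedupCheck class and its method names are made up for this example.

```csharp
using System;
using System.Globalization;

static class SpeedupCheck
{
    // Ratio of the slowest measured time to the fastest measured time.
    public static double Speedup(double slowMs, double fastMs)
    {
        return slowMs / fastMs;
    }

    // Formats a ratio with a fixed number of decimals, independent of the machine's culture.
    public static string Format(double ratio, int decimals)
    {
        return ratio.ToString("F" + decimals, CultureInfo.InvariantCulture);
    }
}
```

With the local-drive timings, Speedup(230, 27) is about 8.5; with the UNC timings, Speedup(44000, 53) is about 830.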

Why is it Faster?

As mentioned above, Directory.GetFiles and DirectoryInfo.GetFiles have a number of disadvantages. The most significant is that they throw away information and do not efficiently allow you to retrieve information about multiple files at the same time.

Internally, Directory.GetFiles is implemented as a wrapper over the Win32 FindFirstFile/FindNextFile functions. These functions return attribute information for every file they enumerate, but GetFiles() discards it and returns only the file names. They also retrieve information about multiple files with a single network message.

The FastDirectoryEnumerator keeps this information and returns it in the FileData class. This substantially reduces the number of network round-trips needed to accomplish the same task.
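
To make the idea concrete, here is a minimal, Windows-only sketch of calling FindFirstFile/FindNextFile through P/Invoke and reading the last write time straight from the returned find data, with no further per-file query. This is not the attached FastDirectoryEnumerator source: the FindDataSketch class, its field names, and the error handling are assumptions for illustration, and it omits the security checks, search patterns, and subdirectory handling a real implementation needs.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class FindDataSketch
{
    // Managed layout of the Win32 WIN32_FIND_DATA structure (Unicode version).
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public uint ftCreationTimeLow, ftCreationTimeHigh;
        public uint ftLastAccessTimeLow, ftLastAccessTimeHigh;
        public uint ftLastWriteTimeLow, ftLastWriteTimeHigh;
        public uint nFileSizeHigh, nFileSizeLow;
        public uint dwReserved0, dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool FindClose(IntPtr hFindFile);

    static readonly IntPtr InvalidHandle = new IntPtr(-1);

    // Returns the newest last-write time among the files directly inside dir,
    // or DateTime.MinValue if the directory cannot be searched or has no files.
    public static DateTime GetLastFileModified(string dir)
    {
        DateTime newest = DateTime.MinValue;
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(dir, "*"), out data);
        if (handle == InvalidHandle)
        {
            return newest;
        }
        try
        {
            do
            {
                // Skip directories, including the "." and ".." entries.
                if ((data.dwFileAttributes & FileAttributes.Directory) != 0)
                {
                    continue;
                }
                // The write time arrives with the entry itself as a 64-bit FILETIME.
                long fileTime = ((long)data.ftLastWriteTimeHigh << 32) | data.ftLastWriteTimeLow;
                DateTime written = DateTime.FromFileTime(fileTime);
                if (written > newest)
                {
                    newest = written;
                }
            }
            while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle);
        }
        return newest;
    }
}
```

The point of the sketch is that the loop never calls File.GetLastWriteTime: every value it needs is already in the structure that FindFirstFile and FindNextFile fill in.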

History

  • 8-13-2009: Initial version.
  • 8-14-2009: Added security checks, parameter checking, and the GetFiles method.
  • 8-24-2009: Fixed the AllDirectories search using GetFiles. Removed note about .NET 4.0 including something similar.
  • 9-08-2009: Fixed the AllDirectories search when filter is not * or *.*.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
United States
I've been a software engineer since 1999. I tend to focus on C# and .NET technologies when possible.

Comments and Discussions

 
Question: EnumerateFiles also search 8.3, hope provide option to ignore 8.3
ahdung16-Jun-22 22:49
General: My vote of 5
ahdung25-May-22 23:28
General: My vote of 5
ahdung25-Nov-21 15:42
Question: Long path support
Sparked198319-May-19 22:04
General: My vote of 5
Nelson Lainez12-Mar-19 12:15
Suggestion: Great work! A couple of bugs.
MrGneissGuy10-Jan-19 15:21
First I must thank you. You did an amazing job and saved me a heck of a lot of time and I learned new things!

So I was testing this out by enumerating my entire system drive (~500k files), and I was getting stack overflow exceptions after <5000 files. I discovered it overflowing on C:\Windows\WinSxS, which contained ~20k directories. After reading your code and MSKB for a bit, I noticed that FindNextFile can't be made to return only files, so the Enumerator was checking every single result to see if it was a file, and if it wasn't, the Directory was discarded through another level of recursion. Since there's no need for recursion to traverse siblings, I replaced this recursion in the DirectoryDiscarder:

C#
if (((FileAttributes)m_win_find_data.dwFileAttributes & FileAttributes.Directory) == FileAttributes.Directory)
{
    return MoveNext();
}

...with an iterator:
C#
while (((FileAttributes)m_win_find_data.dwFileAttributes & FileAttributes.Directory) == FileAttributes.Directory && retval)
{
    retval = FindNextFile(m_hndFindFile, m_win_find_data);
}
if (retval) return retval;
return MoveNext();


It worked like a charm, and I was able to get through my whole drive in about 40 seconds. Then I thought, why are we discarding those directories we found? We're gonna need them when we descend to the next level. So I added a string stack in the constructor.
C#
private Stack<string> m_wasteDirectories;

and collected them from inside the DirectoryDiscarder. Then I replaced the System.IO.Directory.GetDirectories code with:
C#
m_currentContext.SubdirectoriesToProcess = m_wasteDirectories;

... and re-initialized after every FindFirstFile. It worked like a charm! But I soon discovered that FindFirstFile and FindNextFile give you directories with relative paths, which started up the stack overflows again. So I just filtered them out:
C#
if (m_win_find_data.cFileName != "." && m_win_find_data.cFileName != "..")
{
    m_wasteDirectories.Push(Path.Combine(m_path, m_win_find_data.cFileName));
}

...and the results were identical to those that GetDirectories returned.


I discovered one last bug when I didn't have permission to traverse some system directories. I didn't want to require users to run as admin, so I just wanted to skip over those directories and ignore them. But that was trickier than it sounded. There was no way for the Enumerator to recover if FindFirstFile returned an invalid handle. So after some tinkering and sketching, I came up with the flow below, which did the trick.
C#
                          [...]

            m_hndFindFile = FindFirstFileEx(...)


           retval = !m_hndFindFile.IsInvalid;
       }
       else
       {
           ...
       }
   }

   if (retval)
   {
      //DirectoryDestroyer
   }

   else if (m_searchOption == SearchOption.AllDirectories)
   {
       if (retval || m_hndFindFile == null || !m_hndFindFile.IsInvalid)
       {
           if (m_currentContext.SubdirectoriesToProcess == null)
           {
               ...
           }

           if (m_currentContext.SubdirectoriesToProcess.Count > 0)
           {
              ...
           }
       }

       if (m_contextStack.Count > 0)
       {
          ...
       }
   }
   return retval;
}

After these changes, I brought the traverse time of my drive from a mean of 41.1s down to 23.2s (5 trials each, S.D. of 0.21s on both versions).

I hope this helps!
General: Re: Great work! A couple of bugs.
radiolondra1-Feb-19 6:41
Question: Is there a cross platform version that uses .net core?
Liran Friedman3-Jan-19 4:51
General: System.StackOverflowException issue
peterhuang1kimo12-Aug-18 22:59
General: Re: System.StackOverflowException issue
musherum.online10-Jan-19 6:37
General: My vote of 5
jonkeda20-Mar-18 0:43
Question: Updated version available
opulos15-Apr-17 5:18
Answer: Re: Updated version available
musherum.online10-Jan-19 7:16
Question: This is still relevant with NET 4
TimFlan1-Apr-16 6:21
Answer: Re: This is still relevant with NET 4
DHitchenUK24-May-16 22:37
Suggestion: Using Parallel
Khayralla3-Jul-15 6:57
General: Re: Using Parallel
Member 1117508327-Jun-18 5:04
General: In .Net 4.0 and above this has been solved
ilanc518-Jan-15 10:19
General: Re: In .Net 4.0 and above this has been solved
Member 1126915023-Feb-15 13:54
General: My vote of 5
opulos24-Jul-14 23:13
Question: Does this work in Non-Windows operating systems?
Vexe Elxenoanizd19-May-14 3:12
Answer: Re: Does this work in Non-Windows operating systems?
wilsone819-May-14 6:45
General: Re: Does this work in Non-Windows operating systems?
Vexe Elxenoanizd19-May-14 8:32
Question: Problem with long file path.
tinnguyen7-May-14 22:09
Question: Issues with getDirectories on MoveNext()
nserrano11-Mar-14 8:46
