Posted 7 Apr 2019

System.IO.Directory Alternative using WinAPI

1 Mar 2020 | CPOL | 4 min read
Faster and better alternative to System.IO.Directory IEnumerable methods EnumerateDirectories, EnumerateFiles and EnumerateFileSystemEntries
While working on a project that needed to read the contents of a Windows directory, I used the methods of the .NET System.IO.Directory class. However, these methods have a big downside. This post describes the problem and offers an alternative built on the Windows API, which not only behaves better but also appears to be faster than the original .NET methods.

Introduction

Recently, I was working on a project that needed to read the contents of a Windows directory, so I used the EnumerateDirectories, EnumerateFiles and EnumerateFileSystemEntries methods of the .NET System.IO.Directory class. Unfortunately, these methods have a big downside: if they run into a file system entry that the current user is denied access to, they stop immediately. Instead of handling the error and continuing, they throw an exception, so you are left with whatever was gathered up to that point and the job is never completed.

This cannot be fixed from outside the methods: if you catch the exception, the returned IEnumerable gives you partial results at best, and the enumeration cannot be resumed.
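To illustrate why catching the exception outside does not help, here is a small simulation. FakeEnumerate is a hypothetical stand-in for a lazy directory enumeration (it is not a .NET API, and the paths are made up); it yields two entries and then throws, roughly the way the real methods do when they reach a protected folder.

```csharp
using System;
using System.Collections.Generic;

public static class PartialResultsDemo
{
    // Hypothetical stand-in for a lazy enumeration that hits a protected folder.
    public static IEnumerable<string> FakeEnumerate()
    {
        yield return @"C:\data\a.txt";
        yield return @"C:\data\b.txt";
        // Anything that would come after the protected folder is never produced.
        throw new UnauthorizedAccessException(@"Access to the path 'C:\data\secret' is denied.");
    }

    // Consumes the sequence and swallows the access error from the outside.
    public static List<string> Collect(IEnumerable<string> source)
    {
        var results = new List<string>();
        try
        {
            foreach (string entry in source)
                results.Add(entry);
        }
        catch (UnauthorizedAccessException)
        {
            // The iterator has already terminated here; it cannot be resumed.
        }
        return results;
    }
}
```

Calling PartialResultsDemo.Collect(PartialResultsDemo.FakeEnumerate()) returns only the two entries produced before the exception, which is the situation described above.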

I searched everywhere for a solution to this problem, but I could not find a workaround that avoids the aforementioned methods. So I decided to play around with the Windows API and create alternative methods. The result is not only better (the methods do not break on "Access denied") but also appears to be faster than the original .NET methods.

Using the Code

The project is a Class Library, so it is not executable. Building it compiles the methods into a DLL file, which you can reference from another project and use like this:

C#
using System.Collections.Generic;
using System.IO;
using System.Linq; // needed for ToList()

List<string> dirs = DirectoryAlternative.EnumerateDirectories
(path, "*", SearchOption.AllDirectories).ToList<string>();

I used the same namespace as the original procedures (System.IO), and named the class DirectoryAlternative - so the usage would be as similar as possible to the original class.

The methods themselves have the same names and take the same parameters, so from the outside they look exactly like the original ones.

Here is an example of the usage of methods:

C#
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
string path = "V:\\MUSIC";
List<string> en = new List<string>();
sw.Start();
try { en = Directory.EnumerateDirectories
  (path, "*", SearchOption.AllDirectories).ToList<string>(); } catch { }
sw.Stop();
Console.WriteLine("Directory.EnumerateDirectories : {0} ms / {1} entries", 
  sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
en = DirectoryAlternative.EnumerateDirectories(path, "*", 
  SearchOption.AllDirectories).ToList<string>();
sw.Stop();
Console.WriteLine("DirectoryAlternative.EnumerateDirectories : {0} ms / {1} entries", 
  sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
try { en = Directory.EnumerateFiles(path, "*", 
  SearchOption.AllDirectories).ToList<string>(); } catch { }
sw.Stop();
Console.WriteLine("Directory.EnumerateFiles : {0} ms / {1} entries", 
  sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
en = DirectoryAlternative.EnumerateFiles
  (path, "*", SearchOption.AllDirectories).ToList<string>();
sw.Stop();
Console.WriteLine("DirectoryAlternative.EnumerateFiles : {0} ms / {1} entries", 
  sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
try { en = Directory.EnumerateFileSystemEntries
  (path, "*", SearchOption.AllDirectories).ToList<string>(); } catch { }
sw.Stop();
Console.WriteLine("Directory.EnumerateFileSystemEntries : {0} ms / {1} entries", 
  sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
en = DirectoryAlternative.EnumerateFileSystemEntries
  (path, "*", SearchOption.AllDirectories).ToList<string>();
sw.Stop();
Console.WriteLine("DirectoryAlternative.EnumerateFileSystemEntries : {0} ms / {1} entries", 
  sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));

Console.ReadKey();

The above code snippet directly compares the performance of the original methods with that of the DirectoryAlternative methods. I used a very large directory with 70,000+ file system entries:

Image 1

As you can see, the DirectoryAlternative methods ran around 50% faster in this test.
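The repeated Stopwatch blocks in the comparison could be folded into one helper. The following is only a sketch: Bench and Measure are hypothetical names, not part of the library, and unlike the snippet above (which uses a bare catch) it swallows only UnauthorizedAccessException.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Bench
{
    // Runs one enumeration and returns (elapsed milliseconds, number of entries).
    // If the enumeration throws an access error, the count stays at 0,
    // just as the list stays empty in the original snippet.
    public static Tuple<long, int> Measure(Func<IEnumerable<string>> enumerate)
    {
        var entries = new List<string>();
        Stopwatch sw = Stopwatch.StartNew();
        try
        {
            entries = enumerate().ToList();
        }
        catch (UnauthorizedAccessException)
        {
            // Ignored so that the timing loop can move on to the next method.
        }
        sw.Stop();
        return Tuple.Create(sw.ElapsedMilliseconds, entries.Count);
    }
}
```

A call would then look like Bench.Measure(() => Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories)), with Item1 holding the milliseconds and Item2 the entry count.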

How the Code Works

The code uses several Win API functions to move around the file system (I believe these same functions are used in the original .NET methods):

C#
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
struct WIN32_FIND_DATA
{
    public uint dwFileAttributes;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwReserved0;
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string cAlternateFileName;
}

[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
private static extern bool FindClose(IntPtr hFindFile);

[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
private static extern IntPtr FindFirstFile
  (string lpFileName, out WIN32_FIND_DATA lpFindFileData);

[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
private static extern bool FindNextFile
  (IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

In short:

  • FindFirstFile finds the first file system entry that matches the provided pattern (lpFileName) and returns a search HANDLE (IntPtr) for continuing the search
  • FindNextFile finds the next file system entry that matches the same pattern; we call it repeatedly to go through all the files / directories
  • FindClose closes the search HANDLE

All the file information is gathered inside the WIN32_FIND_DATA struct and returned as an out type parameter.

For more information on these functions, see the Microsoft documentation for FindFirstFile, FindNextFile and FindClose.

The main method is Enumerate. All the other methods are wrappers around it.

C#
private static void Enumerate(string path, string searchPattern, 
        SearchOption searchOption, ref List<string> retValue, EntryType entryType)
{
    WIN32_FIND_DATA findData;
    if (path.Last<char>() != '\\') path += "\\";
    AdjustSearchPattern(ref path, ref searchPattern);
    searchPattern = searchPattern.Replace("*.*", "*");
    Text.RegularExpressions.Regex rx = new Text.RegularExpressions.Regex(
        "^" +
        Text.RegularExpressions.Regex.Escape(path) +
        Text.RegularExpressions.Regex.Escape(searchPattern)
        .Replace("\\*", ".*")
        .Replace("\\?", ".")
        + "$"
        , Text.RegularExpressions.RegexOptions.IgnoreCase);
    IntPtr hFile = FindFirstFile(path + "*", out findData);
    List<string> subDirs = new List<string>();
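    // Compare against INVALID_HANDLE_VALUE (-1) as an IntPtr;
    // hFile.ToInt32() can throw OverflowException in a 64-bit process.
    if (hFile != new IntPtr(-1))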
    {
        do
        {
            if (findData.cFileName == "." || findData.cFileName == "..") continue;
            if ((findData.dwFileAttributes & 
               (uint)FileAttributes.Directory) == (uint)FileAttributes.Directory)
            {
                subDirs.Add(path + findData.cFileName);
                if ((entryType == EntryType.Directories || 
                     entryType == EntryType.All) && rx.IsMatch(path + findData.cFileName)) 
                     retValue.Add(path + findData.cFileName);
            }
            else
            {
                if ((entryType == EntryType.Files || 
                     entryType == EntryType.All) && rx.IsMatch(path + findData.cFileName)) 
                     retValue.Add(path + findData.cFileName);
            }
        } while (FindNextFile(hFile, out findData));
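        // Close the search handle only when it is valid, and before recursing,
        // so open handles do not pile up along the recursion path.
        FindClose(hFile);
        if (searchOption == SearchOption.AllDirectories)
            foreach (string subdir in subDirs)
                Enumerate(subdir, searchPattern, searchOption, ref retValue, entryType);
    }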
}

The method takes all the parameters from the original Enumerate methods (path, searchPattern, searchOption) plus a by-reference argument retValue, and entryType, which is an enum:

C#
private enum EntryType { All = 0, Directories = 1, Files = 2 };

This enum selects whether only directories, only files, or both should be returned.

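The Enumerate method calls FindFirstFile and then iterates through the remaining file system entries by calling FindNextFile. If entryType = Files, it adds only files to the retValue list; for Directories, it adds only directories; and for All, it adds both. When a directory cannot be opened (for example, because access is denied), FindFirstFile returns an invalid handle, so that directory is simply skipped and the enumeration continues with the next one.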
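The directory test and the entryType selection can be isolated into a small predicate. This is a sketch only: EntryKind and EntryFilter are hypothetical names chosen to avoid clashing with the library's private EntryType, and the logic mirrors the attribute check in Enumerate without the pattern match.

```csharp
using System.IO;

public enum EntryKind { All = 0, Directories = 1, Files = 2 }

public static class EntryFilter
{
    // dwFileAttributes is a bit mask; the Directory bit marks a directory entry.
    public static bool IsDirectory(uint dwFileAttributes)
    {
        return (dwFileAttributes & (uint)FileAttributes.Directory) != 0;
    }

    // True when an entry with these attributes should be reported for the given kind.
    public static bool Wanted(uint dwFileAttributes, EntryKind kind)
    {
        if (kind == EntryKind.All) return true;
        bool isDir = IsDirectory(dwFileAttributes);
        return kind == EntryKind.Directories ? isDir : !isDir;
    }
}
```

Testing the bit with a mask matters because a directory usually carries other attribute bits as well (Hidden, ReadOnly and so on), so comparing the whole value for equality would miss it.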

The method always asks the API for all the file system entries (pattern "*"), and a Regex (regular expression) object takes care of filtering which files and/or folders should actually be returned. This is an upgrade over the first version of the method, which passed path + searchPattern directly to the FindFirstFile API function. Unfortunately, that worked only for *.* searches and not for specific patterns (e.g. *.jpg) with searchOption = AllDirectories: no subfolders matched the search pattern, so only the files from the top directory were returned.
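The wildcard-to-regex step can be tried on its own. The sketch below repeats the construction used in Enumerate under a hypothetical WildcardFilter.Build name; it leaves out the AdjustSearchPattern call, whose body is not shown in this article.

```csharp
using System.Text.RegularExpressions;

public static class WildcardFilter
{
    // Builds a case-insensitive regex that matches full paths of entries
    // in 'path' whose names satisfy the wildcard 'searchPattern'.
    public static Regex Build(string path, string searchPattern)
    {
        if (!path.EndsWith("\\")) path += "\\";
        searchPattern = searchPattern.Replace("*.*", "*");
        string body = Regex.Escape(path) +
            Regex.Escape(searchPattern)
                .Replace("\\*", ".*")  // escaped '*' becomes "any run of characters"
                .Replace("\\?", ".");  // escaped '?' becomes "exactly one character"
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }
}
```

For example, WildcardFilter.Build(@"V:\MUSIC", "*.jpg") matches V:\MUSIC\cover.JPG but not V:\MUSIC\cover.png. Note that in this translation ? requires exactly one character.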

The Enumerate method is called recursively if searchOption = AllDirectories. The results from all the (recursive) calls are gathered in one variable, retValue. I used a by-ref argument to pass the results out of each recursive call because returning and concatenating lists proved to be very slow. However, if anyone prefers to return a List from the Enumerate method, List.AddRange works equally fast.
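The two ways of gathering results can be compared on an in-memory tree. Everything in this sketch (Node, TreeWalk, CollectInto, CollectReturning) is hypothetical and stands in for the directory recursion; it makes no claim about relative speed. Since List<string> is a reference type, the shared list is passed here without ref, which works the same way for appending.

```csharp
using System.Collections.Generic;

public sealed class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class TreeWalk
{
    // Variant 1: one shared list is passed down and every call appends to it.
    public static void CollectInto(Node node, string prefix, List<string> results)
    {
        string full = prefix + node.Name;
        results.Add(full);
        foreach (Node child in node.Children)
            CollectInto(child, full + "\\", results);
    }

    // Variant 2: every call returns its own list, merged upward with AddRange.
    public static List<string> CollectReturning(Node node, string prefix)
    {
        string full = prefix + node.Name;
        var results = new List<string> { full };
        foreach (Node child in node.Children)
            results.AddRange(CollectReturning(child, full + "\\"));
        return results;
    }
}
```

Both variants visit the nodes in the same order and produce the same list; the first avoids allocating a new list per call.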

Each call closes its search HANDLE with FindClose once it has finished iterating.
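One detail about the handle check is worth a sketch. In a 64-bit process, IntPtr.ToInt32() throws OverflowException when the handle value does not fit into 32 bits, so comparing the IntPtr itself against INVALID_HANDLE_VALUE (-1) is the safer test. HandleCheck is a hypothetical helper name, not part of the library.

```csharp
using System;

public static class HandleCheck
{
    // INVALID_HANDLE_VALUE as returned by FindFirstFile on failure.
    public static readonly IntPtr InvalidHandleValue = new IntPtr(-1);

    // Compares the pointer-sized value directly; no narrowing conversion involved.
    public static bool IsValid(IntPtr handle)
    {
        return handle != InvalidHandleValue;
    }
}
```

In Enumerate, the condition would then read if (HandleCheck.IsValid(hFile)) instead of converting the handle to a 32-bit integer first.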

Testing the Code

I have created a small piece of code (Console app) for testing purposes - comparison to the System.IO.Directory methods (.NET standard version):

C#
static void Main(string[] args)
        {
            Stopwatch sw = new Stopwatch();

            string path = @"V:\MUSIC";
            List<string> searchPatterns = new List<string>();
            searchPatterns.Add("*.*");
            searchPatterns.Add("*.mp3");
            searchPatterns.Add("*.jpg");
            searchPatterns.Add("Iron*");
            searchPatterns.Add("Iron Maiden\\*.mp?");
            searchPatterns.Add("IRON MAIDEN");
            searchPatterns.Add("Iron Maiden\\*.jp?g");

            List<SearchOption> searchOptions = new List<SearchOption>();
            searchOptions.Add(SearchOption.AllDirectories);
            searchOptions.Add(SearchOption.TopDirectoryOnly);

            List<Func<string, string, SearchOption, IEnumerable<string>>> funcs = 
                 new List<Func<string, string, SearchOption, IEnumerable<string>>>();
            funcs.Add(DirectoryAlternative.EnumerateFiles);
            funcs.Add(Directory.EnumerateFiles);
            funcs.Add(DirectoryAlternative.EnumerateDirectories);
            funcs.Add(Directory.EnumerateDirectories);
            funcs.Add(DirectoryAlternative.EnumerateFileSystemEntries);
            funcs.Add(Directory.EnumerateFileSystemEntries);

            IEnumerable<string> list;
            int cnt;
            System.Reflection.MethodInfo mi;
            Console.WriteLine("METHOD              MODULE                        SEARCHPATTERN       SEARCHOPTION        TIME                COUNT");
            Console.WriteLine("=====================================================================================================================");
            foreach (string searchPattern in searchPatterns)
            {
                foreach (SearchOption searchOption in searchOptions)
                {
                    foreach (Func<string, string, SearchOption, IEnumerable<string>> 
                                  func in funcs)
                    {
                        sw.Restart();
                        list = func(path, searchPattern, searchOption);
                        cnt = list.Count();
                        sw.Stop();
                        mi = System.Reflection.RuntimeReflectionExtensions.GetMethodInfo(func);
                        Console.WriteLine(Wrap(mi.Name, 19) + " "
                            + Wrap(mi.Module.Name, 29) + " "
                            + Wrap(searchPattern, 19) + " "
                            + Wrap(searchOption == SearchOption.TopDirectoryOnly ? 
                              "root" : "all", 19) + " "
                            + Wrap(sw.ElapsedMilliseconds.ToString("N0") + "ms", 19) + " "
                            + cnt.ToString());
                        Console.ReadKey();
                    }
                }
            }

            Console.WriteLine();
            Console.WriteLine("THE END!!!");
            Console.ReadKey();
        }

        static string Wrap(string str, int len)
        {
            if (str.Length > len)
                return "..." + str.Substring(str.Length - len + 3, len - 3);
            else
                return str.PadRight(len);
        }

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Marijan Nikic
User Interface Analyst, Raiffeisenbank Austria
Croatia
I earned a master's degree in computing science at the Faculty of Electrical Engineering and Computing in Zagreb, Croatia in 2009. Following my studies, I got a job in the Croatian branch of the Austrian-based CEE Raiffeisen Bank as an MIS (management information system) analyst.
I have been working there since 2010, as an IT expert within the Controlling department, maintaining the Oracle's OFSA system, underlying interfaces and databases.
Throughout that time, I have worked with several different technologies, which include SQL & PL/SQL (mostly), postgres, Cognos BI, Apparo, Datastage, ODI, Jenkins, Qlik, ...
I am doing a lot of automation with scripting in batch / shell and VBscript (mostly) - data analysis and processing, automated DB imports and exports, Jenkins automation etc.
Privately, I was mostly doing Windows Forms and Console app tools in Visual Studio, C#.
