|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
IntroductionMany programmers have come across the problem of finding files matching a certain specification that exist in one or more directories or subdirectories. This article demonstrates a technique of enumerating files that integrates well with .NET programming idioms and that takes advantage of some of the new features of C# 2.0. The result is a class that's very easy to use and is more efficient (in time and space) than many other solutions. BackgroundOn Windows, finding files is done by calling the Win32 functions Unfortunately, many real-world programs need more than what The .NET Framework, starting with version 1.0, provides a more abstract interface for enumerating file names in directories in the form of
A better solutionWithout getting into the class design, imagine that you could write code like: using System.IO;
// ...
foreach (FileInfo fi in SomeObject.SomeMethod())
{
// ... do something with this match
}
Here, we're taking advantage of the C# compiler's built in support for .NET's This model offers a number of advantages:
Lazy evaluation refers to a technique that puts off doing work until the last possible moment. In this case, scanning directories for files can be delayed as long as possible. An implementation can delay calls to The FileSystemEnumerator classThe using System;
using System.Collections.Generic;
using System.IO;
/// <summary>
/// File system enumerator. This class provides an easy to use,
/// efficient mechanism for searching a list of
/// directories for files matching a list of file
/// specifications. The search is done incrementally as matches
/// are consumed, so the overhead before processing
/// the first match is always kept to a minimum.
/// </summary>
public sealed class FileSystemEnumerator : IDisposable
{
/// <summary>
/// IDisposable.Dispose
/// </summary>
public void Dispose()
{
// ...
}
/// <summary>
/// Constructor.
/// </summary>
/// <param name="pathsToSearch">Semicolon- or
/// comma-delimitted list of paths to search.</param>
/// <param name="fileTypesToMatch">Semicolon- or
/// comma-delimitted list of wildcard filespecs to match.</param>
/// <param name="includeSubDirs">If true,
/// subdirectories are searched.</param>
public FileSystemEnumerator(string pathsToSearch,
string fileTypesToMatch, bool includeSubDirs)
{
// ...
}
/// <summary>
/// Get an enumerator that returns all
/// of the files that match the wildcards that
/// are in any of the directories to be searched.
/// </summary>
/// <returns>An IEnumerable that returns
/// all matching files one by one.</returns>
/// <remarks>The enumerator that is returned
/// finds files using a lazy algorithm that
/// searches directories incrementally
/// as matches are consumed.</remarks>
public IEnumerable<FileInfo> Matches()
{
// ...
}
}
Using the classUsing the string pathsToSearch = "your paths here, separated by semicolons";
string filesToMatch =
"your file specifications here, separated by semicolons";
bool includeSubDirectories = true; // or false if you'd prefer
using (FileSystemEnumerator fse = new
FileSystemEnumerator(pathsToSearch,
filesToMatch,includeSubdirectories))
{
foreach (FileInfo fi in fse.Matches())
{
// Do something with fi
}
}
The demo projectThe included demo project is a simple Windows Forms (VS 2005) project that does directory listings using the
About the implementationThe implementation of Interfacing with Win32To get information about the files, we still need to use namespace Win32
{
/// <summary>
/// Structure that maps to WIN32_FIND_DATA
/// </summary>
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
class FindData
{
// ... other members
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public String fileName;
// ... other members
}
/// <summary>
/// SafeHandle class for holding find handles
/// </summary>
class SafeFindHandle :
Microsoft.Win32.SafeHandles.SafeHandleMinusOneIsInvalid
{
/// <summary>
/// Constructor
/// </summary>
public SafeFindHandle() { ... }
/// <summary>
/// Release the find handle
/// </summary>
/// <returns>true if the handle was released</returns>
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
protected override bool ReleaseHandle() { ... }
}
/// <summary>
/// Wrapper for P/Invoke methods used by FileSystemEnumerator
/// </summary>
[SecurityPermissionAttribute(SecurityAction.Demand,
UnmanagedCode = true)]
static class SafeNativeMethods
{
[DllImport("Kernel32.dll", CharSet = CharSet.Auto)]
public static extern SafeFindHandle
FindFirstFile(String fileName,
[In, Out] FindData findFileData);
[DllImport("kernel32", CharSet = CharSet.Auto)]
[return: MarshalAs(UnmanagedType.Bool)]
public static extern bool FindNextFile(SafeFindHandle hFindFile,
[In, Out] FindData lpFindFileData);
[DllImport("kernel32", CharSet = CharSet.Auto)]
[return: MarshalAs(UnmanagedType.Bool)]
public static extern bool FindClose(IntPtr hFindFile);
}
}
In this namespace, you'll find three components:
FindData and SafeNativeMethodsThese two components are a straightforward P/Invoke wrapper of the SafeFindHandleThe Interfacing with the programmerThe programmer deals with private bool m_includeSubDirs;
private string[] m_paths;
private List<Regex> m_fileSpecs;
// ...
public FileSystemEnumerator(string pathsToSearch,
string fileTypesToMatch, bool includeSubDirs)
{
// ...
m_includeSubDirs = includeSubDirs;
m_paths =
pathsToSearch.Split(new char[] { ';', ',' });
string[] specs =
fileTypesToMatch.Split(new char[] { ';', ',' });
m_fileSpecs = new List<Regex>(specs.Length);
foreach (string spec in specs)
{
// trim whitespace off file spec and convert
// Win32 wildcards to regular expressions
string pattern = spec
.Trim()
.Replace(".", @"\.")
.Replace("*", @".*")
.Replace("?", @".?")
;
m_fileSpecs.Add(new Regex("^" + pattern +
"$", RegexOptions.IgnoreCase));
}
}
Here, the The parameter Finding files recursively without using recursionOf course, the real meat of public IEnumerable<FileInfo> Matches()
{
foreach (string rootPath in m_paths)
{
string path = rootPath.Trim();
// .. search "path" for files matching
// any of the regular expressions in m_fileSpecs
}
}
Pretty simple so far - we iterate over the root paths the caller supplied, and look for files in each one. Let's dig into the code that goes in that loop: // find all the files in 'path'
// that match any of the regular expressions
Win32.FindData findData = new Win32.FindData();
Win32.SafeFindHandle handle =
Win32.SafeNativeMethods.FindFirstFile(
Path.Combine(path, "*"), findData);
if (!handle.IsInvalid)
{
do
{
// don't match . or ..
if (findData.fileName.Equals(@".") ||
findData.fileName.Equals(@".."))
continue;
if ((findData.fileAttributes &
(int)FileAttributes.Directory) != 0)
{
if (m_includeSubDirs)
{
// it's a directory - recurse into it
// ... we'll come back to this
}
}
else
{
// it's a file, see if any of the filespecs matches it
foreach (Regex fileSpec in m_fileSpecs)
{
// if this spec matches, return this file's info
if (fileSpec.IsMatch(findData.fileName))
yield return new
FileInfo(Path.Combine(path, findData.fileName));
}
}
} while (Win32.SafeNativeMethods.FindNextFile(
handle, findData));
// close this find handle
handle.Close();
}
Ignoring the fact that we're not yet handling subdirectories, this is a typical find/next/close loop that you'll see in any discussion of listing files on Windows operating systems. A couple of things are worth noting at this point:
Now, a typical recursive file find routine (such as the example from pinvoke.net), would simply call itself recursively to find files in subdirectories. Unfortunately, it's not possible to write a recursive function that uses All is not lost though. In data structures and algorithms classes, we're taught that recursion can always be replaced by iteration (interesting, the reverse is also true). How do you convert recursion to iteration? It's simple - you identify the variables that are unique to each activation of the recursive function, and build your own stack to store those variables: /// <summary>
/// Information that's kept in our
/// stack for simulated recursion
/// </summary>
private struct SearchInfo
{
/// <summary>
/// Find handle returned by FindFirstFile
/// </summary>
public Win32.SafeFindHandle Handle;
/// <summary>
/// Path that was searched to yield the find handle.
/// </summary>
public string Path;
/// <summary>
/// Constructor
/// </summary>
/// <param name="h">Find handle returned by FindFirstFile.</param>
/// <param name="p">Path corresponding to find handle.</param>
public SearchInfo(Win32.SafeFindHandle h, string p)
{
Handle = h;
Path = p;
}
}
/// <summary>
/// Stack of open scopes.
/// This is a member (instead of a local variable)
/// to allow Dispose to close any open
/// find handles if the object is disposed
/// before the enumeration is completed.
/// </summary>
private Stack<SearchInfo> m_scopes;
Here, we have a simple structure that holds two pieces of information that are unique to each "recursion": the full path being searched and the find handle for that path. We maintain a stack of these structures using the new We implement our simulated recursion by manually maintaining a stack: public IEnumerable<FileInfo> Matches()
{
foreach (string rootPath in m_paths)
{
string path = rootPath.Trim();
// we "recurse" into a new directory by jumping to this spot
top:
// ...
Win32.SafeFindHandle handle = // ...
m_scopes.Push(new SearchInfo(handle, path));
bool restart = false;
// we "return" from a sub-directory by jumping to this spot
restart:
if (!handle.IsInvalid)
{
do
{
// if we restarted the loop
// (unwound a recursion), fetch the next match
if (restart)
{
restart = false;
continue;
}
// ...
if ((findData.fileAttributes &
(int)FileAttributes.Directory) != 0)
{
if (m_includeSubDirs)
{
// it's a directory - recurse into it
path = Path.Combine(path, findData.fileName);
goto top;
}
}
else
{
// ...
}
} while (Win32.SafeNativeMethods.FindNextFile(
handle.DangerousGetHandle(), findData));
// ...
// unwind the stack - are we still in a recursion?
m_scopes.Pop();
if (m_scopes.Count > 0)
{
SearchInfo si = m_scopes.Peek();
handle = si.Handle;
path = si.Path;
restart = true;
goto restart;
}
}
}
}
Yes, you read right, I used Some readers have commented that it's possible to eliminate the simulated recursion (and hence the Cleaning up afterwardsSince public void Dispose()
{
while (m_scopes.Count > 0)
{
SearchInfo si = m_scopes.Pop();
si.Handle.Close();
}
}
Note that Limitations and caveatsThe current implementation comes with a couple caveats:
Thanks toRichard Deeming for his suggestions on making better use of History
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||