Introduction
Using the foreach
loop is easy. It is so straightforward and natural that I rarely use any other loop construct. In fact, when it's time to use the for
loop, I have to concentrate and recall the syntax.
I have to deal with a lot of tree traversals. The obvious choice is to have a foreach
loop to go over all elements of a root, making a recursive call for each node that has children. This works out great until one notices that there are about twenty functions that have a foreach
loop at their core together with a bunch of statements that operate on the same data type.
Using flags in order to exploit common pieces of code did not work out well. The functions simply became unreadable and much more error-prone. Something radical had to be done about this problem.
This article does not contain directly reusable code. Rather, it presents an idea on how to separate an algorithm that iterates over a custom collection from actions done on each item of that collection. The example code works with DirectoryInfo
and FileInfo
objects, but at no point does it modify files. The file system is used simply because it is a tree structure that is available on every computer. For simplicity reasons, there is no error handling and no security checks.
Separating algorithm from data
Staring at a bunch of functions that differ ever so slightly did have a good side effect after all. There are two things to note: first, the algorithm is always a foreach
loop with a recursive call. Second, we are working with objects of the same data type, or a very limited number of data types.
The iteration algorithm does not care what happens to each object that it visits. Iteration's responsibility is simply to visit each and every item of a collection. Then who will do the useful work on visited items? A callback function, of course! We know the data type we are working with, that's the one in the foreach
statement.
Let's use the file system as an example of a tree structure that we want to iterate over. For example, a function that displays all file system objects under a certain root
directory could look as follows:
private void DisplayAllElements(DirectoryInfo Root)
{
foreach (DirectoryInfo dirInfo in Root.GetDirectories())
{
DisplayAllElements(dirInfo);
System.Diagnostics.Debug.WriteLine(string.Format("Dir\t{0}",
dirInfo.FullName));
}
foreach (FileInfo fileInfo in Root.GetFiles())
{
System.Diagnostics.Debug.WriteLine(string.Format("File\t{0}",
fileInfo.FullName));
}
}
Let's modify it such that iteration code is completely separate from the output code. We are using file system as an example and will be dealing with DirectoryInfo
and FileInfo
instances. Declare callback types that take DirectoryInfo
and FileInfo
objects as parameters:
private delegate void FileInfoHandler(FileInfo File);
private delegate void DirectoryInfoHandler(DirectoryInfo Directory);
Write the functions that perform actual work on items visited by iteration:
private static void ShowDirectoryDetails(DirectoryInfo Info)
{
System.Diagnostics.Debug.WriteLine(string.Format("Dir\t{0}",
Info.FullName));
}
private static void ShowFileDetails(FileInfo Info)
{
System.Diagnostics.Debug.WriteLine(string.Format("File\t{0}",
Info.FullName));
}
Instantiate delegates. That is, create variables of delegate types that point to the functions above:
private FileInfoHandler ProcessFile = new FileInfoHandler(ShowFileDetails);
private DirectoryInfoHandler ProcessDirectory = new DirectoryInfoHandler
(ShowDirectoryDetails);
Finally, change the iteration algorithm to use the instantiated delegates instead of function calls:
private void DisplayAllElements(DirectoryInfo Root)
{
foreach (DirectoryInfo dirInfo in Root.GetDirectories())
{
DisplayAllElements(dirInfo);
ProcessDirectory(dirInfo);
}
foreach (FileInfo fileInfo in Root.GetFiles())
{
ProcessFile(fileInfo);
}
}
Now the iteration code is completely separate from the output code.
Improving design of iteration function
Though we are not displaying directory and file information inside the foreach
loop anymore, there is no real difference between the new and the original DisplayAllElements
functions. Calling a function callback was a step in the right direction. Let's improve it even more.
In my opinion, the iteration algorithm should be stateless. To maintain state it would need some knowledge about contents of nodes it traverses. This complicates the algorithm and makes it more difficult to verify its correctness. My approach is to keep it very simple. A static
function will do.
public static class Iterators
{
public delegate void FileInfoHandler(FileInfo File);
public delegate void DirectoryInfoHandler(DirectoryInfo Directory);
public static void IterateFileSystemInfo(
DirectoryInfo Root,
DirectoryInfoHandler DirectoryProcessor,
FileInfoHandler FileProcessor)
{
foreach (DirectoryInfo dirInfo in Root.GetDirectories())
{
IterateFileSystemInfo(dirInfo, DirectoryProcessor, FileProcessor);
if (DirectoryProcessor != null)
DirectoryProcessor(dirInfo);
}
foreach (FileInfo fileInfo in Root.GetFiles())
{
if (FileProcessor != null)
FileProcessor(fileInfo);
}
}
}
IterateFileSystemInfo
is almost identical to the modified DisplayAllElements
. The only difference, except for the callbacks' names, is the fact that callbacks are provided as parameters and checked for null
before being called. This allows certain flexibility. For example, we may not wish to display directory names. In this case we can pass a null
parameter for DirectoryInfoHandler
. All subdirectories and files will still be visited, though directory entries will not be "processed".
Static
Iterators
class could have several functions that traverse a collection in a certain way. The algorithm above is depth-first. One could include functions that do a breadth-first search.
Wrapping callback functions
The real work on collection members is done by the callbacks, which are provided to the iteration algorithm. We could simply have several instantiated delegates and call IterateFileSystemInfo
using those delegates. However, design can be improved a lot if function callbacks are wrapped into their separate classes. It will let all object oriented design techniques to be applied to classes containing the callbacks. For example, Decorator pattern could be chosen for the processor class design. Passing parameters to callbacks can be done the same way as in "Passing parameters to predicates".
The goal is to make a wrapper class for the following functions:
private static void ShowDirectoryDetails(DirectoryInfo Info)
{
System.Diagnostics.Debug.WriteLine(string.Format("Dir\t{0}",
Info.FullName));
}
private static void ShowFileDetails(FileInfo Info)
{
System.Diagnostics.Debug.WriteLine(string.Format("File\t{0}",
Info.FullName));
}
Here is an example of such a class:
public class DisplayAllProcessor
{
#region Delegate properties
public Iterators.DirectoryInfoHandler DisplayDirInfo
{
get { return MyDisplayDirInfo; }
}
public Iterators.FileInfoHandler DisplayFileInfo
{
get { return MyDisplayFileInfo; }
}
#endregion Delegate properties
#region Functions returned by delegate properties
private void MyDisplayDirInfo(DirectoryInfo Info)
{
System.Diagnostics.Debug.WriteLine(string.Format
("Dir\t{0}", Info.FullName));
}
private void MyDisplayFileInfo(FileInfo Info)
{
System.Diagnostics.Debug.WriteLine(string.Format
("File\t{0}", Info.FullName));
}
#endregion Functions returned by delegate properties
}
How can this new class be used with the IterateFileSystemInfo
tree traversal algorithm? We need a new instance of the DisplayAllProcessor
class. Delegate properties of that instance are used as input parameters for the tree traversal algorithm. As the tree traversal algorithm executes, it will fire MyDisplayDirInfo
and MyDisplayFileInfo
callbacks of DisplayAllProcessor
instance. The way to use it is as follows:
private void DisplayAllElements(DirectoryInfo Root)
{
DisplayAllProcessor dap = new DisplayAllProcessor();
Iterators.IterateFileSystemInfo(
Root,
dap.DisplayDirInfo,
dap.DisplayFileInfo);
}
Inheritance of wrapper classes
My testing program for this article is a WinForms application. It displays output of all callbacks in two columns of a ListView
. One can immediately demonstrate benefits of wrapping callbacks in a class: storage of results that UI needs is built into a base class:
public struct DisplayItem
{
public string Info;
public string Message;
}
public class ProcessorBase
{
protected List<DisplayItem> m_DisplayItems = new List<DisplayItem>();
public virtual List<DisplayItem> DisplayItems
{
get { return m_DisplayItems; }
}
}
Instead of Debug.Writeline()
calls, all processors will add display items to the list in the base class. When iteration runs to completion, property DisplayItems
will contain all the data that UI needs. The new DisplayAllElements
looks as follows, where ShowOutput
simply fills the output ListView
.
private void DisplayAllElements(DirectoryInfo Root)
{
DisplayAllProcessor dap = new DisplayAllProcessor();
Iterators.IterateFileSystemInfo(
Root,
dap.DisplayDirInfo,
dap.DisplayFileInfo);
ShowOutput(dap.DisplayItems);
}
The infrastructure has been set up. Let's have a couple of examples how it can be used.
Maintaining state
State will be maintained by "processors", or instances of wrapper classes. Let's find the maximum and the average file sizes. To do so, we need to keep track of three things: the maximum file size seen so far, the combined size of all files seen, and the number of files seen. These will be member variables of our class.
public class FindAvgAndMaxProcessor : ProcessorBase
{
private long m_MaxFileSize = 0;
private long m_TotalFileSizes = 0;
private long m_TotalFiles = 0;
. . . .
. . . .
}
The counters are updated in the callback function.
protected virtual void MyCountFileSizes(FileInfo Info)
{
if (Info.Length > m_MaxFileSize)
{
m_MaxFileSize = Info.Length;
m_MaxFileInfo = Info;
}
++m_TotalFiles;
m_TotalFileSizes += Info.Length;
}
Results are needed once iteration has run to completion. We know that DisplayItems
property is normally accessed when iteration algorithm has finished its job. Therefore, statistics can be calculated in the DisplayItems
property to take some calculations out of the iteration.
public override List<DisplayItem> DisplayItems
{
get
{
m_DisplayItems.Clear();
DisplayItem diAvg;
diAvg.Info = "Avg file size";
long lAverage = m_TotalFiles == 0 ? 0 :
m_TotalFileSizes / m_TotalFiles;
diAvg.Message = string.Format("{0} bytes", lAverage.ToString());
DisplayItem diMax;
diMax.Info = "Max file size";
string strFileName = m_MaxFileInfo == null ? "None" :
m_MaxFileInfo.FullName;
diMax.Message = string.Format("File name: {0}, size:
{1} bytes", strFileName, m_MaxFileSize);
m_DisplayItems.Add(diAvg);
m_DisplayItems.Add(diMax);
return base.DisplayItems;
}
}
The FindAvgAndMaxProcessor
class is used the same way as previous processor classes. The only difference is that instead of a list of visited files and directories it displays two rows of statistics.
Providing parameters for callbacks
Suppose, we want to find the maximum and average sizes of .EXE and .XML files for a given directory. A processor that takes file extension as parameter is in order. Since callback functions are wrapped in a class, it is simply a matter of having a public
property. The new processor class is so tiny that I will include it here in its entirety.
public class FindAvgAndMaxByExtProcessor : FindAvgAndMaxProcessor
{
private string m_Extension;
public FindAvgAndMaxByExtProcessor(string Extension)
{
m_Extension = Extension;
}
public string Extension
{
get { return m_Extension; }
set
{
m_Extension = value;
ClearStats();
}
}
#region Functions returned by delegate properties
protected override void MyCountFileSizes(FileInfo Info)
{
if (0 == string.Compare(Info.Extension, m_Extension,
StringComparison.CurrentCultureIgnoreCase))
{
base.MyCountFileSizes(Info);
}
}
#endregion Functions returned by delegate properties
}
We let the base class keep the count. Child class controls which files to process by comparing their extension to a member variable.
The FindAvgAndMaxByExtProcessor
class allows selecting different sets of items to be processed. If a class allows that, the state data has to be taken care of when a different subset of items is selected. When setting a new file extension to match in the example above, we have to reset all counters to 0
. Otherwise the example would produce combined statistics for both .EXE and .XML files, which is not what was intended.
FindAvgAndMaxByExt
function demonstrates how the FindAvgAndMaxByExtProcessor
class can be used.
Using the code
To run an example from the source code, select a directory, select the test case, and then click the Run Test button.
Conclusion
The collection iteration algorithm was decoupled from actions performed on each collection item. The algorithm function is a very simple static
function, whose correctness is easy to verify. Once that was done, not a single looping instruction or recursive call was needed to execute test cases. All "processor" classes are also very simple. Some are literally a couple of lines long.
The iteration algorithm is stateless. It's simple and it does its job. A good reason to have this function in its own class would be to implement the IEnumerable
interface, for example. I may do this just for the fun of it.
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.