Reuse of iteration algorithms

Alex Perepletov

Rate me:

4.77/5 (12 votes)

5 Sep 20077 min read

52.9K

183

Examples of decoupling iteration algorithm from actions on collection items

Download source files - 11.5 KB

Introduction

Using the foreach loop is easy. It is so straightforward and natural that I rarely use any other loop construct. In fact, when it's time to use the for loop, I have to concentrate and recall the syntax.

I have to deal with a lot of tree traversals. The obvious choice is to have a foreach loop to go over all elements of a root, making a recursive call for each node that has children. This works out great until one notices that there are about twenty functions that have a foreach loop at their core together with a bunch of statements that operate on the same data type.

Using flags in order to exploit common pieces of code did not work out well. The functions simply became unreadable and much more error-prone. Something radical had to be done about this problem.

This article does not contain directly reusable code. Rather, it presents an idea on how to separate an algorithm that iterates over a custom collection from actions done on each item of that collection. The example code works with DirectoryInfo and FileInfo objects, but at no point does it modify files. The file system is used simply because it is a tree structure that is available on every computer. For simplicity reasons, there is no error handling and no security checks.

Separating algorithm from data

Staring at a bunch of functions that differ ever so slightly did have a good side effect after all. There are two things to note: first, the algorithm is always a foreach loop with a recursive call. Second, we are working with objects of the same data type, or a very limited number of data types.

The iteration algorithm does not care what happens to each object that it visits. Iteration's responsibility is simply to visit each and every item of a collection. Then who will do the useful work on visited items? A callback function, of course! We know the data type we are working with, that's the one in the foreach statement.

Let's use the file system as an example of a tree structure that we want to iterate over. For example, a function that displays all file system objects under a certain root directory could look as follows:

private void DisplayAllElements(DirectoryInfo Root)
{
    foreach (DirectoryInfo dirInfo in Root.GetDirectories())
    {
        DisplayAllElements(dirInfo);
        System.Diagnostics.Debug.WriteLine(string.Format("Dir\t{0}", 
                            dirInfo.FullName));
    }
    foreach (FileInfo fileInfo in Root.GetFiles())
    {
        System.Diagnostics.Debug.WriteLine(string.Format("File\t{0}", 
                            fileInfo.FullName));
    }
}

Let's modify it such that iteration code is completely separate from the output code. We are using file system as an example and will be dealing with DirectoryInfo and FileInfo instances. Declare callback types that take DirectoryInfo and FileInfo objects as parameters:

private delegate void FileInfoHandler(FileInfo File);
private delegate void DirectoryInfoHandler(DirectoryInfo Directory);

Write the functions that perform actual work on items visited by iteration:

private static void ShowDirectoryDetails(DirectoryInfo Info)
{
    System.Diagnostics.Debug.WriteLine(string.Format("Dir\t{0}", 
                            Info.FullName));
}
private static void ShowFileDetails(FileInfo Info)
{
    System.Diagnostics.Debug.WriteLine(string.Format("File\t{0}", 
                            Info.FullName));
}

Instantiate delegates. That is, create variables of delegate types that point to the functions above:

private FileInfoHandler ProcessFile = new FileInfoHandler(ShowFileDetails);
private DirectoryInfoHandler ProcessDirectory = new DirectoryInfoHandler
                            (ShowDirectoryDetails);

Finally, change the iteration algorithm to use the instantiated delegates instead of function calls:

private void DisplayAllElements(DirectoryInfo Root)
{
    foreach (DirectoryInfo dirInfo in Root.GetDirectories())
    {
        DisplayAllElements(dirInfo);
        ProcessDirectory(dirInfo);
    }
    foreach (FileInfo fileInfo in Root.GetFiles())
    {
        ProcessFile(fileInfo);
    }
}

Now the iteration code is completely separate from the output code.

Improving design of iteration function

Though we are not displaying directory and file information inside the foreach loop anymore, there is no real difference between the new and the original DisplayAllElements functions. Calling a function callback was a step in the right direction. Let's improve it even more.

In my opinion, the iteration algorithm should be stateless. To maintain state it would need some knowledge about contents of nodes it traverses. This complicates the algorithm and makes it more difficult to verify its correctness. My approach is to keep it very simple. A static function will do.

public static class Iterators
{
    public delegate void FileInfoHandler(FileInfo File);
    public delegate void DirectoryInfoHandler(DirectoryInfo Directory);

    public static void IterateFileSystemInfo(
        DirectoryInfo Root,
        DirectoryInfoHandler DirectoryProcessor,
        FileInfoHandler FileProcessor)
    {
        foreach (DirectoryInfo dirInfo in Root.GetDirectories())
        {
            IterateFileSystemInfo(dirInfo, DirectoryProcessor, FileProcessor);
            if (DirectoryProcessor != null)
                DirectoryProcessor(dirInfo);
        }
        foreach (FileInfo fileInfo in Root.GetFiles())
        {
            if (FileProcessor != null)
                FileProcessor(fileInfo);
        }
    }
}

IterateFileSystemInfo is almost identical to the modified DisplayAllElements. The only difference, except for the callbacks' names, is the fact that callbacks are provided as parameters and checked for null before being called. This allows certain flexibility. For example, we may not wish to display directory names. In this case we can pass a null parameter for DirectoryInfoHandler. All subdirectories and files will still be visited, though directory entries will not be "processed".

Static Iterators class could have several functions that traverse a collection in a certain way. The algorithm above is depth-first. One could include functions that do a breadth-first search.

Wrapping callback functions

The real work on collection members is done by the callbacks, which are provided to the iteration algorithm. We could simply have several instantiated delegates and call IterateFileSystemInfo using those delegates. However, design can be improved a lot if function callbacks are wrapped into their separate classes. It will let all object oriented design techniques to be applied to classes containing the callbacks. For example, Decorator pattern could be chosen for the processor class design. Passing parameters to callbacks can be done the same way as in "Passing parameters to predicates".

The goal is to make a wrapper class for the following functions:

private static void ShowDirectoryDetails(DirectoryInfo Info)
{
    System.Diagnostics.Debug.WriteLine(string.Format("Dir\t{0}", 
                            Info.FullName));
}

private static void ShowFileDetails(FileInfo Info)
{
    System.Diagnostics.Debug.WriteLine(string.Format("File\t{0}", 
                            Info.FullName));
}

Here is an example of such a class:

public class DisplayAllProcessor
{
    #region Delegate properties

    public Iterators.DirectoryInfoHandler DisplayDirInfo
    {
        get { return MyDisplayDirInfo; }
    }

    public Iterators.FileInfoHandler DisplayFileInfo
    {
        get { return MyDisplayFileInfo; }
    }

    #endregion Delegate properties

    #region Functions returned by delegate properties

    private void MyDisplayDirInfo(DirectoryInfo Info)
    {
        System.Diagnostics.Debug.WriteLine(string.Format
                    ("Dir\t{0}", Info.FullName));
    }

    private void MyDisplayFileInfo(FileInfo Info)
    {
        System.Diagnostics.Debug.WriteLine(string.Format
                    ("File\t{0}", Info.FullName));
    }

    #endregion Functions returned by delegate properties
}

How can this new class be used with the IterateFileSystemInfo tree traversal algorithm? We need a new instance of the DisplayAllProcessor class. Delegate properties of that instance are used as input parameters for the tree traversal algorithm. As the tree traversal algorithm executes, it will fire MyDisplayDirInfo and MyDisplayFileInfo callbacks of DisplayAllProcessor instance. The way to use it is as follows:

private void DisplayAllElements(DirectoryInfo Root)
{
    // Create a processor that will display all FileSystemInfo objects.
    DisplayAllProcessor dap = new DisplayAllProcessor();
    // Execute the iteration algorithm using the processor.
    Iterators.IterateFileSystemInfo(
        Root,
        dap.DisplayDirInfo,
        dap.DisplayFileInfo);
}

Inheritance of wrapper classes

My testing program for this article is a WinForms application. It displays output of all callbacks in two columns of a ListView. One can immediately demonstrate benefits of wrapping callbacks in a class: storage of results that UI needs is built into a base class:

public struct DisplayItem
{
    public string Info;
    public string Message;
}

public class ProcessorBase
{
    // Messages from processors to show on UI.
    protected List<DisplayItem> m_DisplayItems = new List<DisplayItem>();

    // Gets the messages to show on UI.
    public virtual List<DisplayItem> DisplayItems
    {
        get { return m_DisplayItems; }
    }
}

Instead of Debug.Writeline() calls, all processors will add display items to the list in the base class. When iteration runs to completion, property DisplayItems will contain all the data that UI needs. The new DisplayAllElements looks as follows, where ShowOutput simply fills the output ListView.

private void DisplayAllElements(DirectoryInfo Root)
{
    // Create a processor that will display all FileSystemInfo objects.
    DisplayAllProcessor dap = new DisplayAllProcessor();
    // Execute the iteration algorithm using the processor.
    Iterators.IterateFileSystemInfo(
        Root,
        dap.DisplayDirInfo,
        dap.DisplayFileInfo);
    // Output is ready.
    ShowOutput(dap.DisplayItems);
}

The infrastructure has been set up. Let's have a couple of examples how it can be used.

Maintaining state

State will be maintained by "processors", or instances of wrapper classes. Let's find the maximum and the average file sizes. To do so, we need to keep track of three things: the maximum file size seen so far, the combined size of all files seen, and the number of files seen. These will be member variables of our class.

public class FindAvgAndMaxProcessor : ProcessorBase
{
    private long m_MaxFileSize = 0;
    private long m_TotalFileSizes = 0;
    private long m_TotalFiles = 0;
    . . . .
    . . . .
}

The counters are updated in the callback function.

protected virtual void MyCountFileSizes(FileInfo Info)
{
    // Keep track of the biggest file.
    if (Info.Length > m_MaxFileSize)
    {
        m_MaxFileSize = Info.Length;
        m_MaxFileInfo = Info;
    }

    // Keep track of file count and the sum of all file sizes.
    ++m_TotalFiles;
    m_TotalFileSizes += Info.Length;
}

Results are needed once iteration has run to completion. We know that DisplayItems property is normally accessed when iteration algorithm has finished its job. Therefore, statistics can be calculated in the DisplayItems property to take some calculations out of the iteration.

public override List<DisplayItem> DisplayItems
{
    get
    {
        m_DisplayItems.Clear();

        DisplayItem diAvg;
        diAvg.Info = "Avg file size";
        long lAverage = m_TotalFiles == 0 ? 0 : 
                    m_TotalFileSizes / m_TotalFiles;
        diAvg.Message = string.Format("{0} bytes", lAverage.ToString());
        DisplayItem diMax;
        diMax.Info = "Max file size";
        string strFileName = m_MaxFileInfo == null ? "None" : 
                        m_MaxFileInfo.FullName;
        diMax.Message = string.Format("File name: {0}, size: 
                {1} bytes", strFileName, m_MaxFileSize);

        m_DisplayItems.Add(diAvg);
        m_DisplayItems.Add(diMax);

        return base.DisplayItems;
    }
}

The FindAvgAndMaxProcessor class is used the same way as previous processor classes. The only difference is that instead of a list of visited files and directories it displays two rows of statistics.

Providing parameters for callbacks

Suppose, we want to find the maximum and average sizes of .EXE and .XML files for a given directory. A processor that takes file extension as parameter is in order. Since callback functions are wrapped in a class, it is simply a matter of having a public property. The new processor class is so tiny that I will include it here in its entirety.

public class FindAvgAndMaxByExtProcessor : FindAvgAndMaxProcessor
{
    private string m_Extension;

    public FindAvgAndMaxByExtProcessor(string Extension)
    {
        m_Extension = Extension;
    }

    public string Extension
    {
        get { return m_Extension; }
        set
        {
            m_Extension = value;
            ClearStats();
        }
    }

    #region Functions returned by delegate properties

    protected override void MyCountFileSizes(FileInfo Info)
    {
        if (0 == string.Compare(Info.Extension, m_Extension, 
                StringComparison.CurrentCultureIgnoreCase))
        {
            base.MyCountFileSizes(Info);
        }
    }

    #endregion Functions returned by delegate properties
}

We let the base class keep the count. Child class controls which files to process by comparing their extension to a member variable.

The FindAvgAndMaxByExtProcessor class allows selecting different sets of items to be processed. If a class allows that, the state data has to be taken care of when a different subset of items is selected. When setting a new file extension to match in the example above, we have to reset all counters to 0. Otherwise the example would produce combined statistics for both .EXE and .XML files, which is not what was intended.

FindAvgAndMaxByExt function demonstrates how the FindAvgAndMaxByExtProcessor class can be used.

Using the code

To run an example from the source code, select a directory, select the test case, and then click the Run Test button.

Conclusion

The collection iteration algorithm was decoupled from actions performed on each collection item. The algorithm function is a very simple static function, whose correctness is easy to verify. Once that was done, not a single looping instruction or recursive call was needed to execute test cases. All "processor" classes are also very simple. Some are literally a couple of lines long.

The iteration algorithm is stateless. It's simple and it does its job. A good reason to have this function in its own class would be to implement the IEnumerable interface, for example. I may do this just for the fun of it.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Written By

Alex Perepletov

Software Developer (Senior)

Canada

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.