Click here to Skip to main content
15,879,613 members
Articles / Programming Languages / C++
Article

CEnum - File Enumeration and File Globbing Class

Rate me:
Please Sign up or sign in to vote.
4.45/5 (5 votes)
6 Dec 2008CPOL12 min read 52.3K   1.2K   48   11
CEnum is used for enumeration of files and directories using wildcard matching (globbing)
Image 1

Contents

Introduction

CEnum is a fast and simple class that lets you enumerate (i.e. make a list of) all files in a directory.
It supports:

  • the use of wild card characters * and ? (i.e. it's a file globbing class too)
  • separate enumeration of files and subdirectories
  • separate include list, exclude list and ignore case option for both files and directories
  • recursive search (i.e. it can enumerate content of subdirectories too)
  • enumeration using either file's full path or just file name
  • written in STL, but it optionally supports MFC collection classes
  • UNICODE aware (in Microsoft world UNICODE stands for UTF-16 little-endian)
  • works as a wrapper around ::FindFirstFile and ::FindNextFile

How Does It Work?

CEnum is simplicity itself:

C++
// enumerate all files on C: drive

CEnum enumerator;
enumerator.bRecursive = true;          // search subdirectories too
enumerator.bFullPath  = true;          // enumerate using file's full path

// start search
enumerator.EnumerateAll(_T("C:\\"));   // root directory to start your search from

list<string> * pAllFilesOnCDrive = enumerator.GetFiles();

// congrats, you have just created a list of all files on C: drive
// lets print it

list<string>::iterator iter = pAllFilesOnCDrive->begin();
for (; iter != pAllFilesOnCDrive->end(); ++iter)
{
    cout << iter->c_str();
}

// at this point just move on,
// CEnum will release memory allocated by pAllFilesOnCDrive for you

What if you hate STL with a passion, and want to use MFC containers?
No worries, CEnum is your friend too:

  • At the top of enum.cpp uncomment this line: //#define MFC
  • Recompile
C++
// enumerate all MP3's on C: drive

CEnum enumerator;
enumerator.sIncPatternFiles  = _T("*.mp3");
enumerator.bNoCaseFiles      = true;       // enumerate both *.mp3 and *.MP3
enumerator.bRecursive        = true;
enumerator.bFullPath         = true;


// start search
enumerator.EnumerateAll(_T("C:"));         // unlike in the first example, this time
                                           // there is no backslash at the end
                                           // of path string

CStringArray * pCStringArray = enumerator.GetFilesAsCStringArray();
for(int i=0; i<pCStringArray->GetSize(); ++i)
{
    cout << (LPCTSTR) pCStringArray->GetAt[i];
}


// or if you prefer CStringList...
CStringList  * pCStringList  = enumerator.GetFilesAsCStringList();
POSITION pos = pCStringList->GetHeadPosition();
while (pos != NULL)
{
    cout << (LPCTSTR) pCStringList->GetNext(pos);
}

// once again, memory management is taken care of by CEnum.
// at this point you can just forget that pCStringArray and pCStringList even existed

Note that, for your convenience, the path to root folder may or may not end with backslash!

This is already enough knowledge for your basic use of CEnum, for details read on.

Background

A few things inspired me to write this class:

  • The fact that I work with files all the time, and I always copy/paste the same code over and over again (and somehow I always have a bug in it :-))
  • The fact that I couldn't find a similar class just by using Google (don't laugh, I hate browsing through huge indexes of popular source code pages)
  • The work of Jack Handy [^] and Alessandro Felice Cantatore [^]
  • C# Directory class and its method GetFiles()

Class Design

CEnum was designed with the following in mind:

  • must work out of the box (i.e. just make an object, call one method and you have the list of all files in that directory)
  • must be designed using STL, not MFC
  • must support globbing (wild card search)
  • must support UNICODE
  • must be able to ignore case when comparing strings
  • must take care of memory management, so that user's calling function doesn't have to allocate or free any STL objects on its own

Here is the class:

User can set a number of options through public member variables (for detailed description of each option, see section Class members).

C++
// public member variables (user options)

typedef basic_string<TCHAR> _stl_string; // basic_string<TCHAR> will compile
                                         // into std::string or std::wstring,
                                         // based on if _UNICODE was defined

class CEnum
{
public:
    _stl_string sExcPatternDirs;     // in case of conflict Exclude pattern has
                                     // precedence over Include pattern
    _stl_string sExcPatternFiles;
    _stl_string sIncPatternDirs;
    _stl_string sIncPatternFiles;

    bool bRecursive;
    bool bFullPath;
    bool bNoCaseDirs;
    bool bNoCaseFiles;
};

Default constructor will initialize variables with most common values:

C++
// default constructor

public:
    CEnum()
    {
        plstDirs            = new list<_stl_string >;   // notice the space in front
                                                        // of the right angle bracket !!!
        plstFiles           = new list<_stl_string >;

        sExcPatternDirs     = _T("");
        sExcPatternFiles    = _T("");
        sIncPatternDirs     = _T("");
        sIncPatternFiles    = _T("");

        bRecursive          = false;
        bFullPath           = false;
        bNoCaseDirs         = false;
        bNoCaseFiles        = false;
    }

If you wish, you can set user options directly in overloaded constructor:

C++
// parameterized constructor
    CEnum
    (
        _stl_string sPath,
        _stl_string sExcludePatternDirs     = _T(""),
        _stl_string sExcludePatternFiles    = _T(""),
        _stl_string sIncludePatternDirs     = _T(""),
        _stl_string sIncludePatternFiles    = _T(""),
        bool bRecursiveSearch               = false,
        bool bUseFullPath                   = false,
        bool bIgnoreCaseDirs                = false,
        bool bIgnoreCaseFiles               = false
    )
    {
        plstDirs          = new list<_stl_string >;
        plstFiles         = new list<_stl_string >;

        sExcPatternDirs   = sExcludePatternDirs;
        sExcPatternFiles  = sExcludePatternFiles;
        sIncPatternDirs   = sIncludePatternDirs;
        sIncPatternFiles  = sIncludePatternFiles;

        bRecursive        = bRecursiveSearch;
        bFullPath         = bUseFullPath;
        bNoCaseDirs       = bIgnoreCaseDirs;
        bNoCaseFiles      = bIgnoreCaseFiles;

        EnumerateAll(sPath);
    }

Variety of lists CEnum can give you:
Note: By default, MFC collection classes are excluded from the build.

C++
// internal lists

    list<_stl_string > * GetDirs();
    list<_stl_string > * GetFiles();

//#define MFC

#ifdef MFC

    CStringArray * GetDirsAsCStringArray();
    CStringArray * GetFilesAsCStringArray();
    CStringList  * GetDirsAsCStringList();
    CStringList  * GetFilesAsCStringList();

#endif

Instantiating

There are two constructors for greater flexibility when using CEnum:

C++
// ... using default constructor

CEnum enumerator;

// set options
enumerator.sExcPatternDirs  = _T("Post Black album Metallica");
enumerator.sIncPatternDirs  = _T("Iron Maiden");
enumerator.sExcPatternFiles = _T("*.wma");
enumerator.sIncPatternFiles = _T("*.mp3;*.ogg");
enumerator.bRecursive       = true;
enumerator.bFullPath        = true;
enumerator.bNoCaseDirs      = true;
enumerator.bNoCaseFiles     = true;

enumerator.EnumerateAll(_T("D:\\Music"));

list<string> * pQualityMetal = enumerator.GetFiles();

... or the same thing with fewer lines:

C++
// ... using parameterized constructor

CEnum enumerator(
                    _T("D:\\Music"),
                    _T("Post Black album Metallica"),
                    _T("*.wma"),
                    _T("Iron Maiden"),
                    _T("*.mp3;*.ogg"),
                    true,
                    true,
                    true,
                    true
                 );

list<string> * pQualityMetal = enumerator.GetFiles();

... or the really simple way (if default values work for you):

C++
// ... using parameterized constructor with default values

CEnum enumerator( _T("D:\\Music") );

list<string> * pQualityMetal = enumerator.GetFiles();

Class Destructor

Let me quote the very first book I read about C++, Jesse Liberty's "Teach yourself C++ in 21 days": "If you are writing a function that needs to create memory and then pass it back to the calling function, consider changing your interface. Have the calling function allocate the memory and then pass it into your function by reference. This moves all memory management out of your program and back to the function that is prepared to delete it."

Why did I not follow this great advice? Well firstly, this was designed to be similar to C# class Directory (and in C# garbage collector takes care of memory management, thus the caller does not care about memory issues). And secondly, I wanted a class that would be easy to use. Remember, CEnum's job is simply to create a list of file names.

So, here is how things work in CEnum:

  1. Constructor allocates two lists
  2. After enumeration process ends, pointers to two lists are returned to calling function
  3. Pointer(s) to list(s) are deleted in destructor of CEnum

This means that your enumeration list(s) will live only as long as the life time of CEnum object that created it !!! You cannot pass the pointer to some other function. If you would like to do that, you either need to create a copy of the list, or comment out some (or all) delete statements in destructor.

Is this a bad design? It just might be, but simplicity of use was my primary issue when I designed this class. I wanted to be able to enumerate directory in a single line (see example above), do something with those files (e.g. display them on the screen) and then to forget about CEnum object and all the lists it allocated. If you want to do it 'properly' CEnum can be easily adapted to use lists allocated by client application, all that is needed is to add one more constructor and to comment out few lines in destructor.

Class Members

This is the list of CEnum's public member variables through which user can set desired options:

  • bRecursive
    Description: if true, subdirectories will be enumerated too
    Default value: false
  • bFullPath
    Description: if true, files will be enumerated using file's full path, otherwise list will contain file names only
    Default value: false
  • bNoCaseDirs
    Description: if true, case will be ignored when searching directory (and only directory) names
    Default value: false
  • bNoCaseFiles
    Description: if true, case will be ignored when searching file (and only file) names
    Default value: false
  • sIncPatternFiles
    Description: matching pattern for files you wish to include in your search. Wild cards * and ? are supported. If you have more than one search pattern, separate them with semicolon.
    Default value: empty string
    Examples:
    • "*.mp3;*.mp4"
    • "*.mp?" (same as first example)
    • "*.mp3;iron maid*;latest*"
    Note that in case of Include patterns, empty string means "enumerate all", i.e. everything is included !!

  • sExcPatternFiles
    Description: matching pattern for files you wish to exclude from your search. Wild cards * and ? are supported. If you have more that one search pattern, separate them with semicolon.
    Default value: empty string
    Examples:
    • "*.mp3;*.mp4"
    • "*.mp?" (same as first example)
    • "*.mp3;iron maid*;latest*"
    Note that in case of Exclude patterns, empty string means "enumerate none", i.e. nothing is excluded!!
    Also, in case of conflict, Exclude pattern has precedence over Include pattern.
  • sIncPatternDirs same as sIncPatternFiles above, just for directories.
  • sExcPatternDirs same as ExcPatternFiles above, just for directories.

STL List vs. MFC Collection Classes

CEnum can give you a variety of lists:

C++
// ... show all containers CEnum can return

CEnum enumerator(_T("C:\\"));

list<string> * pSTLlistFiles   = enumerator.GetFiles();
list<string> * pSTLlistDirs    = enumerator.GetDirs();

StringList   * pMFClistFiles   = enumerator.GetFilesAsCStringList();
StringList   * pMFClistDirs    = enumerator.GetDirsAsCStringList();

StringArray  * pMFCArrayFiles  = enumerator.GetFilesAsCStringArray();
StringArray  * pMFCArrayDirs   = enumerator.GetDirsAsCStringArray();

Since CEnum was written using STL, the only two lists created during the execution are two lists returned by GetDirs() and GetFiles() functions. All four MFC containers (two CStringArrays and two CStringLists) are created only when you call conversion functions (GetFilesAs... and GetDirsAs...). In fact, all MFC related stuff is hidden behind preprocessor directives and is by default inactive (i.e. it does not compile). If you need this functionality, then just uncomment //#define MFC line and recompile.

WildCard Compare Functionality

This is another thing that is needed often but not found easily. Two good examples are work of Jack Handy here on CodeProject [^] and the work of Alessandro Felice Cantatone [^]. Both are great examples, but each has its shortcomings. Jack's function is simple and fast, but it doesn't let you ignore case as STL's tolower and UNICODE don't match well, and Alessandro's function was designed for IBM OS/2 (not to mention his holier than thou attitude, and don't even get me started on some of the restrictions he made. What has AI got to do with string comparing?)

Anyway, here is what I came up with:

C++
// ... wildcard compare function

_tsetlocale(LC_ALL, _T("");      // execute this before calling _tcsicmp

bool CompareStrings(LPCTSTR sPattern, LPCTSTR sFileName, bool bNoCase)
{
    TCHAR temp1[2] = _T("");
    TCHAR temp2[2] = _T("");
    LPCTSTR pStar  = 0;
    LPCTSTR pName  = 0;

    while(*sFileName)
    {
        switch (*sPattern)
        {
        case '?':
            ++sFileName; ++sPattern;
            continue;
        case '*':
            if (!*++sPattern) return 1;
            pStar = sPattern;
            pName = sFileName + 1;
            continue;
        default:
            if(bNoCase)
            {
                // _tcsicmp works with strings not chars !!
                *temp1 = *sFileName;
                *temp2 = *sPattern;
                if (!_tcsicmp(temp1, temp2))    // if equal
                {
                    ++sFileName;
                    ++sPattern;
                    continue;
                }
            }
            else if (*sFileName == *sPattern)   // bNoCase == false,
            {                                   // compare chars directly
                ++sFileName;
                ++sPattern;
                continue;
            }

            // chars are NOT equal,
            // if there was no '*' thus far, strings don't match
            if(!pStar) return 0;

            // go back to last '*' character
            sPattern = pStar;
            sFileName = pName++;
            continue;
        }
    }
    // check is there anything left after last '*'
    while (*sPattern == '*') ++sPattern;
    return (!*sPattern);
}

This should be easy to follow:

  • If ? is found in pattern string, chars match, and function moves to the next char in both pattern string and search string.
  • If * is found in pattern string, function moves to the next char in pattern string only. Exits if there are no more chars in pattern string, or saves the record of current position (one char after '*') and the record of next char in search string.
  • When two chars need to be compared regardless of case, run-time library _tcsicmp is used:
    • if strings (temp strings that contain just one character) are equal, function moves to the next char in both pattern string and search string.
    • if strings differ, function will return false if there was no '*' character in pattern string up to this point. Otherwise it will go back to position of last '*' character and advance by one char in search string.

Regarding the use of _tcsicmp:
There was no special reason why I have chosen to use this function, other than the fact that it was the only run-time library routine that passed all my tests for UNICODE string comparing regardless of case.

Note: My tests were limited to European-character sets (stuff like accented, Central European and Nordic characters). For anything beyond that, you will have to test for yourself.

If you would like to use Wildcard compare function in some other project, and you don't need to ignore case, then you can greatly speed things up by using something like this:

C++
// ... wildcard compare function without Ignore case functionality

    bool CompareStrings(LPCTSTR sPattern, LPCTSTR sFileName)
    {
        LPCTSTR pStar  = 0;
        LPCTSTR pName  = 0;

        while(*sFileName)
        {
            switch (*sPattern)
            {
            case '?':
                ++sFileName; ++sPattern;
                continue;
            case '*':
                if (!*++sPattern) return 1;
                pStar = sPattern;
                vpName = sFileName + 1;
                continue;
            default:
                if (*sFileName == *sPattern) { ++sFileName; ++sPattern; continue; }

                // chars are NOT equal,
                // if there was no '*' thus far, strings don't match
                if(!pStar) return 0;

                // go back to last '*' character
                sPattern = pStar;
                sFileName = pName++;
                continue;
            }
        }
        // check is there anything left after last '*'
        while (*sPattern == '*') ++sPattern;
        return (!*sPattern);
    }

This version is 3-4 times faster because it compares chars directly.

Similar Projects

If you find that CEnum lacks some useful feature, check these projects out. They may be closer to what you need.

Enumeration and globbing:

Wildcard match:

Demo Project

  • CEnum used in demo project has one minor difference. Because I added Wildcard testing functionality in the demo project, CompareStrings function is declared as public and static. Otherwise it is a private non-static method of CEnum.
  • Testing enumeration
    This part is very simple. In OnEnumeration() function in a mere 20 lines of code directory is enumerated and its content is added to CListCtrl.
  • Testing wildcard comparing functionality
    Demo uses test.txt test file in which you can add your own test cases. Format of test.txt is very simple. First string is wildcard string, second string is search string and the last string specifies if first two strings match. Comment lines start with '#' character.

    CEnum2.png

Limitations

CEnum is file-centric, meaning it was designed to search for files moreso than to search for directories.
For example:

  • If you are searching only for files or only for directories, you will enumerate either just the way you intended.
  • But, if you apply filters (exclude or include) for both files and directories, then you will enumerate only those files that reside in directories that match the filter applied for directories.

Depending on how you look at things, the latter case might be seen as a limitation, because you can't perform independent search for files and directories. You can't get a list of all files in all subdirectories, and at the same time get a list of directories that match certain search criteria. In this case, the only thing you can do is to run CEnum twice, once for files and once for directories.

Another thing that user needs to be aware of is the sorting functionality. After enumerating each directory, CEnum calls STL list's sort() method. This method sorts files in alphabetic order, which may be different sorting order than in your file browser (e.g. Windows Explorer).

Finally, be aware that CEnum was designed for globbing (wildcard search) only. It can't search for files based on size, date, file content or file attributes.

Errata

Possible compile error in Visual studio is:

  • fatal error C1010: unexpected end of file while looking for precompiled header. Did you forget to add '#include "stdafx.h"' to your source?

Solution is not to use precompiled headers:

  • In the Solution Explorer pane of the project, right-click the project name, and then click Properties.
  • In the left pane, click the C/C++ folder.
  • Click the Precompiled Headers node.
  • In the right pane, click Create/Use Precompiled Header, and then click Not Using Precompiled Headers.

Revision History

  • Version 1.0 published 05/2008 - Initial public release
  • Version 1.1 published 11/2008

List of changes in version 1.1

Article:

  • Added Destructor chapter
  • Updated Wildcard search algorithm chapter
  • Added Doxygen documentation
  • Changed demo application so files could be opened by double-clicking

Source code:

  • Removed most of dynamic allocations od STL objects
  • Added Doxygen Qt style comments to CEnum.cpp
  • Changed Wildcard compare algorithm in CompareStrings function
  • Rewrote function Tokenize
  • Updated class' Destructor
  • Verified that class compiles cleanly (no error or warnings) at warning level 4 in Visual Studio 2005

Author's License

There is no limitation on use of this class. You can use it (as whole or just parts of it) with or without author's permission in any sort of project regardless of license issues, in both commercial and open source projects. Though, a thank you e-mail would be nice. :-)

Conclusion

That's it folks, a simple class that lets you enumerate files without understanding how ::FindFirstFile and ::FindNextFile API works, or even how CEnum internally works. I hope CEnum will speed things up for you when working with files as much as it did for me.

I'd love to get some feedback from you, be it constructive criticism, request for additional features or bug reports. Don't hesitate to post your comments or send them in e-mail. I am especially interested if you know of a similar class or project. In case you do, just let me know where I can find it.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Croatia Croatia
Nothing to brag about.
Just another engineer who works as a software developer last couple of years.

Comments and Discussions

 
QuestionCompareStrings using *.* skips files with no extension Pin
thomheinz18-Jun-10 17:19
thomheinz18-Jun-10 17:19 
AnswerRe: CompareStrings using *.* skips files with no extension Pin
Loreia19-Jun-10 10:51
Loreia19-Jun-10 10:51 
GeneralVersion 1.1 Pin
Loreia3-Dec-08 2:19
Loreia3-Dec-08 2:19 
GeneralRe: Version 1.1 Pin
Loreia6-Dec-08 21:56
Loreia6-Dec-08 21:56 
GeneralMemory leak Pin
fritzpas9-Oct-08 1:06
fritzpas9-Oct-08 1:06 
GeneralRe: Memory leak Pin
Loreia11-Oct-08 8:31
Loreia11-Oct-08 8:31 
GeneralVery interesting but CompareStrings fails Pin
Member 42896133-Jun-08 1:09
Member 42896133-Jun-08 1:09 
With current implementation, a pattern "a*bc" does not match a string "aabdbc". The CompareStrings function finds first 'b' in "aabdbc" then compares next symbols - 'c' in the pattern and 'd' in the string - and returns false. Though it MUST roll back to the last asterisk in the pattern and then try to find "bc" in remained part of the string - "dbc".
GeneralRe: Very interesting but CompareStrings fails Pin
Loreia8-Jun-08 22:03
Loreia8-Jun-08 22:03 
GeneralRe: Very interesting but CompareStrings fails Pin
Loreia8-Jun-08 22:04
Loreia8-Jun-08 22:04 
GeneralRe: Very interesting but CompareStrings fails Pin
Member 42896138-Jun-08 22:51
Member 42896138-Jun-08 22:51 
GeneralRe: Very interesting but CompareStrings fails Pin
Loreia9-Jun-08 0:32
Loreia9-Jun-08 0:32 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.