Click here to Skip to main content
15,861,168 members
Articles / Programming Languages / C++
Article

CEnum - File Enumeration and File Globbing Class

Rate me:
Please Sign up or sign in to vote.
4.45/5 (5 votes)
6 Dec 2008CPOL12 min read 52.2K   1.2K   48   11
CEnum is used for enumeration of files and directories using wildcard matching (globbing)
Image 1

Contents

Introduction

CEnum is a fast and simple class that lets you enumerate (i.e. make a list of) all files in a directory.
It supports:

  • the use of wild card characters * and ? (i.e. it's a file globbing class too)
  • separate enumeration of files and subdirectories
  • separate include list, exclude list and ignore case option for both files and directories
  • recursive search (i.e. it can enumerate content of subdirectories too)
  • enumeration using either file's full path or just file name
  • written in STL, but it optionally supports MFC collection classes
  • UNICODE aware (in Microsoft world UNICODE stands for UTF-16 little-endian)
  • works as a wrapper around ::FindFirstFile and ::FindNextFile

How Does It Work?

CEnum is simplicity itself:

C++
// enumerate all files on C: drive

CEnum enumerator;
enumerator.bRecursive = true;          // search subdirectories too
enumerator.bFullPath  = true;          // enumerate using file's full path

// start search
enumerator.EnumerateAll(_T("C:\\"));   // root directory to start your search from

list<string> * pAllFilesOnCDrive = enumerator.GetFiles();

// congrats, you have just created a list of all files on C: drive
// lets print it

list<string>::iterator iter = pAllFilesOnCDrive->begin();
for (; iter != pAllFilesOnCDrive->end(); ++iter)
{
    cout << iter->c_str();
}

// at this point just move on,
// CEnum will release memory allocated by pAllFilesOnCDrive for you

What if you hate STL with a passion, and want to use MFC containers?
No worries, CEnum is your friend too:

  • At the top of enum.cpp uncomment this line: //#define MFC
  • Recompile
C++
// enumerate all MP3's on C: drive

CEnum enumerator;
enumerator.sIncPatternFiles  = _T("*.mp3");
enumerator.bNoCaseFiles      = true;       // enumerate both *.mp3 and *.MP3
enumerator.bRecursive        = true;
enumerator.bFullPath         = true;


// start search
enumerator.EnumerateAll(_T("C:"));         // unlike in the first example, this time
                                           // there is no backslash at the end
                                           // of path string

CStringArray * pCStringArray = enumerator.GetFilesAsCStringArray();
for(int i=0; i<pCStringArray->GetSize(); ++i)
{
    cout << (LPCTSTR) pCStringArray->GetAt[i];
}


// or if you prefer CStringList...
CStringList  * pCStringList  = enumerator.GetFilesAsCStringList();
POSITION pos = pCStringList->GetHeadPosition();
while (pos != NULL)
{
    cout << (LPCTSTR) pCStringList->GetNext(pos);
}

// once again, memory management is taken care of by CEnum.
// at this point you can just forget that pCStringArray and pCStringList even existed

Note that, for your convenience, the path to root folder may or may not end with backslash!

This is already enough knowledge for your basic use of CEnum, for details read on.

Background

A few things inspired me to write this class:

  • The fact that I work with files all the time, and I always copy/paste the same code over and over again (and somehow I always have a bug in it :-))
  • The fact that I couldn't find a similar class just by using Google (don't laugh, I hate browsing through huge indexes of popular source code pages)
  • The work of Jack Handy [^] and Alessandro Felice Cantatore [^]
  • C# Directory class and its method GetFiles()

Class Design

CEnum was designed with the following in mind:

  • must work out of the box (i.e. just make an object, call one method and you have the list of all files in that directory)
  • must be designed using STL, not MFC
  • must support globbing (wild card search)
  • must support UNICODE
  • must be able to ignore case when comparing strings
  • must take care of memory management, so that user's calling function doesn't have to allocate or free any STL objects on its own

Here is the class:

User can set a number of options through public member variables (for detailed description of each option, see section Class members).

C++
// public member variables (user options)

typedef basic_string<TCHAR> _stl_string; // basic_string<TCHAR> will compile
                                         // into std::string or std::wstring,
                                         // based on if _UNICODE was defined

class CEnum
{
public:
    _stl_string sExcPatternDirs;     // in case of conflict Exclude pattern has
                                     // precedence over Include pattern
    _stl_string sExcPatternFiles;
    _stl_string sIncPatternDirs;
    _stl_string sIncPatternFiles;

    bool bRecursive;
    bool bFullPath;
    bool bNoCaseDirs;
    bool bNoCaseFiles;
};

Default constructor will initialize variables with most common values:

C++
// default constructor

public:
    CEnum()
    {
        plstDirs            = new list<_stl_string >;   // notice the space in front
                                                        // of the right angle bracket !!!
        plstFiles           = new list<_stl_string >;

        sExcPatternDirs     = _T("");
        sExcPatternFiles    = _T("");
        sIncPatternDirs     = _T("");
        sIncPatternFiles    = _T("");

        bRecursive          = false;
        bFullPath           = false;
        bNoCaseDirs         = false;
        bNoCaseFiles        = false;
    }

If you wish, you can set user options directly in overloaded constructor:

C++
// parameterized constructor
    CEnum
    (
        _stl_string sPath,
        _stl_string sExcludePatternDirs     = _T(""),
        _stl_string sExcludePatternFiles    = _T(""),
        _stl_string sIncludePatternDirs     = _T(""),
        _stl_string sIncludePatternFiles    = _T(""),
        bool bRecursiveSearch               = false,
        bool bUseFullPath                   = false,
        bool bIgnoreCaseDirs                = false,
        bool bIgnoreCaseFiles               = false
    )
    {
        plstDirs          = new list<_stl_string >;
        plstFiles         = new list<_stl_string >;

        sExcPatternDirs   = sExcludePatternDirs;
        sExcPatternFiles  = sExcludePatternFiles;
        sIncPatternDirs   = sIncludePatternDirs;
        sIncPatternFiles  = sIncludePatternFiles;

        bRecursive        = bRecursiveSearch;
        bFullPath         = bUseFullPath;
        bNoCaseDirs       = bIgnoreCaseDirs;
        bNoCaseFiles      = bIgnoreCaseFiles;

        EnumerateAll(sPath);
    }

Variety of lists CEnum can give you:
Note: By default, MFC collection classes are excluded from the build.

C++
// internal lists

    list<_stl_string > * GetDirs();
    list<_stl_string > * GetFiles();

//#define MFC

#ifdef MFC

    CStringArray * GetDirsAsCStringArray();
    CStringArray * GetFilesAsCStringArray();
    CStringList  * GetDirsAsCStringList();
    CStringList  * GetFilesAsCStringList();

#endif

Instantiating

There are two constructors for greater flexibility when using CEnum:

C++
// ... using default constructor

CEnum enumerator;

// set options
enumerator.sExcPatternDirs  = _T("Post Black album Metallica");
enumerator.sIncPatternDirs  = _T("Iron Maiden");
enumerator.sExcPatternFiles = _T("*.wma");
enumerator.sIncPatternFiles = _T("*.mp3;*.ogg");
enumerator.bRecursive       = true;
enumerator.bFullPath        = true;
enumerator.bNoCaseDirs      = true;
enumerator.bNoCaseFiles     = true;

enumerator.EnumerateAll(_T("D:\\Music"));

list<string> * pQualityMetal = enumerator.GetFiles();

... or the same thing with fewer lines:

C++
// ... using parameterized constructor

CEnum enumerator(
                    _T("D:\\Music"),
                    _T("Post Black album Metallica"),
                    _T("*.wma"),
                    _T("Iron Maiden"),
                    _T("*.mp3;*.ogg"),
                    true,
                    true,
                    true,
                    true
                 );

list<string> * pQualityMetal = enumerator.GetFiles();

... or the really simple way (if default values work for you):

C++
// ... using parameterized constructor with default values

CEnum enumerator( _T("D:\\Music") );

list<string> * pQualityMetal = enumerator.GetFiles();

Class Destructor

Let me quote the very first book I read about C++, Jesse Liberty's "Teach yourself C++ in 21 days": "If you are writing a function that needs to create memory and then pass it back to the calling function, consider changing your interface. Have the calling function allocate the memory and then pass it into your function by reference. This moves all memory management out of your program and back to the function that is prepared to delete it."

Why did I not follow this great advice? Well firstly, this was designed to be similar to C# class Directory (and in C# garbage collector takes care of memory management, thus the caller does not care about memory issues). And secondly, I wanted a class that would be easy to use. Remember, CEnum's job is simply to create a list of file names.

So, here is how things work in CEnum:

  1. Constructor allocates two lists
  2. After enumeration process ends, pointers to two lists are returned to calling function
  3. Pointer(s) to list(s) are deleted in destructor of CEnum

This means that your enumeration list(s) will live only as long as the life time of CEnum object that created it !!! You cannot pass the pointer to some other function. If you would like to do that, you either need to create a copy of the list, or comment out some (or all) delete statements in destructor.

Is this a bad design? It just might be, but simplicity of use was my primary issue when I designed this class. I wanted to be able to enumerate directory in a single line (see example above), do something with those files (e.g. display them on the screen) and then to forget about CEnum object and all the lists it allocated. If you want to do it 'properly' CEnum can be easily adapted to use lists allocated by client application, all that is needed is to add one more constructor and to comment out few lines in destructor.

Class Members

This is the list of CEnum's public member variables through which user can set desired options:

  • bRecursive
    Description: if true, subdirectories will be enumerated too
    Default value: false
  • bFullPath
    Description: if true, files will be enumerated using file's full path, otherwise list will contain file names only
    Default value: false
  • bNoCaseDirs
    Description: if true, case will be ignored when searching directory (and only directory) names
    Default value: false
  • bNoCaseFiles
    Description: if true, case will be ignored when searching file (and only file) names
    Default value: false
  • sIncPatternFiles
    Description: matching pattern for files you wish to include in your search. Wild cards * and ? are supported. If you have more than one search pattern, separate them with semicolon.
    Default value: empty string
    Examples:
    • "*.mp3;*.mp4"
    • "*.mp?" (same as first example)
    • "*.mp3;iron maid*;latest*"
    Note that in case of Include patterns, empty string means "enumerate all", i.e. everything is included !!

  • sExcPatternFiles
    Description: matching pattern for files you wish to exclude from your search. Wild cards * and ? are supported. If you have more that one search pattern, separate them with semicolon.
    Default value: empty string
    Examples:
    • "*.mp3;*.mp4"
    • "*.mp?" (same as first example)
    • "*.mp3;iron maid*;latest*"
    Note that in case of Exclude patterns, empty string means "enumerate none", i.e. nothing is excluded!!
    Also, in case of conflict, Exclude pattern has precedence over Include pattern.
  • sIncPatternDirs same as sIncPatternFiles above, just for directories.
  • sExcPatternDirs same as ExcPatternFiles above, just for directories.

STL List vs. MFC Collection Classes

CEnum can give you a variety of lists:

C++
// ... show all containers CEnum can return

CEnum enumerator(_T("C:\\"));

list<string> * pSTLlistFiles   = enumerator.GetFiles();
list<string> * pSTLlistDirs    = enumerator.GetDirs();

StringList   * pMFClistFiles   = enumerator.GetFilesAsCStringList();
StringList   * pMFClistDirs    = enumerator.GetDirsAsCStringList();

StringArray  * pMFCArrayFiles  = enumerator.GetFilesAsCStringArray();
StringArray  * pMFCArrayDirs   = enumerator.GetDirsAsCStringArray();

Since CEnum was written using STL, the only two lists created during the execution are two lists returned by GetDirs() and GetFiles() functions. All four MFC containers (two CStringArrays and two CStringLists) are created only when you call conversion functions (GetFilesAs... and GetDirsAs...). In fact, all MFC related stuff is hidden behind preprocessor directives and is by default inactive (i.e. it does not compile). If you need this functionality, then just uncomment //#define MFC line and recompile.

WildCard Compare Functionality

This is another thing that is needed often but not found easily. Two good examples are work of Jack Handy here on CodeProject [^] and the work of Alessandro Felice Cantatone [^]. Both are great examples, but each has its shortcomings. Jack's function is simple and fast, but it doesn't let you ignore case as STL's tolower and UNICODE don't match well, and Alessandro's function was designed for IBM OS/2 (not to mention his holier than thou attitude, and don't even get me started on some of the restrictions he made. What has AI got to do with string comparing?)

Anyway, here is what I came up with:

C++
// ... wildcard compare function

_tsetlocale(LC_ALL, _T("");      // execute this before calling _tcsicmp

bool CompareStrings(LPCTSTR sPattern, LPCTSTR sFileName, bool bNoCase)
{
    TCHAR temp1[2] = _T("");
    TCHAR temp2[2] = _T("");
    LPCTSTR pStar  = 0;
    LPCTSTR pName  = 0;

    while(*sFileName)
    {
        switch (*sPattern)
        {
        case '?':
            ++sFileName; ++sPattern;
            continue;
        case '*':
            if (!*++sPattern) return 1;
            pStar = sPattern;
            pName = sFileName + 1;
            continue;
        default:
            if(bNoCase)
            {
                // _tcsicmp works with strings not chars !!
                *temp1 = *sFileName;
                *temp2 = *sPattern;
                if (!_tcsicmp(temp1, temp2))    // if equal
                {
                    ++sFileName;
                    ++sPattern;
                    continue;
                }
            }
            else if (*sFileName == *sPattern)   // bNoCase == false,
            {                                   // compare chars directly
                ++sFileName;
                ++sPattern;
                continue;
            }

            // chars are NOT equal,
            // if there was no '*' thus far, strings don't match
            if(!pStar) return 0;

            // go back to last '*' character
            sPattern = pStar;
            sFileName = pName++;
            continue;
        }
    }
    // check is there anything left after last '*'
    while (*sPattern == '*') ++sPattern;
    return (!*sPattern);
}

This should be easy to follow:

  • If ? is found in pattern string, chars match, and function moves to the next char in both pattern string and search string.
  • If * is found in pattern string, function moves to the next char in pattern string only. Exits if there are no more chars in pattern string, or saves the record of current position (one char after '*') and the record of next char in search string.
  • When two chars need to be compared regardless of case, run-time library _tcsicmp is used:
    • if strings (temp strings that contain just one character) are equal, function moves to the next char in both pattern string and search string.
    • if strings differ, function will return false if there was no '*' character in pattern string up to this point. Otherwise it will go back to position of last '*' character and advance by one char in search string.

Regarding the use of _tcsicmp:
There was no special reason why I have chosen to use this function, other than the fact that it was the only run-time library routine that passed all my tests for UNICODE string comparing regardless of case.

Note: My tests were limited to European-character sets (stuff like accented, Central European and Nordic characters). For anything beyond that, you will have to test for yourself.

If you would like to use Wildcard compare function in some other project, and you don't need to ignore case, then you can greatly speed things up by using something like this:

C++
// ... wildcard compare function without Ignore case functionality

    bool CompareStrings(LPCTSTR sPattern, LPCTSTR sFileName)
    {
        LPCTSTR pStar  = 0;
        LPCTSTR pName  = 0;

        while(*sFileName)
        {
            switch (*sPattern)
            {
            case '?':
                ++sFileName; ++sPattern;
                continue;
            case '*':
                if (!*++sPattern) return 1;
                pStar = sPattern;
                vpName = sFileName + 1;
                continue;
            default:
                if (*sFileName == *sPattern) { ++sFileName; ++sPattern; continue; }

                // chars are NOT equal,
                // if there was no '*' thus far, strings don't match
                if(!pStar) return 0;

                // go back to last '*' character
                sPattern = pStar;
                sFileName = pName++;
                continue;
            }
        }
        // check is there anything left after last '*'
        while (*sPattern == '*') ++sPattern;
        return (!*sPattern);
    }

This version is 3-4 times faster because it compares chars directly.

Similar Projects

If you find that CEnum lacks some useful feature, check these projects out. They may be closer to what you need.

Enumeration and globbing:

Wildcard match:

Demo Project

  • CEnum used in demo project has one minor difference. Because I added Wildcard testing functionality in the demo project, CompareStrings function is declared as public and static. Otherwise it is a private non-static method of CEnum.
  • Testing enumeration
    This part is very simple. In OnEnumeration() function in a mere 20 lines of code directory is enumerated and its content is added to CListCtrl.
  • Testing wildcard comparing functionality
    Demo uses test.txt test file in which you can add your own test cases. Format of test.txt is very simple. First string is wildcard string, second string is search string and the last string specifies if first two strings match. Comment lines start with '#' character.

    CEnum2.png

Limitations

CEnum is file-centric, meaning it was designed to search for files moreso than to search for directories.
For example:

  • If you are searching only for files or only for directories, you will enumerate either just the way you intended.
  • But, if you apply filters (exclude or include) for both files and directories, then you will enumerate only those files that reside in directories that match the filter applied for directories.

Depending on how you look at things, the latter case might be seen as a limitation, because you can't perform independent search for files and directories. You can't get a list of all files in all subdirectories, and at the same time get a list of directories that match certain search criteria. In this case, the only thing you can do is to run CEnum twice, once for files and once for directories.

Another thing that user needs to be aware of is the sorting functionality. After enumerating each directory, CEnum calls STL list's sort() method. This method sorts files in alphabetic order, which may be different sorting order than in your file browser (e.g. Windows Explorer).

Finally, be aware that CEnum was designed for globbing (wildcard search) only. It can't search for files based on size, date, file content or file attributes.

Errata

Possible compile error in Visual studio is:

  • fatal error C1010: unexpected end of file while looking for precompiled header. Did you forget to add '#include "stdafx.h"' to your source?

Solution is not to use precompiled headers:

  • In the Solution Explorer pane of the project, right-click the project name, and then click Properties.
  • In the left pane, click the C/C++ folder.
  • Click the Precompiled Headers node.
  • In the right pane, click Create/Use Precompiled Header, and then click Not Using Precompiled Headers.

Revision History

  • Version 1.0 published 05/2008 - Initial public release
  • Version 1.1 published 11/2008

List of changes in version 1.1

Article:

  • Added Destructor chapter
  • Updated Wildcard search algorithm chapter
  • Added Doxygen documentation
  • Changed demo application so files could be opened by double-clicking

Source code:

  • Removed most of dynamic allocations od STL objects
  • Added Doxygen Qt style comments to CEnum.cpp
  • Changed Wildcard compare algorithm in CompareStrings function
  • Rewrote function Tokenize
  • Updated class' Destructor
  • Verified that class compiles cleanly (no error or warnings) at warning level 4 in Visual Studio 2005

Author's License

There is no limitation on use of this class. You can use it (as whole or just parts of it) with or without author's permission in any sort of project regardless of license issues, in both commercial and open source projects. Though, a thank you e-mail would be nice. :-)

Conclusion

That's it folks, a simple class that lets you enumerate files without understanding how ::FindFirstFile and ::FindNextFile API works, or even how CEnum internally works. I hope CEnum will speed things up for you when working with files as much as it did for me.

I'd love to get some feedback from you, be it constructive criticism, request for additional features or bug reports. Don't hesitate to post your comments or send them in e-mail. I am especially interested if you know of a similar class or project. In case you do, just let me know where I can find it.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Croatia Croatia
Nothing to brag about.
Just another engineer who works as a software developer last couple of years.

Comments and Discussions

 
GeneralVersion 1.1 Pin
Loreia3-Dec-08 2:19
Loreia3-Dec-08 2:19 
Ok, I finally found some time to update the article.
I'll post here answers to Member 4289613's and fritzpas' questions, rather than answering separately. Both of you have been of great help. Thanks for helping me debug this stuff.

-------- here goes the answer

First, let me comment on use of pointers. There really was no need to use pointers to STL lists in Enumerate function, as I said, I don't even notice such things because in a project I currently work on in my company it is not possible to create STL objects without the use of operator new (I just love non standard compilers), and that's why I overused dynamic allocations of STL lists.
The new design is just slightly changed so the only lists created by operator new are the ones returned by GetFiles and GetDirs functions.
This is the design I like, so I choose to keep it. Class allocates the memory and frees it latter on, and there is no need for user intervention.
CEnum can be easily adapted to use lists allocated by client application, all that is needed is to add one more constructor and to comment out few lines in destructor.

I ***might*** include another constructor that would implement use of those lists as arguments, but I'll see about that.

Wildcard compare stuff...
Well, thanks again for your great help in debugging this stuff. I have been in doubt as to which approach is the best, so I decided to use the speed of execution as my primary criteria. I narrowed it down to four different algorithms and designed a small test.
I used:
1. Jack Handy's algorithm
2. Your algorithm
3. The one I used as a quick fix
4. The one I created based on Jack's design

Each algorithm goes through three tests:

1. First test is what I like to think of as "real life" example. Match an mp3 file name. I used a very long name by copying some random sentence from your post Smile | :)
sPatterns.Add(_T("*.mp3"));
sPatterns.Add(_T("I mean, it would be better to use such approach in all functions, and remove all dynamically created std::list objects.mp3"));

2. Second test is Jack's test when someone doubted his algorithm
sPatterns.Add(_T("*t?st?n*this*t*"));
sPatterns.Add(_T("testin this sh*t"));

3. Member 4289613's test
sPatterns.Add(_T("ab*cd*ef*gh"));
sPatterns.Add(_T("ab c1 cd e1 ef g1 gh"));

All tests were done using ASCII strings, and I used CTimeSpan class which has a resolution of 1 second. It is a simple MFC project, and this is the first time I used function pointers (fun stuff, but I hate the syntax Smile | :) )
Also all test are done with release build with default settings, no optimization was turned on. (On VS 2005 speed optimization will skip the test, and for loop will not be executed at all. Incredible!! But I have no time to bother to discover why this is the case.)

Here is the test code (it might look better in some editor like Notepad++):

C++
	CWaitCursor wait;

	int iUpperLimit = 99999999;

	CTime t1 = CTime::GetCurrentTime();
	CTime t2 = CTime::GetCurrentTime();
	CTimeSpan period = t2 - t1;
	CString sOut = _T("");
	CString sTemp = _T("");
	CStringArray sPatterns;
	CUIntArray iResults;

	sPatterns.Add(_T("*.mp3"));
	sPatterns.Add(_T("I mean, it would be better to use such approach in all functions, and remove all dynamically created std::list objects.mp3"));

	sPatterns.Add(_T("*t?st?n*this*t*"));
	sPatterns.Add(_T("testin this sh*t"));

	sPatterns.Add(_T("ab*cd*ef*gh"));
	sPatterns.Add(_T("ab c1 cd e1 ef g1 gh"));

	// it won't compile if I use variable 
	// (and I thought that newer revisions of C++ standard do not require array size to be determined at compile time)
	#define MaxArray 4
	int (CWildDlg::*pFuncArray[MaxArray])(LPCTSTR, LPCTSTR);

	pFuncArray[0] = &CWildDlg::wildcmp;
	pFuncArray[1] = &CWildDlg::match_mask;
	pFuncArray[2] = &CWildDlg::CompareQuickFix;
	pFuncArray[3] = &CWildDlg::CompareFinal;

	for (int i=0; i<maxarray;>	{
		for (int j=0; j<spatterns.getsize();>		{
			t1 = CTime::GetCurrentTime();

			for (int z=0; z<iupperlimit;>			{
				(this->*pFuncArray[i])(sPatterns[j].GetBuffer(), sPatterns[j+1].GetBuffer());
			}

			t2 = CTime::GetCurrentTime();
			period = t2 - t1;

			iResults.Add((int)period.GetTotalSeconds());
		}
	}

	// format output in rows
	sOut = _T("");
	int iMax = sPatterns.GetSize()/2;

	for (int i=0; i<imax;>	{
		for (int j=0; j<maxarray;>		{
			sTemp.Format( _T("%d\t"), iResults[i+j*iMax] );
			sOut += sTemp;

			if(j == (MaxArray - 1)) sOut += _T("\n");
		}
	}

	sPatterns.RemoveAll();

	AfxMessageBox(sOut);

	
Where functions are:

int CWildDlg::wildcmp(LPCTSTR wild, LPCTSTR string) 
{
	// Written by Jack Handy - jakkhandy@hotmail.com

	LPCTSTR cp = NULL, mp = NULL;

	while ((*string) && (*wild != '*')) {
		if ((*wild != *string) && (*wild != '?')) {
			return 0;
		}
		wild++;
		string++;
	}

	while (*string) {
		if (*wild == '*') {
			if (!*++wild) {
				return 1;
			}
			mp = wild;
			cp = string+1;
		} else if ((*wild == *string) || (*wild == '?')) {
			wild++;
			string++;
		} else {
			wild = mp;
			string = cp++;
		}
	}

	while (*wild == '*') {
		wild++;
	}
	return !*wild;
}

int CWildDlg::match_mask(LPCTSTR mask, LPCTSTR str) 
{
	if (mask && str)
	{
		int matched = 0;
		int done = 0;

		while (!done)
		{
			if (*mask == '*') // 0 or more characters
			{
				++mask;
				if (*mask == 0)
				{
					matched = 1;
				}
				else
				{
					matched = match_mask(mask, str);
					while ((!matched) && (*str != 0))
					{
						++str;
						matched = match_mask(mask, str);
					}
				}
				done = 1;
			}
			else if (*mask == 0) // mask is over
			{
				matched = (*str == 0) ? 1 : 0;
				done = 1;
			}
			else
			{ 
				if ( (*mask == *str) ||  // exact match, case-sensitive
					((*mask == '?') && (*str != 0)) ) // any character
				{
					++mask;
					++str;
				}
				else
				{
					matched = 0;
					done = 1;
				}
			}
		}
		return matched;
	}
	return 0;
}

    
int CWildDlg::CompareQuickFix(LPCTSTR sPattern, LPCTSTR sFileName)
{
	bool bStar = false;

	while(*sFileName)
	{
		switch (*sPattern)
		{
		case '?':
			++sFileName; ++sPattern;
			continue;
		case '*':
			if (!*++sPattern) return 1;
			bStar = true;
			continue;
		default:
			// OK, chars are equal, simply move on to the next character
			if (*sFileName == *sPattern) { ++sFileName; ++sPattern; continue; }

			// NO, chars are NOT equal
			if (!bStar) return 0;									// if there was no '*' in Pattern string, strings don't match

			if (*(sPattern-1) == '*') {	++sFileName; continue; }	// speed optimization for searches like *.mp3

			if (!*sPattern)											// is this the last char in Pattern string?
			{
				do --sPattern; while(*sPattern == '?');				// go back to last char that is not '?'
				if(*sPattern == '*') return 1;						// if we reached '*', we are done
			}
			else
			{
				 while(*(sPattern-1) == '?') { --sPattern; --sFileName; }
				 if(*(sPattern-1) == '*') { ++sFileName; continue; }
			}

			while(*(sPattern-1) != '*') --sPattern;					// still got work to do, just go back to last '*'
			continue;
		}
	}
	// check is there anything left after last '*'
	while (*sPattern == '*') ++sPattern;
	return (!*sPattern);
}


int CWildDlg::CompareFinal(LPCTSTR sPattern, LPCTSTR sFileName)
{
	LPCTSTR pStar = 0;
	LPCTSTR pName = 0;

	while(*sFileName)
	{
		switch (*sPattern)
		{
		case '?':
			++sFileName; ++sPattern;
			continue;
		case '*':
			if (!*++sPattern) return 1;
			pStar = sPattern;
			pName = sFileName + 1;
			continue;
		default:

			// OK, chars are equal, simply move on to the next character
			if (*sFileName == *sPattern) { ++sFileName; ++sPattern; continue; }

			// chars are NOT equal, 
			// if there was no '*' thus far, strings don't match
			if(!pStar) return 0;

			// go back to last '*' character
			sPattern = pStar;
			sFileName = pName++;
			continue;
		}
	}
	// check is there anything left after last '*'
	while (*sPattern == '*') ++sPattern;
	return (!*sPattern);
}




When I designed this thing I hoped to get simple and conclusive results. But reality was much more complicated Smile | :) Here are the tables with test results (all times are in seconds):


- AMD Athlon mobile 1.6 GHz, single core, Win XP


--Alg.1Alg.2Alg.3Alg.4
Test.1188310196152
Test.239514644
Test.350795849


- AMD Athlon desktop 2,2 GHz, single core, Win XP


--Alg.1Alg.2Alg.3Alg.4
Test.1151243159123
Test.230383431
Test.340604336


- Intel Core 2 Duo, mobile, 2,2 GHz, dual core, Win XP


--Alg.1Alg.2Alg.3Alg.4
Test.1761579897
Test.214282324
Test.319393032


- Intel Centrino, mobile, 1,6 GHz, single core, Win 2000


--Alg.1Alg.2Alg.3Alg.4
Test.1121279141145
Test.223433334
Test.331634548


- AMD Opteron, 2,4 GHz, dual core, Win XP


--Alg.1Alg.2Alg.3Alg.4
Test.1126192131102
Test.225333030
Test.333503832


As you can see, recursion is clearly an overhead (but not as much as expected it to be).

My initial fix simply does too much checking which slows it down, but only on AMD platforms (it does handle "?" chars well, though).
Best speed results are achieved by algorithms number 1 and 4. But speed is dependent on type of processor. On AMD processors algorithm number 4 is faster in two out of three tests, but on Intel processors algorithm number 1 is faster in all three tests. Differences vary from 20% for test number one, up to 60% in tests 2 and 3.

These results might be different with speed optimizations turned on, but as I said something went wrong there and I have no time to dwell into it. Funny thing is that test number one is 20% faster for algorithm number 1 on Intel platform, and yet the same test in 20% faster for algorithm number 4 on AMD platform. This was the same wild.exe file executed on both platforms. Tests were run multiple times just to be sure.
Generally, this code runs much faster on Intel processors, and Jack's algorithm is faster in most of the tests.

So, my conclusion is: My algorithm (number 4) is slower but comparable to Jack's in terms of speed. In fact if you examine it closely, you will see that both are doing practically the same thing. The only noticeable difference is that Jack is using if-else construct and I am using switch command, which provides a small speed advantage when "?" character is found, because switch version just advances to next char, and if-else will first compare chars before proceeding checking if it is "?".
(I tested this with a whole bunch of "?" in pattern string and algorithm number 4 was indeed faster. In fact this is where algorithm number 3 shines. But I don't consider this to be a very realistic search scenario so I don't put much value in it.)

In the end I choose algorithm number 4 as my final choice. Speed is acceptable, I think it is easier to read and it can give you a bit more speed if you use a lot of "?" in your search.

Ok, now I just need to update the article Smile | :)
GeneralRe: Version 1.1 Pin
Loreia6-Dec-08 21:56
Loreia6-Dec-08 21:56 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.