
String Tokenizer Class (CTokenEx)

By Dan Madden, 26 Jan 2000
 

Introduction

I've seen other string tokenizers, but they lacked the functionality I was looking for, so I wrote my own using the KISS (Keep It Simple, Stupid) method. This is a VERY SIMPLE sample!

Here is a summary of the functionality in the CTokenEx class. You can:

  • use SplitPath to break up a path into sections (Drive/Share name, Directory, Filename, Extension). It also recognizes UNC names (which _tsplitpath doesn't).
  • use Join to create a CString from a CStringArray with delimiters of your choice.
  • use Split to break up a CString into a CStringArray (according to the delimiter).
  • use GetString to get the first sub-string in a CString (according to the delimiter).
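
As a rough, portable illustration of what a Join-style function does, here is a minimal sketch. It is not the CTokenEx source: `join_tokens` is a hypothetical name, `std::string` and `std::vector` stand in for CString and CStringArray, and placing the delimiter only between items (not after the last one) is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Join-style helper (not the CTokenEx code).
// Concatenates the items and places the delimiter between neighbours.
std::string join_tokens(const std::vector<std::string>& items,
                        const std::string& delimiter)
{
    std::string result;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) {
            result += delimiter;   // only between items, never trailing
        }
        result += items[i];
    }
    return result;
}
```

Joining "abc", "def", "ghi" with "," in this sketch gives "abc,def,ghi".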

NOTE:

The Split and GetString functions treat the gap between consecutive delimiters as an empty string, so blanks are NOT added to the array unless you ask for them via the last parameter (TRUE adds them, FALSE skips them). See the example code below:

Say you have a CString that contains: "abc,def,,,ghi,,jkl,,"

//********************************************************
// Split Function
//********************************************************
//
// Split will fill an array with:
//
// NOTE:  IF PARAM #4 IS TRUE, YOU'LL SEE LIST #1 ELSE LIST #2
//
// LIST #1:
//  
// String  Position
// ======  ========
// abc     0
// def     1
//         2
//         3
// ghi     4
//         5
// jkl     6
//         7
//         8
//
//
// LIST #2 (Same String):
//  
// String  Position
// ======  ========
// abc     0
// def     1
// ghi     2
// jkl     3
//
//********************************************************
void <SOME NAME>Dlg::OnSplit() 
{
    CTokenEx tok;

    // CString for the Split Function
    CString csSplit = "abc,def,,,ghi,,jkl,,";

    // CStringArray to fill 
    CStringArray SplitIt;

    // Call Split
    tok.Split(csSplit, ",", SplitIt,  TRUE);  // LIST #1 
    tok.Split(csSplit, ",", SplitIt, FALSE);  // LIST #2 
}
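
For readers without MFC at hand, the two lists above can be reproduced with a small standard C++ sketch. This is an illustration under assumptions, not the class's code: `split_tokens` is a hypothetical name, and the behaviour (a trailing remainder is added like any other token) is modelled on the lists shown above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Split-style function with one delimiter string.
// addEmpty == true  keeps empty tokens (LIST #1 above),
// addEmpty == false drops them         (LIST #2 above).
std::vector<std::string> split_tokens(const std::string& source,
                                      const std::string& delimiter,
                                      bool addEmpty)
{
    // Fall back to a comma if no delimiter was given.
    const std::string delim = delimiter.empty() ? std::string(",") : delimiter;

    std::vector<std::string> out;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = source.find(delim, start);
        const std::string token =
            (pos == std::string::npos) ? source.substr(start)
                                       : source.substr(start, pos - start);
        if (!token.empty() || addEmpty) {
            out.push_back(token);
        }
        if (pos == std::string::npos) {
            break;                      // the remainder has been handled
        }
        start = pos + delim.size();     // continue after the delimiter
    }
    return out;
}
```

With "abc,def,,,ghi,,jkl,," and a comma delimiter, this sketch yields 9 entries when addEmpty is true and 4 when it is false.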
  
//********************************************************
// GetString Function
//********************************************************  
// 
//  GetString will return a string:
// 
//     abc
//     ...and more calls to GetString will return the strings: 
//     def
//     ghi
//     jkl
//
//********************************************************
void <SOME NAME>Dlg::OnGetstring() 
{
    CTokenEx tok;  
    char Buf[254];
    CString csRef = "abc,def,,,ghi,,jkl,,"; 
    do 
    {
        // don't return blanks
        CString csRet = tok.GetString(csRef, ",",  FALSE);
        // ...or, to return blanks, use this call instead:
        // CString csRet = tok.GetString(csRef, ",",  TRUE);

        // Do something with the returned value.

    } while (!csRef.IsEmpty());
}
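
The loop above relies on GetString consuming the front of csRef on each call. A minimal standard C++ sketch of that idea follows. It is an assumption about the behaviour, not the class's implementation: `take_first_token` is a hypothetical name, and the exact handling of trailing delimiters in CTokenEx may differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a GetString-style function: returns the first
// token of `source` and removes it (plus its delimiter) from `source`.
// When addEmpty is false, empty tokens between delimiters are skipped.
std::string take_first_token(std::string& source,
                             const std::string& delimiter,
                             bool addEmpty)
{
    // Fall back to a comma if no delimiter was given.
    const std::string delim = delimiter.empty() ? std::string(",") : delimiter;

    for (;;) {
        const std::size_t pos = source.find(delim);
        if (pos == std::string::npos) {
            // No delimiter left: hand back the remainder and empty the source.
            std::string last;
            last.swap(source);
            return last;
        }
        const std::string token = source.substr(0, pos);
        source.erase(0, pos + delim.size());
        if (!token.empty() || addEmpty) {
            return token;
        }
        // Empty token and addEmpty is false: keep looking.
    }
}
```

In this sketch a final call can return an empty string when only delimiters remain, so a caller that wants no blanks should still check the returned value.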

I hope that others find this class useful.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Dan Madden
Engineer
Germany
Member
I have been programming for 19 years (Unix C, Scripting, VB, C/C++, C#). I am getting too old to talk about it and have been in the Security line of work (both Military/Civilian) for 25+ years.


Comments and Discussions

 
Question: Problem with space 0x20 as delimiter? (member l_d_allan, 28 Feb '06 - 5:32)
Thanks for providing CTokenEx. I'm evaluating it for use in an application, and hope to avoid "re-inventing the wheel".
 
I especially appreciate that your class accomplishes all the tokenizing in one call, rather than multiple calls.
 
I don't believe CTokenEx handles the following correctly. I don't fully "grok" the class code, but it appears that there may be a problem with using the space character (0x20) as a delimiter. That would be a VERY typical situation, so I wanted to check.
 

CTokenEx tok;
CString csSplit("one, two ,. three");
CStringArray splitIt;
CString m_deliminator = ",. ";
 
tok.Split(csSplit, m_deliminator, splitIt, TRUE);

Answer: Re: Problem with space 0x20 as delimiter? (member Dan Madden, 28 Feb '06 - 8:43)
Hi l_d_allan,
 
Well, I took your example and created a console app to test it, and here were the results:
 
"one, two "
"three"
 
This is the correct result...
 
Below is the full code:
 

#include "stdafx.h"
#include "SplitTest.h"
 
#include "TokenEx.h"
 
#ifdef _DEBUG
#define new DEBUG_NEW
#endif
 
// The one and only application object
CWinApp theApp;
 
using namespace std;
 
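int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
    int nRetCode = 0;

    // initialize MFC and print an error on failure
    if (!AfxWinInit(::GetModuleHandle(NULL), NULL, ::GetCommandLine(), 0))
    {
        // TODO: change error code to suit your needs
        _tprintf(_T("Fatal Error: MFC initialization failed\n"));
        nRetCode = 1;
    }
    else
    {
        CTokenEx tok;
        CString csSplit("one, two ,. three");
        CStringArray splitIt;
        CString m_deliminator = ",. ";
        tok.Split(csSplit, m_deliminator, splitIt, TRUE);

        // NOTE: the posted listing is cut off after "for (int n=0; n".
        // The rest is reconstructed on the assumption that it printed
        // each token in quotes, matching the results shown above.
        for (int n=0; n<splitIt.GetSize(); n++)
        {
            _tprintf(_T("\"%s\"\n"), (LPCTSTR)splitIt.GetAt(n));
        }
    }

    return nRetCode;
}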
 
Regards,
 
Dan
General: Re: Problem with space 0x20 as delimiter? (member l_d_allan, 28 Feb '06 - 9:22)
Hi Dan,
 
Thanks for the prompt and helpful reply to my possibly uninformed question. I was pleasantly surprised to get a response to a CodeProject article from 2000.
 
I hope to avoid "re-inventing the wheel", and I suppose I don't understand how your class is supposed to work. In the test case with delimiters of space, comma, and period, I would have expected the output to be:
one
two
three
 
I don't understand why "one, two " wasn't split into two tokens.
 
I've been working on a specialized tokenizer for searching that only allows A-Z and a-z. The delimiters are anything else. Here are some of the test tokens I use:
const char* testStringsWithTokens[] = {
// Test string               Expected results
"",                          "0 ", 
"a",                         "1 <1 a>",
"a b c",                     "3 <1 a><1 b><1 c>",
" a b c ",                   "3 <1 a><1 b><1 c>",
"one two three",             "3 <3 one><3 two><5 three>",
"one\ttwo\tthree",           "3 <3 one><3 two><5 three>",
"one,two,,,, ,,, three,,,",  "3 <3 one><3 two><5 three>",
" one two three",            "3 <3 one><3 two><5 three>",
" one\ttwo\tthree",          "3 <3 one><3 two><5 three>",
" one,two,,,, ,,, three,,,", "3 <3 one><3 two><5 three>",
// snip ... lots more test cases
};
The "even" strings (i.e. 0, 2, 4, etc) are the tests, and the "odd" strings (1, 3, 5, etc) are the cppunit-like expected results (number of tokens and then length of each token found).
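
To make the expected-result format concrete, here is a minimal standalone sketch of a letters-only tokenizer that produces the same "count <length token>..." strings. It is an illustration only: `alpha_tokens` and `describe_tokens` are hypothetical names, and this is neither CTokenEx nor the tokenizer being described in this post.

```cpp
#include <string>
#include <vector>

// Treats A-Z and a-z as token characters; everything else is a delimiter.
std::vector<std::string> alpha_tokens(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : text) {
        const bool isLetter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
        if (isLetter) {
            current += ch;
        } else if (!current.empty()) {
            tokens.push_back(current);   // a delimiter ends the current token
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);       // token that runs to the end of the text
    }
    return tokens;
}

// Formats tokens as "<count> <len tok><len tok>...", e.g. "3 <3 one><3 two><5 three>".
std::string describe_tokens(const std::vector<std::string>& tokens)
{
    std::string out = std::to_string(tokens.size()) + " ";
    for (const std::string& token : tokens) {
        out += "<" + std::to_string(token.size()) + " " + token + ">";
    }
    return out;
}
```

For example, describe_tokens(alpha_tokens("one,two,,,, ,,, three,,,")) gives "3 <3 one><3 two><5 three>" in this sketch.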
 
Using your code, I would declare the delimiters to be everything except A-Z and a-z.
 
Is there a way to use your class to accomplish the above? Or am I doing something incorrectly? I really hope to be able to reuse your class in my application.
 
Thanks again.
 

-- modified at 19:18 Tuesday 28th February, 2006
General: Re: Problem with space 0x20 as delimiter? (member Dan Madden, 28 Feb '06 - 10:38)
Hi Again,
 
Here is something I threw together in the Class code after I added a new parameter to the function:
 
This is the Output:
 
one, two ,. three <= Deliminator BEFORE
one~~two~~~~three <= Deliminator AFTER first pass
"one"
"two"
"three"

The new function signature:
void CTokenEx::Split(CString Source, CString Deliminator, BOOL bMultipleDeliminator, CStringArray& AddIt, BOOL bAddEmpty)

Here is what I put inside the class:
if (bMultipleDeliminator) 
{
	// replace each delimiter character with "~"
	CString csStr = newCString;
	int nDelCount = Deliminator.GetLength();
 
	CString csMultDel = _T("");
	for (int n=0; n<nDelCount; n++) 
	{
		csMultDel = _T("");
		csMultDel += Deliminator[n];
		csStr.Replace(csMultDel,"~");
	}
	// from here on, split on the single "~" character
	Deliminator = _T("~");
	newCString = csStr;
}
Ok, now here is the new function:
void CTokenEx::Split(CString Source, CString Deliminator, BOOL bMultipleDeliminator, CStringArray& AddIt, BOOL bAddEmpty)
{
	// initialize the variables
	CString		 newCString = Source;
	CString		 tmpCString = "";
	CString		 AddCString = "";
 
	int pos1 = 0;
	int pos = 0;
 
	AddIt.RemoveAll();
 
	if (Deliminator.IsEmpty()) 
	{
		// Add default [comma] if empty!
		// acknowledgement: Prasad [gprasad@rti.ie]
		Deliminator = ","; 
	}
 
	if (bMultipleDeliminator) 
	{
		//
		CString csStr = newCString;
		int nDelCount = Deliminator.GetLength();
 
		CString csMultDel = _T("");
		for (int n=0; n<nDelCount; n++) 
		{
			csMultDel = _T("");
			csMultDel += Deliminator[n];
			csStr.Replace(csMultDel,"~");
		}
		Deliminator = _T("~");
		newCString = csStr;
	}
 
	// do this loop as long as you have a deliminator
	do {
		// set to zero
		pos1 = 0;
		// position of deliminator starting at pos1 (0)
		pos = newCString.Find(Deliminator, pos1);
		// if the deliminator is found...
		if ( pos != -1 ) 
		{
			// load the var with the info left
			// of the position
			AddCString = newCString.Left(pos);

			if (!AddCString.IsEmpty()) 
			{
				// if there is a string to add, then
				// add it to the Array
				AddIt.Add(AddCString);
			}
			else if (bAddEmpty) 
			{
				// if empty strings are ok, then add them
				AddIt.Add(AddCString);
			}
 
			// make a copy of this var. with the info
			// right of the deliminator
			tmpCString = newCString.Mid(pos + Deliminator.GetLength());
			
			// reset this var with new info
			newCString = tmpCString;
		}
	} while ( pos != -1 );
	
	if ((!newCString.IsEmpty()) || bAddEmpty) 
	{
		// add the remainder if it is not empty (or if empty strings are ok)
		AddIt.Add(newCString);
	}
}
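
One thing to keep in mind with the "~" replacement above: it assumes the source string never contains a "~" of its own, otherwise that character would also act as a delimiter. A sketch of an alternative that needs no placeholder character is shown below in standard C++. It is not the class's code: `split_on_any` is a hypothetical name, and `std::string::find_first_of` does the work of matching any one of the delimiter characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alternative: every character in delimiterChars is a delimiter.
// No placeholder character is substituted into the source.
std::vector<std::string> split_on_any(const std::string& source,
                                      const std::string& delimiterChars,
                                      bool addEmpty)
{
    std::vector<std::string> out;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = source.find_first_of(delimiterChars, start);
        const std::string token =
            (pos == std::string::npos) ? source.substr(start)
                                       : source.substr(start, pos - start);
        if (!token.empty() || addEmpty) {
            out.push_back(token);
        }
        if (pos == std::string::npos) {
            break;
        }
        start = pos + 1;   // each delimiter is exactly one character
    }
    return out;
}
```

With "one, two ,. three" and the delimiter characters ",. ", this sketch returns "one", "two", "three" when addEmpty is false.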
Hope this helps!!

 
Regards,
 
Dan
General: Re: Problem with space 0x20 as delimiter? (member l_d_allan, 28 Feb '06 - 12:07)
Hi Dan,
 
Thanks ... I think that will make it work both "my way" and "your way". We apparently have different notions of what a delimiter is, and how to handle it.
 
Actually, I don't understand why the original/default is available. I would think you would ALWAYS want to "throw away" any delimiter and use that to break up tokens, whether the delimiters show up in multiples or not.
 
I am perhaps being slow to understand why you would want to specify bMultipleDeliminator in any way other than causing "one, two ,. three" to come out:
"one"
"two"
"three"
 
Not meaning to be argumentative, but I would think the expected behavior would pretty much ALWAYS be that "one, two ,. three" would be handled that way, and I would think the bMultipleDeliminator would be left out of the parameter list. To me, it detracts and creates potential confusion to have that option (but perhaps I am being slow ... I realize that I can be "not the brightest bulb in the box"). It has been a long day.
 
And again thanks for providing the code and helping out this confused person.
 


General: Re: Problem with space 0x20 as delimiter? (member Dan Madden, 2 Mar '06 - 6:49)
Hi Again,
 
This was done simply to explain how it could be done. Because of your earlier question I updated my version, and it is implemented NOT using an extra parameter ;-P
 
I thought about updating this particular article, but probably won't unless really asked to by some folks out there (because of its age). I have also noted your name in the code for this suggestion :-D
 


 
Regards,
 
Dan
General: Re: Problem with space 0x20 as delimiter? (member l_d_allan, 28 Feb '06 - 12:16)
Hi Dan,
 
I stared at your article a bit closer and noticed your statement:
 
The Split and GetString functions recognize multiple delimiters as an empty string so that it will NOT add blanks to an array (unless you want it to). See example code below:
 
It wasn't clear to me what that meant the first time I read it, and it still is fuzzy, at least to me. You mention that the sample code clarifies, but the sample code only deals with just a comma being a delimiter.
 
Not meaning to come across as critical or unappreciative .... it is great that you submitted the code and are helpful to those of us who want to re-use it, but need your patient help to figure it out.

General: Re: Problem with space 0x20 as delimiter? (member Dan Madden, 2 Mar '06 - 6:43)
Hi Again,
 
Well, it is basically explaining that if you had the string "a,b,c,,,,d,," with a comma as the deliminator and the "BOOL bAddEmpty" parameter set to TRUE, the CStringArray would look like this:
 
"a"
"b"
"c"
""
""
""
"d"
""
""
 
It would have the size of "9". If it was done with "BOOL bAddEmpty" set to FALSE, then the CStringArray would have looked like this:
 
"a"
"b"
"c"
"d"
 
It would have the size of "4".
 
Does that help explain it? This is also said in the web page...

 
Regards,
 
Dan
Answer: New Split Function for this Suggestion (member Dan Madden, 2 Mar '06 - 7:06)
Here it is (thanks l_d_allan):
 
By using the code below, you could test it by doing this in a "main()":
CTokenEx      tok;
CString       csSplit("one, two ,. three");
CStringArray  splitIt;
CString       m_deliminator = ",. ";
 
tok.Split(csSplit, m_deliminator, splitIt, FALSE);
 
printf("\n%s\n", (LPCTSTR)csSplit);
 
for (int n=0; n<splitIt.GetSize(); n++)
{
	printf("\"%s\"\n", (LPCTSTR)splitIt.GetAt(n));
}
This would produce:
one, two ,. three
"one"
"two"
"three"

Here is the revised function:

void CTokenEx::Split(CString Source, CString Deliminator, CStringArray& AddIt, BOOL bAddEmpty)
{
	// initialize the variables
	CString		 newCString = Source;
	CString		 tmpCString = "";
	CString		 AddCString = "";
 
	int pos1 = 0;
	int pos = 0;
 
	AddIt.RemoveAll();
 
	if (Deliminator.IsEmpty()) {
		// Add default [comma] if empty!
		// acknowledgement: Prasad [gprasad@rti.ie]
		Deliminator = ","; 
	}
 
	//
	CString csStr = newCString;
	int nDelCount = Deliminator.GetLength();
 
	///////////////////////////////////////////////////
	//
	// The below block was created as a suggestion 
	// from "l_d_allan" at CodeProject.com
	//
	CString csMultDel = _T("");
	for (int n=0; n<nDelCount; n++) 
	{
		csMultDel = _T("");
		csMultDel += Deliminator[n];
		csStr.Replace(csMultDel,"~");
	}
	Deliminator = _T("~");
	newCString = csStr;
	//
	///////////////////////////////////////////////////

 
	// do this loop as long as you have a deliminator
	do {
		// set to zero
		pos1 = 0;
		// position of deliminator starting at pos1 (0)
		pos = newCString.Find(Deliminator, pos1);
		// if the deliminator is found...
		if ( pos != -1 ) {
 
			// load the var with the info left
			// of the position
			AddCString = newCString.Left(pos);
 
			if (!AddCString.IsEmpty()) {
				// if there is a string to add, then
				// add it to the Array
				AddIt.Add(AddCString);
			}
			else if (bAddEmpty) {
				// if empty strings are ok, then add them
				AddIt.Add(AddCString);
			}
 
			// make a copy of this var. with the info
			// right of the deliminator
			tmpCString = newCString.Mid(pos + Deliminator.GetLength());
			
			// reset this var with new info
			newCString = tmpCString;
		}
	} while ( pos != -1 );
	
	if ((!newCString.IsEmpty()) || bAddEmpty) {
		// add the remainder if it is not empty (or if empty strings are ok)
		AddIt.Add(newCString);
	}
}

 
Regards,
 
Dan
General: Re: New Split Function for this Suggestion (member l_d_allan, 2 Mar '06 - 16:21)
Works quite well. Nice job!
 
I would make another suggestion or two: separate the specification of the delimiters from the call to Split. There is more than a trivial amount of overhead to get the delimiters set up, and you might want to allow "reuse" of the same delimiters for a bunch of calls.
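
A rough sketch of that suggestion in standard C++: the delimiter characters are turned into a lookup table once, and the same table is then reused for any number of split calls. The names `DelimiterSet` and `split_with` are hypothetical and not part of CTokenEx; the sketch assumes 8-bit characters and always drops empty tokens.

```cpp
#include <array>
#include <string>
#include <vector>

// Built once from the delimiter characters, then reused across calls.
class DelimiterSet {
public:
    explicit DelimiterSet(const std::string& chars) : isDelimiter_{}
    {
        for (unsigned char ch : chars) {
            isDelimiter_[ch] = true;
        }
    }

    bool contains(char ch) const
    {
        return isDelimiter_[static_cast<unsigned char>(ch)];
    }

private:
    std::array<bool, 256> isDelimiter_;   // one flag per possible 8-bit char
};

// Splits source on any character in the set; empty tokens are dropped.
std::vector<std::string> split_with(const DelimiterSet& delimiters,
                                    const std::string& source)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : source) {
        if (!delimiters.contains(ch)) {
            current += ch;
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

A caller would construct one DelimiterSet up front and pass it to every split_with call, so the per-call cost no longer includes setting up the delimiters.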
 
I was also wondering with the CString parameters if you want to pass by reference or value ... or whether it makes any difference ... you pass the CStringArray by reference, but not the two CString parameters.
 
void CTokenEx::Split(CString Source, CString Deliminator, CStringArray& AddIt, BOOL bAddEmpty)
 
or
 
void CTokenEx::Split(CString& Source, CString& Deliminator, CStringArray& AddIt, BOOL bAddEmpty)
 
I tried making the change ... oddly, the Source parameter can be passed by reference, but not the Deliminator parameter (but there is a LOT about MFC that I don't understand :sigh:).
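
One possible explanation, offered as a guess because the failing call site is not shown: in standard C++ a non-const reference cannot bind to a temporary, so a call that passes a string literal such as "," for a `CString&` parameter is rejected, while a const reference accepts it. The sketch below shows the const-reference form with `std::string`; `count_delimiters` is a hypothetical helper used only to demonstrate the parameter passing.

```cpp
#include <cstddef>
#include <string>

// Both parameters are const references: no copies are made, and callers
// may pass either string objects or string literals (a temporary
// std::string is created for a literal and can bind to const&).
std::size_t count_delimiters(const std::string& source,
                             const std::string& delimiterChars)
{
    std::size_t count = 0;
    for (char ch : source) {
        if (delimiterChars.find(ch) != std::string::npos) {
            ++count;
        }
    }
    return count;
}
```

Note that the revised Split assigns to Deliminator inside the function, so taking it by const reference there would require copying it into a local variable first.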
 
I applied the following test cases to the revised code to only allow A-Z and a-z, and it passes just fine. Sweet! :)
 
   struct s_testTokenizer {
      int   expectedTokens;
      char* pActualPattern;
      char* pExpectedPattern;
   };
   struct s_testTokenizer testTokenizer[] = { 
      { 1, "a",                            "1 <1 a>"},
      { 3, "a b c",                        "3 <1 a><1 b><1 c>"},
      { 3, " a b c ",                      "3 <1 a><1 b><1 c>"},
      { 3, "one two three",                "3 <3 one><3 two><5 three>"},
      { 3, "one\ttwo\tthree",              "3 <3 one><3 two><5 three>"},
      { 3, "one,two,,,, ,,, three,,,",     "3 <3 one><3 two><5 three>"},
 
      { 3, " one two three",               "3 <3 one><3 two><5 three>"},
      { 3, " one\ttwo\tthree",             "3 <3 one><3 two><5 three>"},
      { 3, " one\ntwo\nthree",             "3 <3 one><3 two><5 three>"},
      { 3, " one,two,,,, ,,, three,,,",    "3 <3 one><3 two><5 three>"},
 
      { 3, " one two three ",              "3 <3 one><3 two><5 three>"},
      { 3, " one\ttwo\tthree ",            "3 <3 one><3 two><5 three>"},
      { 3, " one,two,,,, ,,, three,,, ",   "3 <3 one><3 two><5 three>"},
 
      { 3, "one two three ",               "3 <3 one><3 two><5 three>"},
      { 3, "one\ttwo\tthree",              "3 <3 one><3 two><5 three>"},
      { 3, " one\ttwo\tthree ",            "3 <3 one><3 two><5 three>"},
      { 3, " one\ttwo\tthree",             "3 <3 one><3 two><5 three>"},
      { 3, "one\ttwo\tthree ",             "3 <3 one><3 two><5 three>"},
      { 3, "\tone\ttwo\tthree ",           "3 <3 one><3 two><5 three>"},
      { 3, "\tone\ttwo\tthree\t",          "3 <3 one><3 two><5 three>"},
 
      { 3, "one,two,,,, ,,, three,,, ",    "3 <3 one><3 two><5 three>"},
 
      { 3, "  one   \t  two  \t  three  ", "3 <3 one><3 two><5 three>"},
      { 0, "",                             "0 "},
      { 0, "1",                            "0 "},
      { 0, "  \t  ",                       "0 "},
      { 0, "123",                          "0 "},
      { 4, "a1b2c3d",                      "4 <1 a><1 b><1 c><1 d>"},
      { 4, " a1b2c3d ",                    "4 <1 a><1 b><1 c><1 d>"},
 
      { 4, " a1bb2c3d ",                   "4 <1 a><2 bb><1 c><1 d>"},
      { 4, "a1bb2c3d",                     "4 <1 a><2 bb><1 c><1 d>"},
      { 4, "a1bb2c3d ",                    "4 <1 a><2 bb><1 c><1 d>"},
      { 4, " a1bb2c3d",                    "4 <1 a><2 bb><1 c><1 d>"},
 
      { 1, " 12abc345 ",                   "1 <3 abc>"},
      { 1, " 12abc345",                    "1 <3 abc>"},
      { 1, "12abc345 ",                    "1 <3 abc>"},
      { 1, "12abc345",                     "1 <3 abc>"},
 
      { 2, "12abc345defg678",              "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg678",             "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg678",             "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg678",             "2 <3 abc><4 defg>"},
      { 2, "12abc345defg678 ",             "2 <3 abc><4 defg>"},
      { 2, "12abc345defg678 ",             "2 <3 abc><4 defg>"},
      { 2, "12abc345defg678 ",             "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg678 ",            "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg678 ",            "2 <3 abc><4 defg>"},
      { 2, " 12 abc 345 defg 678 ",        "2 <3 abc><4 defg>"},
      { 2, "12 abc 345 defg 678 ",         "2 <3 abc><4 defg>"},
      { 2, "12 abc 345 defg 678 ",         "2 <3 abc><4 defg>"},
      { 2, " 12 abc 345 defg 678 ",        "2 <3 abc><4 defg>"},
 
      { 2, "12abc345defg",                 "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg",                "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg",                "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg",                "2 <3 abc><4 defg>"},
      { 2, "12abc345defg ",                "2 <3 abc><4 defg>"},
      { 2, "12abc345defg ",                "2 <3 abc><4 defg>"},
      { 2, "12abc345defg ",                "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg ",               "2 <3 abc><4 defg>"},
      { 2, " 12abc345defg ",               "2 <3 abc><4 defg>"},
      { 2, " 12 abc 345 defg ",            "2 <3 abc><4 defg>"},
      { 2, "12 abc 345 defg ",             "2 <3 abc><4 defg>"},
      { 2, "12 abc 345 defg ",             "2 <3 abc><4 defg>"},
      { 2, " 12 abc 345 defg ",            "2 <3 abc><4 defg>"},
 
      { 2, "abc345defg678",                "2 <3 abc><4 defg>"},
      { 2, " abc345defg678",               "2 <3 abc><4 defg>"},
      { 2, " abc345defg678",               "2 <3 abc><4 defg>"},
      { 2, " abc345defg678",               "2 <3 abc><4 defg>"},
      { 2, "abc345defg678 ",               "2 <3 abc><4 defg>"},
      { 2, "abc345defg678 ",               "2 <3 abc><4 defg>"},
      { 2, "abc345defg678 ",               "2 <3 abc><4 defg>"},
      { 2, " abc345defg678 ",              "2 <3 abc><4 defg>"},
      { 2, " abc345defg678 ",              "2 <3 abc><4 defg>"},
      { 2, "  abc 345 defg 678 ",          "2 <3 abc><4 defg>"},
      { 2, " abc 345 defg 678 ",           "2 <3 abc><4 defg>"},
      { 2, " abc 345 defg 678 ",           "2 <3 abc><4 defg>"},
      { 2, "  abc 345 defg 678 ",          "2 <3 abc><4 defg>"},
 
      { 2, " 00 11 2 3 4 5 6 77 88 99 aa bb ","2 <2 aa><2 bb>"},
      { 9, " aa bb c d e f g hh ii ",         "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
      {10, " aa bb c d e f g hh ii jj ",      "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
      {11, " aa bb c d e f g hh ii jj kk ",   "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
      {12, " aa bb c d e f g hh ii jj kk ll ","12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},
 
      { 9, "aa bb c d e f g hh ii ",          "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
      {10, "aa bb c d e f g hh ii jj ",       "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
      {11, "aa bb c d e f g hh ii jj kk ",    "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
      {12, "aa bb c d e f g hh ii jj kk ll ", "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},
 
      { 9, " aa bb c d e f g hh ii",           "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
      {10, " aa bb c d e f g hh ii jj",       "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
      {11, " aa bb c d e f g hh ii jj kk",    "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
      {12, " aa bb c d e f g hh ii jj kk ll", "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},
 
      { 9, "aa bb c d e f g hh ii",           "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
      {10, "aa bb c d e f g hh ii jj",        "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
      {11, "aa bb c d e f g hh ii jj kk",     "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
      {12, "aa bb c d e f g hh ii jj kk ll",  "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},
 
   };
   int testCount = sizeof(testTokenizer) / sizeof(testTokenizer[0]);
   printf("TestCount: %d\n", testCount);
 
   CString       m_deliminator = " ,.0123456789;:-_+=\n\t\r";
   char          actualTokenStrs[200];
   char          innerTokenStr[100];
   int           errorEncountered = 0;
 
   for (int test = 0; test < testCount; ++test) {
      CTokenEx      tok;
      char*         pActualPattern = testTokenizer[test].pActualPattern;
      char*         pExpectedPattern = testTokenizer[test].pExpectedPattern;
      CString       csSplit(pActualPattern);
      CStringArray  splitIt;
 
      tok.Split(csSplit, m_deliminator, splitIt, FALSE);
		   
      int size = splitIt.GetSize();
 
      sprintf(actualTokenStrs, "%d ", size);
      for (int iNum = 0; iNum < size; ++iNum) {
         sprintf(innerTokenStr, "<%d %s>", splitIt[iNum].GetLength(), (LPCTSTR)(splitIt[iNum]));
         strcat(actualTokenStrs, innerTokenStr);
      }
      if (strcmp(testTokenizer[test].pExpectedPattern, actualTokenStrs) != 0) {
         printf("\nTokenizer problem: \nInput:  [%s]\nExpect: [%s]\nActual: [%s]\n\n",
            pActualPattern, pExpectedPattern, actualTokenStrs);
         errorEncountered++;
      }
      else {
         printf("OK: [%s] --> [%s]\n", pActualPattern, actualTokenStrs);
      }
   }
   if (errorEncountered == 0) {
      printf("\nSuccess if this prints out\n");
   }

Last Updated 27 Jan 2000
Article Copyright 2000 by Dan Madden