
String Tokenizer Class (CTokenEx)

26 Jan 2000 (CPOL)
A very simple string tokenizer class.

Introduction

The string tokenizers I had seen lacked the functionality I was looking for, so I wrote one for myself using the KISS (Keep It Simple, Stupid) principle. This is a VERY SIMPLE sample!

Here is a summary of what the CTokenEx class lets you do:

  • Use SplitPath to break up a path into sections (drive/share name, directory, filename, extension). It also recognizes UNC names (which _tsplitpath doesn't).
  • Use Join to create a CString from a CStringArray, with a delimiter of your choice.
  • Use Split to break up a CString into a CStringArray (according to the delimiter).
  • Use GetString to get the first sub-string in a CString (according to the delimiter).
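
To illustrate the kind of UNC handling described for SplitPath, here is a minimal sketch in standard C++. It is not the CTokenEx implementation: PathParts and SplitPathSketch are hypothetical names, std::string stands in for CString, and the rule that a UNC root is "\\server\share" is an assumption about how the sections are divided.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type; the real SplitPath signature is not shown in the article.
struct PathParts {
    std::string root;       // "C:" or "\\server\share"
    std::string directory;  // "\dir\sub\"
    std::string filename;   // "file"
    std::string extension;  // ".txt"
};

// Sketch only: splits a Windows path into root, directory, filename, extension.
PathParts SplitPathSketch(const std::string& path)
{
    const std::size_t npos = std::string::npos;
    PathParts parts;
    std::size_t pos = 0;

    if (path.compare(0, 2, "\\\\") == 0) {
        // UNC name: treat "\\server\share" as the root (assumption).
        std::size_t serverEnd = path.find('\\', 2);
        std::size_t shareEnd = (serverEnd == npos) ? npos : path.find('\\', serverEnd + 1);
        pos = (shareEnd == npos) ? path.size() : shareEnd;
        parts.root = path.substr(0, pos);
    } else if (path.size() >= 2 && path[1] == ':') {
        // Drive letter such as "C:".
        parts.root = path.substr(0, 2);
        pos = 2;
    }

    // Everything after the root is directory + file name.
    std::string rest = path.substr(pos);
    std::size_t lastSlash = rest.find_last_of("\\/");
    std::string name = rest;
    if (lastSlash != npos) {
        parts.directory = rest.substr(0, lastSlash + 1);
        name = rest.substr(lastSlash + 1);
    }

    // The extension starts at the last dot of the file name, if any.
    std::size_t dot = name.find_last_of('.');
    if (dot == npos) {
        parts.filename = name;
    } else {
        parts.filename = name.substr(0, dot);
        parts.extension = name.substr(dot);
    }
    return parts;
}
```

Calling SplitPathSketch on a path such as \\server\share\dir\file.txt yields the share as the root, which is the case that a drive-letter-only splitter cannot represent.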

NOTE:

The Split and GetString functions treat consecutive delimiters as enclosing an empty string, and they will NOT add these blanks to the result unless you ask for them through the final BOOL parameter. See the example code below:

Say you have a CString that contains: "abc,def,,,ghi,,jkl,,"

//********************************************************
// Split Function
//********************************************************
//
// Split will fill an array with:
//
// NOTE:  IF PARAM #4 IS TRUE, YOU'LL SEE LIST #1 ELSE LIST #2
//
// LIST #1:
//  
// String  Position
// ======  ========
// abc     0
// def     1
//         2
//         3
// ghi     4
//         5
// jkl     6
//         7
//         8
//
//
// LIST #2 (Same String):
//  
// String  Position
// ======  ========
// abc     0
// def     1
// ghi     2
// jkl     3
//
//********************************************************
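void CMyDlg::OnSplit()   // CMyDlg is a placeholder for your own dialog class
{
    CTokenEx tok;

    // CString for the Split function
    CString csSplit = _T("abc,def,,,ghi,,jkl,,");

    // CStringArray to fill
    CStringArray SplitIt;

    // Call Split, keeping empty strings
    tok.Split(csSplit, _T(","), SplitIt, TRUE);   // LIST #1

    // Start from an empty array so the two results are not mixed
    SplitIt.RemoveAll();

    // Call Split again, skipping empty strings
    tok.Split(csSplit, _T(","), SplitIt, FALSE);  // LIST #2
}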
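
The two lists above can be reproduced with a small standard C++ sketch. This is an illustration of the described behaviour, not the CTokenEx source: SplitSketch and JoinSketch are hypothetical names, std::string and std::vector stand in for CString and CStringArray, and a single-character delimiter is assumed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of Split: addEmpty == true keeps empty fields (LIST #1),
// addEmpty == false drops them (LIST #2).
std::vector<std::string> SplitSketch(const std::string& source, char delimiter, bool addEmpty)
{
    const std::size_t npos = std::string::npos;
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        std::size_t end = source.find(delimiter, start);
        std::string token = source.substr(start, end == npos ? npos : end - start);
        if (addEmpty || !token.empty())
            tokens.push_back(token);
        if (end == npos)
            break;               // no more delimiters: the last field has been handled
        start = end + 1;         // continue after the delimiter
    }
    return tokens;
}

// Sketch of Join: concatenates the tokens with the delimiter between them.
std::string JoinSketch(const std::vector<std::string>& tokens, const std::string& delimiter)
{
    std::string result;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i > 0)
            result += delimiter;
        result += tokens[i];
    }
    return result;
}
```

With empty fields kept, joining the tokens again with the same delimiter gives back the original string; with empty fields dropped, the round trip produces "abc,def,ghi,jkl" instead.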
  
//********************************************************
// GetString Function
//********************************************************
//
//  GetString will return a string:
//
//     abc
//     ...and further calls to GetString will return the strings:
//     def
//     ghi
//     jkl
//
//********************************************************
void CMyDlg::OnGetstring()   // CMyDlg is a placeholder for your own dialog class
{
    CTokenEx tok;
    CString csRef = _T("abc,def,,,ghi,,jkl,,");

    // The loop ends once csRef is empty, so GetString is expected to
    // consume each returned token from csRef.
    do
    {
        // Don't return blanks
        CString csRet = tok.GetString(csRef, _T(","), FALSE);

        // To return blanks as well, pass TRUE instead:
        // CString csRet = tok.GetString(csRef, _T(","), TRUE);

        // Do something with the returned value.

    } while (!csRef.IsEmpty());
}

I hope that others find this class useful.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Product Manager
Germany
I have been programming (as a hobby) for 20+ years (Unix C, scripting, VB, C/C++, C#). I am getting too old to talk about it, and I have been in the security line of work (both military and civilian) for 25+ years.

Comments and Discussions

 
Praise: very useful (Southmountain, 1-Nov-21 10:02)
Question: Problem with space 0x20 as delimiter? (l_d_allan, 28-Feb-06 5:32)
Answer: Re: Problem with space 0x20 as delimiter? (Dan Madden, 28-Feb-06 8:43)
General: Re: Problem with space 0x20 as delimiter? (l_d_allan, 28-Feb-06 9:22)
General: Re: Problem with space 0x20 as delimiter? (Dan Madden, 28-Feb-06 10:38)
General: Re: Problem with space 0x20 as delimiter? (l_d_allan, 28-Feb-06 12:07)
General: Re: Problem with space 0x20 as delimiter? (Dan Madden, 2-Mar-06 6:49)
General: Re: Problem with space 0x20 as delimiter? (l_d_allan, 28-Feb-06 12:16)
General: Re: Problem with space 0x20 as delimiter? (Dan Madden, 2-Mar-06 6:43)
Answer: New Split Function for this Suggestion (Dan Madden, 2-Mar-06 7:06)
General: Re: New Split Function for this Suggestion (l_d_allan, 2-Mar-06 16:21)
Works quite well. Nice job!

I would make another suggestion or two: separate the specification of the delimiters from the call to Split. There is more than a trivial amount of overhead to get the delimiters set up, and you might want to allow "reuse" of the same delimiters for a bunch of calls.
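
One way to picture this suggestion is a tokenizer object that prepares its delimiter set once and reuses it for every call. The sketch below uses standard C++ rather than MFC; ReusableTokenizer is a hypothetical class, not part of CTokenEx, and the 256-entry lookup table is just one possible way to make the per-character delimiter test cheap.

```cpp
#include <array>
#include <string>
#include <vector>

// Sketch: the delimiter set is built once in the constructor and reused by Split.
class ReusableTokenizer {
public:
    explicit ReusableTokenizer(const std::string& delimiters)
    {
        isDelimiter_.fill(false);
        for (unsigned char c : delimiters)
            isDelimiter_[c] = true;   // mark each delimiter byte in the lookup table
    }

    // Splits source on any delimiter character; empty tokens are skipped.
    std::vector<std::string> Split(const std::string& source) const
    {
        std::vector<std::string> tokens;
        std::string current;
        for (unsigned char c : source) {
            if (isDelimiter_[c]) {
                if (!current.empty()) {
                    tokens.push_back(current);
                    current.clear();
                }
            } else {
                current += static_cast<char>(c);
            }
        }
        if (!current.empty())
            tokens.push_back(current);   // last token when source does not end in a delimiter
        return tokens;
    }

private:
    std::array<bool, 256> isDelimiter_;
};
```

A caller would construct one ReusableTokenizer with the delimiter string and then call Split on as many inputs as needed, so the set-up cost is paid only once.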

I was also wondering with the CString parameters if you want to pass by reference or value ... or whether it makes any difference ... you pass the CStringArray by reference, but not the two CString parameters.

void CTokenEx::Split(CString Source, CString Deliminator, CStringArray& AddIt, BOOL bAddEmpty)

or

void CTokenEx::Split(CString& Source, CString& Deliminator, CStringArray& AddIt, BOOL bAddEmpty)

I tried making the change ... odd ... the Source parameter can be passed by reference, but not the Delimiter parameter. (But there is a LOT about MFC that I don't understand. *sigh*)
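
A likely explanation, assuming the call site passes a string literal such as "," for the delimiter (as the article's examples do): the literal has to be converted to a temporary string object, and C++ does not let a temporary bind to a non-const reference. A const reference accepts both variables and temporaries. The sketch below shows the idea with std::string standing in for CString; CountDelimiters is a hypothetical helper used only for illustration.

```cpp
#include <cstddef>
#include <string>

// With non-const reference parameters, e.g.
//     std::size_t CountDelimiters(std::string& source, std::string& delimiters);
// a call such as CountDelimiters(text, ",") does not compile, because ","
// would need a temporary std::string, and temporaries cannot bind to
// non-const references.

// const references avoid the copy and still accept literals and temporaries.
std::size_t CountDelimiters(const std::string& source, const std::string& delimiters)
{
    std::size_t count = 0;
    for (char c : source) {
        if (delimiters.find(c) != std::string::npos)
            ++count;   // c is one of the delimiter characters
    }
    return count;
}
```

The same pattern applied to the signature above would be const CString& for both Source and Deliminator, provided Split does not need to modify either argument.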

I ran the following test cases against the revised code, with the delimiters chosen so that only A-Z and a-z end up in tokens, and they all pass just fine. Sweet! :)

struct s_testTokenizer {
   int   expectedTokens;
   char* pActualPattern;
   char* pExpectedPattern;
};
struct s_testTokenizer testTokenizer[] = {
   { 1, "a",                            "1 <1 a>"},
   { 3, "a b c",                        "3 <1 a><1 b><1 c>"},
   { 3, " a b c ",                      "3 <1 a><1 b><1 c>"},
   { 3, "one two three",                "3 <3 one><3 two><5 three>"},
   { 3, "one\ttwo\tthree",              "3 <3 one><3 two><5 three>"},
   { 3, "one,two,,,, ,,, three,,,",     "3 <3 one><3 two><5 three>"},

   { 3, " one two three",               "3 <3 one><3 two><5 three>"},
   { 3, " one\ttwo\tthree",             "3 <3 one><3 two><5 three>"},
   { 3, " one\ntwo\nthree",             "3 <3 one><3 two><5 three>"},
   { 3, " one,two,,,, ,,, three,,,",    "3 <3 one><3 two><5 three>"},

   { 3, " one two three ",              "3 <3 one><3 two><5 three>"},
   { 3, " one\ttwo\tthree ",            "3 <3 one><3 two><5 three>"},
   { 3, " one,two,,,, ,,, three,,, ",   "3 <3 one><3 two><5 three>"},

   { 3, "one two three ",               "3 <3 one><3 two><5 three>"},
   { 3, "one\ttwo\tthree",              "3 <3 one><3 two><5 three>"},
   { 3, " one\ttwo\tthree ",            "3 <3 one><3 two><5 three>"},
   { 3, " one\ttwo\tthree",             "3 <3 one><3 two><5 three>"},
   { 3, "one\ttwo\tthree ",             "3 <3 one><3 two><5 three>"},
   { 3, "\tone\ttwo\tthree ",           "3 <3 one><3 two><5 three>"},
   { 3, "\tone\ttwo\tthree\t",          "3 <3 one><3 two><5 three>"},

   { 3, "one,two,,,, ,,, three,,, ",    "3 <3 one><3 two><5 three>"},

   { 3, "  one   \t  two  \t  three  ", "3 <3 one><3 two><5 three>"},
   { 0, "",                             "0 "},
   { 0, "1",                            "0 "},
   { 0, "  \t  ",                       "0 "},
   { 0, "123",                          "0 "},
   { 4, "a1b2c3d",                      "4 <1 a><1 b><1 c><1 d>"},
   { 4, " a1b2c3d ",                    "4 <1 a><1 b><1 c><1 d>"},

   { 4, " a1bb2c3d ",                   "4 <1 a><2 bb><1 c><1 d>"},
   { 4, "a1bb2c3d",                     "4 <1 a><2 bb><1 c><1 d>"},
   { 4, "a1bb2c3d ",                    "4 <1 a><2 bb><1 c><1 d>"},
   { 4, " a1bb2c3d",                    "4 <1 a><2 bb><1 c><1 d>"},

   { 1, " 12abc345 ",                   "1 <3 abc>"},
   { 1, " 12abc345",                    "1 <3 abc>"},
   { 1, "12abc345 ",                    "1 <3 abc>"},
   { 1, "12abc345",                     "1 <3 abc>"},

   { 2, "12abc345defg678",              "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg678",             "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg678",             "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg678",             "2 <3 abc><4 defg>"},
   { 2, "12abc345defg678 ",             "2 <3 abc><4 defg>"},
   { 2, "12abc345defg678 ",             "2 <3 abc><4 defg>"},
   { 2, "12abc345defg678 ",             "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg678 ",            "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg678 ",            "2 <3 abc><4 defg>"},
   { 2, " 12 abc 345 defg 678 ",        "2 <3 abc><4 defg>"},
   { 2, "12 abc 345 defg 678 ",         "2 <3 abc><4 defg>"},
   { 2, "12 abc 345 defg 678 ",         "2 <3 abc><4 defg>"},
   { 2, " 12 abc 345 defg 678 ",        "2 <3 abc><4 defg>"},

   { 2, "12abc345defg",                 "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg",                "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg",                "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg",                "2 <3 abc><4 defg>"},
   { 2, "12abc345defg ",                "2 <3 abc><4 defg>"},
   { 2, "12abc345defg ",                "2 <3 abc><4 defg>"},
   { 2, "12abc345defg ",                "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg ",               "2 <3 abc><4 defg>"},
   { 2, " 12abc345defg ",               "2 <3 abc><4 defg>"},
   { 2, " 12 abc 345 defg ",            "2 <3 abc><4 defg>"},
   { 2, "12 abc 345 defg ",             "2 <3 abc><4 defg>"},
   { 2, "12 abc 345 defg ",             "2 <3 abc><4 defg>"},
   { 2, " 12 abc 345 defg ",            "2 <3 abc><4 defg>"},

   { 2, "abc345defg678",                "2 <3 abc><4 defg>"},
   { 2, " abc345defg678",               "2 <3 abc><4 defg>"},
   { 2, " abc345defg678",               "2 <3 abc><4 defg>"},
   { 2, " abc345defg678",               "2 <3 abc><4 defg>"},
   { 2, "abc345defg678 ",               "2 <3 abc><4 defg>"},
   { 2, "abc345defg678 ",               "2 <3 abc><4 defg>"},
   { 2, "abc345defg678 ",               "2 <3 abc><4 defg>"},
   { 2, " abc345defg678 ",              "2 <3 abc><4 defg>"},
   { 2, " abc345defg678 ",              "2 <3 abc><4 defg>"},
   { 2, "  abc 345 defg 678 ",          "2 <3 abc><4 defg>"},
   { 2, " abc 345 defg 678 ",           "2 <3 abc><4 defg>"},
   { 2, " abc 345 defg 678 ",           "2 <3 abc><4 defg>"},
   { 2, "  abc 345 defg 678 ",          "2 <3 abc><4 defg>"},

   { 2, " 00 11 2 3 4 5 6 77 88 99 aa bb ","2 <2 aa><2 bb>"},
   { 9, " aa bb c d e f g hh ii ",         "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
   {10, " aa bb c d e f g hh ii jj ",      "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
   {11, " aa bb c d e f g hh ii jj kk ",   "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
   {12, " aa bb c d e f g hh ii jj kk ll ","12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},

   { 9, "aa bb c d e f g hh ii ",          "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
   {10, "aa bb c d e f g hh ii jj ",       "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
   {11, "aa bb c d e f g hh ii jj kk ",    "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
   {12, "aa bb c d e f g hh ii jj kk ll ", "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},

   { 9, " aa bb c d e f g hh ii",           "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
   {10, " aa bb c d e f g hh ii jj",       "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
   {11, " aa bb c d e f g hh ii jj kk",    "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
   {12, " aa bb c d e f g hh ii jj kk ll", "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},

   { 9, "aa bb c d e f g hh ii",           "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
   {10, "aa bb c d e f g hh ii jj",        "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
   {11, "aa bb c d e f g hh ii jj kk",     "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
   {12, "aa bb c d e f g hh ii jj kk ll",  "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},

};
int testCount = sizeof(testTokenizer) / sizeof(testTokenizer[0]);

// Runs every test case and returns the number of failures.
// Requires <stdio.h> and <string.h> in addition to the MFC and CTokenEx headers.
// Assumes an ANSI (non-Unicode) build, because CString data is handed to
// sprintf/printf through %s.
int RunTokenizerTests()
{
   printf("TestCount: %d\n", testCount);

   CString       m_deliminator = " ,.0123456789;:-_+=\n\t\r";
   char          actualTokenStrs[200];
   char          innerTokenStr[100];
   int           errorEncountered = 0;   // number of failed test cases

   for (int test = 0; test < testCount; ++test) {
      CTokenEx      tok;
      const char*   pActualPattern = testTokenizer[test].pActualPattern;
      const char*   pExpectedPattern = testTokenizer[test].pExpectedPattern;
      CString       csSplit(pActualPattern);
      CStringArray  splitIt;

      tok.Split(csSplit, m_deliminator, splitIt, FALSE);

      int size = (int)splitIt.GetSize();

      // Build "<count> <length token><length token>..." for comparison
      sprintf(actualTokenStrs, "%d ", size);
      for (int iNum = 0; iNum < size; ++iNum) {
         sprintf(innerTokenStr, "<%d %s>", splitIt[iNum].GetLength(), (LPCTSTR)(splitIt[iNum]));
         strcat(actualTokenStrs, innerTokenStr);
      }
      if (strcmp(pExpectedPattern, actualTokenStrs) != 0) {
         printf("\nTokenizer problem: \nInput:  [%s]\nExpect: [%s]\nActual: [%s]\n\n",
            pActualPattern, pExpectedPattern, actualTokenStrs);
         errorEncountered++;
      }
      else {
         printf("OK: [%s] --> [%s]\n", pActualPattern, actualTokenStrs);
      }
   }
   if (errorEncountered == 0) {
      printf("\nSuccess if this prints out\n");
   }
   return errorEncountered;
}
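
The expected patterns in the table follow a simple format: the token count, a space, then one "<length token>" group per token. As a small illustration of that format only, here is a standard C++ sketch; FormatTokens is a hypothetical helper and std::string / std::vector stand in for CString / CStringArray.

```cpp
#include <string>
#include <vector>

// Sketch: builds the comparison string used by the test table,
// e.g. "3 <3 one><3 two><5 three>" for the tokens one, two, three.
std::string FormatTokens(const std::vector<std::string>& tokens)
{
    std::string out = std::to_string(tokens.size()) + " ";
    for (const std::string& token : tokens) {
        // Each group is "<" + token length + " " + token + ">"
        out += "<" + std::to_string(token.size()) + " " + token + ">";
    }
    return out;
}
```

Building the string this way avoids the fixed-size char buffers, at the cost of a few temporary string allocations per token.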

General: Problem with your class (suss, 28-Jun-01 10:24)
General: Re: Problem with your class (Dan Madden, 28-Jun-01 18:01)
General: STL string tokenizer (Member 1127, 27-Jan-00 2:03)
General: Re: STL string tokenizer (daniel madden, 30-Jan-00 21:40)
