Posted
26 Jan 2000

Comments and Discussions



Thanks for providing CTokenEx. I'm evaluating it for use in an application, and hope to avoid "reinventing the wheel".
I especially appreciate that your class accomplishes all the tokenizing in one call, rather than multiple calls.
I don't believe CTokenEx handles the following correctly. I don't fully "grok" the class code, but it appears that there may be a problem with using the space character (0x20) as a delimiter. That would be a VERY typical situation, so I wanted to check.
CTokenEx tok;
CString csSplit("one, two ,. three");
CStringArray splitIt;
CString m_deliminator = ",. ";
tok.Split(csSplit, m_deliminator, splitIt, TRUE);





Hi I_d_allan,
Well, I took your example and created a console app to test it, and here were the results:
"one, two " "three"
This is the correct result...
Below is the full code:
#include "stdafx.h"
#include "SplitTest.h"
#include "TokenEx.h"
#ifdef _DEBUG
#define new DEBUG_NEW
#endif
CWinApp theApp;
using namespace std;
int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
    int nRetCode = 0;
    if (!AfxWinInit(::GetModuleHandle(NULL), NULL, ::GetCommandLine(), 0))
    {
        _tprintf(_T("Fatal Error: MFC initialization failed\n"));
        nRetCode = 1;
    }
    else
    {
        CTokenEx tok;
        CString csSplit("one, two ,. three");
        CStringArray splitIt;
        CString m_deliminator = ",. ";
        tok.Split(csSplit, m_deliminator, splitIt, TRUE);
        for (int n=0; n<splitIt.GetSize(); n++) {
            printf("\"%s\"\n",splitIt.GetAt(n));
        }
    }
    return nRetCode;
}
Regards,
Dan





Hi Dan,
Thanks for the prompt and helpful reply to my possibly uninformed question. I was pleasantly surprised to get a response to a CodeProject article from 2000.
I hope to avoid "reinventing the wheel", and I suppose I don't understand how your class is supposed to work. In the test case with delimiters of space, comma, and period, I would have expected the output to be: one two three
I don't understand why "one, two " wasn't split into two tokens.
I've been working on a specialized tokenizer for searching that only allows A-Z and a-z. The delimiters are anything else. Here are some of the test tokens I use:
char* testStringsWithTokens[] = {
"", "0 ",
"a", "1 <1 a>",
"a b c", "3 <1 a><1 b><1 c>",
" a b c ", "3 <1 a><1 b><1 c>",
"one two three", "3 <3 one><3 two><5 three>",
"one\ttwo\tthree", "3 <3 one><3 two><5 three>",
"one,two,,,, ,,, three,,,", "3 <3 one><3 two><5 three>",
" one two three", "3 <3 one><3 two><5 three>",
" one\ttwo\tthree", "3 <3 one><3 two><5 three>",
" one,two,,,, ,,, three,,,", "3 <3 one><3 two><5 three>",
};
The "even" strings (i.e. 0, 2, 4, etc.) are the tests, and the "odd" strings (1, 3, 5, etc.) are the cppunit-like expected results (number of tokens, and then the length and text of each token found).
Using your code, I would declare the delimiters to be everything except A-Z and a-z.
Is there a way to use your class to accomplish the above? Or am I doing something incorrectly? I really hope to be able to reuse your class in my application.
Thanks again.
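The behavior asked for above (only A-Z and a-z belong to tokens, anything else separates them, and runs of separators collapse) can be sketched in standard C++ without MFC. This is only an illustration: `splitLetters` is a hypothetical helper, not part of CTokenEx, and it works on `std::string` rather than `CString`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of CTokenEx): collects maximal runs of the
// ASCII letters A-Z and a-z. Every other character acts as a delimiter, and
// consecutive delimiters never produce empty tokens.
std::vector<std::string> splitLetters(const std::string& source)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : source) {
        const bool isLetter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
        if (isLetter) {
            current += ch;              // still inside a token
        } else if (!current.empty()) {
            tokens.push_back(current);  // a delimiter ends the current token
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);      // token that runs to the end of the input
    }
    return tokens;
}
```

Under these assumptions, `splitLetters("one,two,,,, ,,, three,,,")` yields the three tokens one, two and three, which matches the expected results in the test table above.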
 modified at 19:18 Tuesday 28th February, 2006





Hi Again,
Here is something I threw together in the Class code after I added a new parameter to the function:
This is the Output:
one, two ,. three <= Deliminator BEFORE
one~~two~~~~three <= Deliminator AFTER first pass
"one"
"two"
"three"
void CTokenEx::Split(CString Source, CString Deliminator, BOOL bMultipleDeliminator, CStringArray& AddIt, BOOL bAddEmpty)
Here is what I put inside the class:
if (bMultipleDeliminator)
{
    CString csStr = newCString;
    int nDelCount = Deliminator.GetLength();
    CString csMultDel = _T("");
    for (int n=0; n<nDelCount; n++)
    {
        csMultDel = _T("");
        csMultDel += Deliminator[n];
        csStr.Replace(csMultDel,"~");
    }
    newCString = csStr;
}
Ok, now here is the new function:
void CTokenEx::Split(CString Source, CString Deliminator, BOOL bMultipleDeliminator, CStringArray& AddIt, BOOL bAddEmpty)
{
    CString newCString = Source;
    CString tmpCString = "";
    CString AddCString = "";
    int pos1 = 0;
    int pos = 0;
    AddIt.RemoveAll();
    if (Deliminator.IsEmpty())
    {
        Deliminator = ",";
    }
    if (bMultipleDeliminator)
    {
        // Replace every delimiter character with "~", then split on "~" alone
        CString csStr = newCString;
        int nDelCount = Deliminator.GetLength();
        CString csMultDel = _T("");
        for (int n=0; n<nDelCount; n++)
        {
            csMultDel = _T("");
            csMultDel += Deliminator[n];
            csStr.Replace(csMultDel,"~");
        }
        Deliminator = _T("~");
        newCString = csStr;
    }
    do {
        pos1 = 0;
        pos = newCString.Find(Deliminator, pos1);
        if ( pos != -1 ) // Find returns -1 when no delimiter is left
        {
            CString AddCString = newCString.Left(pos);
            if (!AddCString.IsEmpty())
            {
                AddIt.Add(AddCString);
            }
            else if (bAddEmpty)
            {
                AddIt.Add(AddCString);
            }
            tmpCString = newCString.Mid(pos + Deliminator.GetLength());
            newCString = tmpCString;
        }
    } while ( pos != -1 );
    // Add whatever follows the last delimiter (even if empty, when bAddEmpty is TRUE)
    if ((!newCString.IsEmpty()) || bAddEmpty)
    {
        AddIt.Add(newCString);
    }
}
Hope this helps!!
Regards,
Dan
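One caveat with the replace-with-"~" pass above is that a "~" already present in the source string would also be treated as a delimiter. A possible alternative, sketched here in standard C++ rather than MFC, searches for any delimiter character directly with `std::string::find_first_of`. The function name `splitAny` and its signature are assumptions made for this illustration; they are not part of CTokenEx.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the modified Split (not CTokenEx code): every
// character in `delims` is a delimiter in its own right, so no sentinel
// character such as "~" is needed.
std::vector<std::string> splitAny(const std::string& source,
                                  const std::string& delims,
                                  bool addEmpty)
{
    // Mirror the original default: an empty delimiter list means ",".
    const std::string d = delims.empty() ? std::string(",") : delims;

    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = source.find_first_of(d, start);
        if (pos == std::string::npos) {
            break;                               // no delimiter left
        }
        if (pos > start || addEmpty) {
            tokens.push_back(source.substr(start, pos - start));
        }
        start = pos + 1;                         // continue after the delimiter
    }
    // Remainder after the last delimiter; kept when non-empty or when addEmpty.
    if (start < source.size() || addEmpty) {
        tokens.push_back(source.substr(start));
    }
    return tokens;
}
```

With `addEmpty` set to false, `splitAny("one, two ,. three", ",. ", false)` gives one, two and three, and a source such as "a~b,c" split on "," keeps "a~b" intact.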





Hi Dan,
Thanks ... I think that will make it work both "my way" and "your way". We apparently have different notions of what a delimiter is, and how to handle it.
Actually, I don't understand why the original/default is available. I would think you would ALWAYS want to "throw away" any delimiter and use that to break up tokens, whether the delimiters show up in multiples or not.
I am perhaps being slow to understand why you would want to specify bMultipleDeliminator in any way other than causing "one, two ,. three" to come out: "one" "two" "three"
Not meaning to be argumentative, but I would think the expected behavior would pretty much ALWAYS be that "one, two ,. three" would be handled that way, and I would think the bMultipleDeliminator parameter would be left out of the parameter list. To me, it detracts and creates potential confusion to have that option (but perhaps I am being slow ... I realize that I can be "not the brightest bulb in the box"). It has been a long day.<g>
And again thanks for providing the code and helping out this confused person.






Hi Dan,
I stared at your article a bit closer and noticed your statement:
The Split and GetString functions recognize multiple delimiters as an empty string so that it will NOT add blanks to an array (unless you want it to). See example code below:
It wasn't clear to me what that meant the first time I read it, and it still is fuzzy, at least to me. You mention that the sample code clarifies, but the sample code only deals with just a comma being a delimiter.
Not meaning to come across as critical or unappreciative .... it is great that you submitted the code and are helpful to those of us who want to reuse it, but need your patient help to figure it out.





Hi Again,
Well, it is basically explaining that if you had the string "a,b,c,,,,d,," with a comma as the deliminator and the "BOOL bAddEmpty" parameter set to TRUE, the CStringArray would look like this:
"a" "b" "c" "" "" "" "d" "" ""
It would have the size of "9". If it was done with "BOOL bAddEmpty" set to FALSE, then the CStringArray would have looked like this:
"a" "b" "c" "d"
It would have the size of "4".
Does that help explain it? This is also said in the web page...
Regards,
Dan
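The two array sizes quoted above can be checked with a small standard C++ analogue of the Split loop. This is a sketch under assumptions: `splitOn` is a hypothetical function, it handles a single delimiter character only, and it uses `std::string` in place of `CString`. It also adds the trailing token whenever `addEmpty` is true, which is what makes the count come out as 9.

```cpp
#include <string>
#include <vector>

// Hypothetical single-delimiter analogue of Split (not CTokenEx code),
// written against std::string so it compiles without MFC.
std::vector<std::string> splitOn(std::string rest, char delim, bool addEmpty)
{
    std::vector<std::string> out;
    std::string::size_type pos;
    while ((pos = rest.find(delim)) != std::string::npos) {
        const std::string token = rest.substr(0, pos);  // like CString::Left
        if (!token.empty() || addEmpty) {
            out.push_back(token);
        }
        rest = rest.substr(pos + 1);                    // like CString::Mid
    }
    // Text after the last delimiter; an empty tail is kept only with addEmpty.
    if (!rest.empty() || addEmpty) {
        out.push_back(rest);
    }
    return out;
}
```

Calling `splitOn("a,b,c,,,,d,,", ',', true)` returns 9 entries, and the same call with `false` returns the 4 entries a, b, c and d.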





Here it is (thanks l_d_allan):
By using the code below, you could test it by doing this in a "main()":
CTokenEx tok;
CString csSplit("one, two ,. three");
CStringArray splitIt;
CString m_deliminator = ",. ";
tok.Split(csSplit, m_deliminator, splitIt, FALSE);
printf("\n%s\n",csSplit);
for (int n=0; n<splitIt.GetSize(); n++)
{
    printf("\"%s\"\n",splitIt.GetAt(n));
}
This would produce:
one, two ,. three
"one"
"two"
"three"
void CTokenEx::Split(CString Source, CString Deliminator, CStringArray& AddIt, BOOL bAddEmpty)
{
    CString newCString = Source;
    CString tmpCString = "";
    CString AddCString = "";
    int pos1 = 0;
    int pos = 0;
    AddIt.RemoveAll();
    if (Deliminator.IsEmpty()) {
        Deliminator = ",";
    }
    // Replace every delimiter character with "~", then split on "~" alone
    CString csStr = newCString;
    int nDelCount = Deliminator.GetLength();
    CString csMultDel = _T("");
    for (int n=0; n<nDelCount; n++)
    {
        csMultDel = _T("");
        csMultDel += Deliminator[n];
        csStr.Replace(csMultDel,"~");
    }
    Deliminator = _T("~");
    newCString = csStr;
    do {
        pos1 = 0;
        pos = newCString.Find(Deliminator, pos1);
        if ( pos != -1 ) { // Find returns -1 when no delimiter is left
            CString AddCString = newCString.Left(pos);
            if (!AddCString.IsEmpty()) {
                AddIt.Add(AddCString);
            }
            else if (bAddEmpty) {
                AddIt.Add(AddCString);
            }
            tmpCString = newCString.Mid(pos + Deliminator.GetLength());
            newCString = tmpCString;
        }
    } while ( pos != -1 );
    // Add whatever follows the last delimiter (even if empty, when bAddEmpty is TRUE)
    if ((!newCString.IsEmpty()) || bAddEmpty) {
        AddIt.Add(newCString);
    }
}
Regards,
Dan





Works quite well. Nice job!
I would make another suggestion or two: separate the specification of the delimiters from the call to Split. There is more than a trivial amount of overhead to get the delimiters set up, and you might want to allow "reuse" of the same delimiters for a bunch of calls.
I was also wondering with the CString parameters if you want to pass by reference or value ... or whether it makes any difference ... you pass the CStringArray by reference, but not the two CString parameters.
void CTokenEx::Split(CString Source, CString Deliminator, CStringArray& AddIt, BOOL bAddEmpty)
or
void CTokenEx::Split(CString& Source, CString& Deliminator, CStringArray& AddIt, BOOL bAddEmpty)
I tried making the change .... odd ... the Source parameter can be passed by reference, but not the Delimiter parameter. (But there is a LOT about MFC that I don't understand.)
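The suggestion above, setting up the delimiters once and reusing them across many calls, could look roughly like the following in standard C++. Everything here is hypothetical: `DelimiterSet` and its `split` method are names invented for this sketch and are not part of CTokenEx. It assumes 8-bit characters, always discards empty tokens, and takes the source by const reference so that neither the source nor the delimiter list is copied or modified per call.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical class (name and interface are assumptions, not CTokenEx API).
// The delimiter lookup table is built once in the constructor and can then be
// reused for any number of split() calls.
class DelimiterSet
{
public:
    explicit DelimiterSet(const std::string& delims)
    {
        table_.fill(false);
        for (unsigned char ch : delims) {
            table_[ch] = true;          // mark this byte value as a delimiter
        }
    }

    // Splits `source` on any delimiter; empty tokens are discarded.
    std::vector<std::string> split(const std::string& source) const
    {
        std::vector<std::string> tokens;
        std::string current;
        for (unsigned char ch : source) {
            if (!table_[ch]) {
                current += static_cast<char>(ch);
            } else if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        }
        if (!current.empty()) {
            tokens.push_back(current);
        }
        return tokens;
    }

private:
    std::array<bool, 256> table_;       // one flag per possible 8-bit char value
};
```

A single `DelimiterSet delims(" ,.0123456789;:_+=\n\t\r");` can then serve every test string, for example `delims.split("12abc345defg678")`, which returns abc and defg.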
I applied the following test cases to the revised code to only allow A-Z and a-z, and it passes just fine. Sweet!
struct s_testTokenizer {
int expectedTokens;
char* pActualPattern;
char* pExpectedPattern;
};
struct s_testTokenizer testTokenizer[] = {
{ 1, "a", "1 <1 a>"},
{ 3, "a b c", "3 <1 a><1 b><1 c>"},
{ 3, " a b c ", "3 <1 a><1 b><1 c>"},
{ 3, "one two three", "3 <3 one><3 two><5 three>"},
{ 3, "one\ttwo\tthree", "3 <3 one><3 two><5 three>"},
{ 3, "one,two,,,, ,,, three,,,", "3 <3 one><3 two><5 three>"},
{ 3, " one two three", "3 <3 one><3 two><5 three>"},
{ 3, " one\ttwo\tthree", "3 <3 one><3 two><5 three>"},
{ 3, " one\ntwo\nthree", "3 <3 one><3 two><5 three>"},
{ 3, " one,two,,,, ,,, three,,,", "3 <3 one><3 two><5 three>"},
{ 3, " one two three ", "3 <3 one><3 two><5 three>"},
{ 3, " one\ttwo\tthree ", "3 <3 one><3 two><5 three>"},
{ 3, " one,two,,,, ,,, three,,, ", "3 <3 one><3 two><5 three>"},
{ 3, "one two three ", "3 <3 one><3 two><5 three>"},
{ 3, "one\ttwo\tthree", "3 <3 one><3 two><5 three>"},
{ 3, " one\ttwo\tthree ", "3 <3 one><3 two><5 three>"},
{ 3, " one\ttwo\tthree", "3 <3 one><3 two><5 three>"},
{ 3, "one\ttwo\tthree ", "3 <3 one><3 two><5 three>"},
{ 3, "\tone\ttwo\tthree ", "3 <3 one><3 two><5 three>"},
{ 3, "\tone\ttwo\tthree\t", "3 <3 one><3 two><5 three>"},
{ 3, "one,two,,,, ,,, three,,, ", "3 <3 one><3 two><5 three>"},
{ 3, " one \t two \t three ", "3 <3 one><3 two><5 three>"},
{ 0, "", "0 "},
{ 0, "1", "0 "},
{ 0, " \t ", "0 "},
{ 0, "123", "0 "},
{ 4, "a1b2c3d", "4 <1 a><1 b><1 c><1 d>"},
{ 4, " a1b2c3d ", "4 <1 a><1 b><1 c><1 d>"},
{ 4, " a1bb2c3d ", "4 <1 a><2 bb><1 c><1 d>"},
{ 4, "a1bb2c3d", "4 <1 a><2 bb><1 c><1 d>"},
{ 4, "a1bb2c3d ", "4 <1 a><2 bb><1 c><1 d>"},
{ 4, " a1bb2c3d", "4 <1 a><2 bb><1 c><1 d>"},
{ 1, " 12abc345 ", "1 <3 abc>"},
{ 1, " 12abc345", "1 <3 abc>"},
{ 1, "12abc345 ", "1 <3 abc>"},
{ 1, "12abc345", "1 <3 abc>"},
{ 2, "12abc345defg678", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg678", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg678", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg678", "2 <3 abc><4 defg>"},
{ 2, "12abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, "12abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, "12abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, " 12 abc 345 defg 678 ", "2 <3 abc><4 defg>"},
{ 2, "12 abc 345 defg 678 ", "2 <3 abc><4 defg>"},
{ 2, "12 abc 345 defg 678 ", "2 <3 abc><4 defg>"},
{ 2, " 12 abc 345 defg 678 ", "2 <3 abc><4 defg>"},
{ 2, "12abc345defg", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg", "2 <3 abc><4 defg>"},
{ 2, "12abc345defg ", "2 <3 abc><4 defg>"},
{ 2, "12abc345defg ", "2 <3 abc><4 defg>"},
{ 2, "12abc345defg ", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg ", "2 <3 abc><4 defg>"},
{ 2, " 12abc345defg ", "2 <3 abc><4 defg>"},
{ 2, " 12 abc 345 defg ", "2 <3 abc><4 defg>"},
{ 2, "12 abc 345 defg ", "2 <3 abc><4 defg>"},
{ 2, "12 abc 345 defg ", "2 <3 abc><4 defg>"},
{ 2, " 12 abc 345 defg ", "2 <3 abc><4 defg>"},
{ 2, "abc345defg678", "2 <3 abc><4 defg>"},
{ 2, " abc345defg678", "2 <3 abc><4 defg>"},
{ 2, " abc345defg678", "2 <3 abc><4 defg>"},
{ 2, " abc345defg678", "2 <3 abc><4 defg>"},
{ 2, "abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, "abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, "abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, " abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, " abc345defg678 ", "2 <3 abc><4 defg>"},
{ 2, " abc 345 defg 678 ", "2 <3 abc><4 defg>"},
{ 2, " abc 345 defg 678 ", "2 <3 abc><4 defg>"},
{ 2, " abc 345 defg 678 ", "2 <3 abc><4 defg>"},
{ 2, " abc 345 defg 678 ", "2 <3 abc><4 defg>"},
{ 2, " 00 11 2 3 4 5 6 77 88 99 aa bb ","2 <2 aa><2 bb>"},
{ 9, " aa bb c d e f g hh ii ", "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
{10, " aa bb c d e f g hh ii jj ", "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
{11, " aa bb c d e f g hh ii jj kk ", "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
{12, " aa bb c d e f g hh ii jj kk ll ","12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},
{ 9, "aa bb c d e f g hh ii ", "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
{10, "aa bb c d e f g hh ii jj ", "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
{11, "aa bb c d e f g hh ii jj kk ", "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
{12, "aa bb c d e f g hh ii jj kk ll ", "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},
{ 9, " aa bb c d e f g hh ii", "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
{10, " aa bb c d e f g hh ii jj", "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
{11, " aa bb c d e f g hh ii jj kk", "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
{12, " aa bb c d e f g hh ii jj kk ll", "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},
{ 9, "aa bb c d e f g hh ii", "9 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii>"},
{10, "aa bb c d e f g hh ii jj", "10 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj>"},
{11, "aa bb c d e f g hh ii jj kk", "11 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk>"},
{12, "aa bb c d e f g hh ii jj kk ll", "12 <2 aa><2 bb><1 c><1 d><1 e><1 f><1 g><2 hh><2 ii><2 jj><2 kk><2 ll>"},
};
int testCount = sizeof(testTokenizer) / sizeof(testTokenizer[0]);
int errorEncountered = 0; // number of failed test cases
printf("TestCount: %d\n", testCount);
CString m_deliminator = " ,.0123456789;:_+=\n\t\r";
char actualTokenStrs[200];
char innerTokenStr[100];
for (int test = 0; test < testCount; ++test) {
    CTokenEx tok;
    char* pActualPattern = testTokenizer[test].pActualPattern;
    char* pExpectedPattern = testTokenizer[test].pExpectedPattern;
    CString csSplit(pActualPattern);
    CStringArray splitIt;
    tok.Split(csSplit, m_deliminator, splitIt, FALSE);
    int size = splitIt.GetSize();
    // Build "<count> <len token><len token>..." to compare with the expected pattern
    sprintf(actualTokenStrs, "%d ", size);
    for (int iNum = 0; iNum < size; ++iNum) {
        sprintf(innerTokenStr, "<%d %s>", splitIt[iNum].GetLength(), (LPCTSTR)(splitIt[iNum]));
        strcat(actualTokenStrs, innerTokenStr);
    }
    if (strcmp(testTokenizer[test].pExpectedPattern, actualTokenStrs) != 0) {
        printf("\nTokenizer problem: \nInput: [%s]\nExpect: [%s]\nActual: [%s]\n\n",
            pActualPattern, pExpectedPattern, actualTokenStrs);
        errorEncountered++;
    }
    else {
        printf("OK: [%s] > [%s]\n", pActualPattern, actualTokenStrs);
    }
}
if (errorEncountered == 0) {
    printf("\nSuccess if this prints out\n");
}





If you split a string with bAddEmpty set to true and an empty token falls at the end of the string, the Split function won't add it. For example, "A" followed by two delimiters will result in only "A","" instead of "A","","". As a fix, modify the last 3 lines of the Split function from
if (!newCString.IsEmpty()) {
// as long as the variable is not empty, add it
AddIt.Add(newCString);
}
to
if (!newCString.IsEmpty() || bAddEmpty) {
// as long as the variable is not empty (or bAddEmpty is TRUE), add it
AddIt.Add(newCString);
}
Best Regards, Ahmed.





Hi Ahmed,
I will update the sources to include your fixes...Thanks for the input!
Thanks in advance,
Dan





You might like to check out my Standard Library string tokenizer, published in the April 1999 C/C++ User's Journal.
The source is downloadable from:
http://www.cuj.com/code/archive.html
Look for 'lorde.zip' in the download zip file.
Regards,
Dav





Thanks for the info...haven't read it yet but plan to!
Da








