 |
|
 |
Compiling this code with VS 2005 produces an exception at run time unless iterator debugging is disabled.
In order to avoid that, the line:
while(roPred(*it))
must be replaced with:
while((it != rostr.end()) && roPred(*it))
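The reason the extra test helps can be sketched in isolation. This is only an illustration, not the article's code: SkipWhile and IsSpace are hypothetical names, and std::string stands in for whatever container rostr refers to.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of CTokenizer): advance `it` past every
// character for which `pred` is true, without ever dereferencing end().
template <typename Pred>
std::string::const_iterator SkipWhile(const std::string& str,
                                      std::string::const_iterator it,
                                      Pred pred)
{
    // Test the bound first: && short-circuits, so pred(*it) is only
    // evaluated while `it` still points at a real character.
    while (it != str.end() && pred(*it))
        ++it;
    return it;
}

// Example predicate: true for whitespace characters.
inline bool IsSpace(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Usage: a string made only of matching characters is the case that
// walks off the end when the bound is not checked.
inline void SkipWhileExample()
{
    const std::string allSpaces = "   ";
    assert(SkipWhile(allSpaces, allSpaces.begin(), IsSpace) == allSpaces.end());
}
```

Without the bound check the loop dereferences end() whenever every remaining character matches the predicate, which is what checked iterators report.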
|
|
|
|
 |
|
 |
Thanks for providing CTokenizer.
In the description, there is the phrase:
"more than one string at the time"
I'm not clear how your algorithm works with more than one string, so I don't understand what you mean by that. Are you referring to multiple substrings within a string?
Your code requires multiple calls to CTokenizer::Next to get the tokens, so in that sense it is similar to strtok.
|
|
|
|
 |
|
 |
He is referring to the limitation of the RTL strtok function that only allows one string at a time to be tokenized. Since it uses an internal static pointer to the target string as well as the associated state data (i.e. current position), trying to tokenize a second string will overwrite the first one. By instantiating one instance of CTokenizer for *each* string to be parsed, you can work with "more than one string at a time."
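To make the difference concrete, here is a rough sketch of a tokenizer that keeps its position per instance. It is not the CTokenizer source: SimpleTokenizer is a hypothetical stand-in built on std::string, and only the Next name is borrowed from the article.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CTokenizer; names are illustrative only.
class SimpleTokenizer
{
public:
    SimpleTokenizer(const std::string& text, const std::string& delims)
        : m_text(text), m_delims(delims), m_pos(0) {}

    // Copies the next token into `token`; returns false when none remain.
    bool Next(std::string& token)
    {
        token.clear();
        // Skip any leading delimiters.
        m_pos = m_text.find_first_not_of(m_delims, m_pos);
        if (m_pos == std::string::npos)
        {
            m_pos = m_text.size();
            return false;
        }
        // The token runs up to the next delimiter or the end of the text.
        std::string::size_type end = m_text.find_first_of(m_delims, m_pos);
        if (end == std::string::npos)
            end = m_text.size();
        token = m_text.substr(m_pos, end - m_pos);
        m_pos = end;
        return true;
    }

private:
    std::string m_text;
    std::string m_delims;
    std::string::size_type m_pos;   // per-instance state, unlike strtok
};

// Usage: two strings tokenized in alternation, which strtok cannot do
// because its single hidden pointer would be overwritten.
inline void InterleavedExample()
{
    SimpleTokenizer a("one two", " ");
    SimpleTokenizer b("x,y", ",");
    std::string t;
    a.Next(t); assert(t == "one");
    b.Next(t); assert(t == "x");
    a.Next(t); assert(t == "two");
    b.Next(t); assert(t == "y");
}
```

Each object owns its own copy of the text and its own cursor, so advancing one never disturbs the other.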
|
|
|
|
 |
|
 |
Thanks for providing this.
The download is only the files for the class itself, not a test program. I did manage to put together a console MFC program, but it would be VERY helpful to people for you to provide code that shows how to use the class, and provides at least a simple test that the class works.
If you do this, I would suggest that the download include a rigorous cppunit type console app, so that a person who was contemplating using your class would have confidence in it.
(But I would acknowledge that it is a simple class and quite easy to figure out and use. Nice job!)
Again, thanks for CTokenizer. It may be able to help me avoid "re-inventing the wheel."
|
|
|
|
 |
|
 |
Hello, I have written a program and run a memory leak test on it with GlowCode, which reported a memory leak at "cs = m_cs.Mid(nStartPos, m_nCurPos - nStartPos);". How is that possible?
bool CTokenizer::Next(CString& cs)
{
    cs.Empty();
    while(m_nCurPos < m_cs.GetLength() && m_delim[static_cast<BYTE>(m_cs[m_nCurPos])])
        ++m_nCurPos;
    if(m_nCurPos >= m_cs.GetLength())
        return false;
    int nStartPos = m_nCurPos;
    while(m_nCurPos < m_cs.GetLength() && !m_delim[static_cast<BYTE>(m_cs[m_nCurPos])])
        ++m_nCurPos;
    cs = m_cs.Mid(nStartPos, m_nCurPos - nStartPos);
    return true;
}
|
|
|
|
 |
|
 |
I think GlowCode is mistaken (that, or CString::Mid() has a memory leak).
A complex system that does not work is invariably found to have evolved from a simpler system that worked just fine. - Murphy's Law of Computing
|
|
|
|
 |
|
 |
I love this tokenizer, but I have an issue: I need to change your Next function so that it returns null strings, because some fields in my string may have no value. Rather than adding a space after every delimiter before using your routine, how would I change the Next routine to get what I need?
|
|
|
|
 |
|
 |
Well, I haven't looked at this code in ages, so would you mind sending me a code sample that demonstrates your problem so I can try and fix/change CTokenizer?
If Java had true garbage collection, most programs would delete themselves upon execution - Robert Sewell
|
|
|
|
 |
|
 |
Simple. Let's say the string is
ab~~bc~de~~fg
and ~ is the delimiter. I need to know that there was a null between the two ~'s. In other words, I need your Next code below changed so that it returns a null value.
bool CTokenizer::Next(CString& cs)
{
    cs.Empty();
    while(m_nCurPos < m_cs.GetLength() && m_delim[static_cast<BYTE>(m_cs[m_nCurPos])])
        ++m_nCurPos;
    if(m_nCurPos >= m_cs.GetLength())
        return false;
    int nStartPos = m_nCurPos;
    while(m_nCurPos < m_cs.GetLength() && !m_delim[static_cast<BYTE>(m_cs[m_nCurPos])])
        ++m_nCurPos;
    cs = m_cs.Mid(nStartPos, m_nCurPos - nStartPos);
    return true;
}
|
|
|
|
 |
|
 |
CTokenizer skips every separator character it finds, so consecutive separators never produce an empty token and it will not work for your case. I did write a class, CStringSplitter, that does exactly what you need. I'll try to post an article soon. (I'll email you the source code in the meantime.)
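The CStringSplitter source is not shown in this thread, so the following is only a minimal sketch of the behaviour being asked for, under the assumption of a single-character delimiter. SplitKeepEmpty is a hypothetical name and std::string is used in place of CString.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split `text` at every occurrence of `delim`,
// keeping the empty fields that lie between adjacent delimiters.
inline std::vector<std::string> SplitKeepEmpty(const std::string& text,
                                               char delim)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;)
    {
        std::string::size_type end = text.find(delim, start);
        if (end == std::string::npos)
        {
            // Last field: everything after the final delimiter.
            fields.push_back(text.substr(start));
            break;
        }
        // end == start yields an empty field, which is kept on purpose.
        fields.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}

// Usage with the string from the question above.
inline void SplitExample()
{
    const std::vector<std::string> f = SplitKeepEmpty("ab~~bc~de~~fg", '~');
    assert(f.size() == 6);          // ab, "", bc, de, "", fg
    assert(f[1].empty() && f[4].empty());
}
```

The key difference from Next is that each delimiter ends exactly one field instead of being skipped as part of a run.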
|
|
|
|
 |
|
 |
That worked perfectly, thanks.
|
|
|
|
 |
|
 |
You're welcome. Always glad to help.
|
|
|
|
 |
|
 |
First, I would like to thank you for this class. However, it causes my program to crash in Unicode builds.
I found the problem, and it is an easy one in fact. You assumed that the character code of a delimiter is less than 256 (1 byte). That is not true of non-English character sets in general. Here is my correction to your class (m_delim is now a CString holding the delimiter characters):
bool CMATokenizer::Next (CString& cs)
{
    cs.Empty();
    while (m_nCurPos < m_cs.GetLength () && (m_delim.Find (m_cs [m_nCurPos]) != -1))
        ++ m_nCurPos;
    if (m_nCurPos >= m_cs.GetLength ())
        return false;
    int nStartPos = m_nCurPos;
    while (m_nCurPos < m_cs.GetLength () && (m_delim.Find (m_cs [m_nCurPos]) == -1))
        ++ m_nCurPos;
    cs = m_cs.Mid (nStartPos, m_nCurPos - nStartPos);
    return true;
}
This won't work as fast as your implementation, but it is UNICODE compatible...
Mustafa Demirhan
http://www.macroangel.com
Sonork ID 100.9935:zoltrix
They say I'm lazy but it takes all my time
|
|
|
|
 |
|
 |
Mustafa Demirhan wrote:
You assumed that the character code for the delimiter is less than 256 (1 byte). However, this is not correct for non-English character sets in general
This is correct. The code assumes only ANSI chars (1 byte) because that was a design constraint. I believe the code mentions the fact that UNICODE is not supported. Maybe your code could be conditionally compiled for UNICODE builds.
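One way to keep the fast lookup for 1-byte builds while still supporting wide characters is to select the delimiter test by character type. This is a sketch of that idea only, not code from either poster: DelimSet is a hypothetical name, and it uses template specialization on standard string types rather than the preprocessor and CString.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Primary template: works for any character type by searching the
// delimiter string (cost grows with the number of delimiters).
template <typename CharT>
class DelimSet
{
public:
    explicit DelimSet(const std::basic_string<CharT>& delims)
        : m_delims(delims) {}

    bool Contains(CharT ch) const
    {
        return m_delims.find(ch) != std::basic_string<CharT>::npos;
    }

private:
    std::basic_string<CharT> m_delims;
};

// Specialization for 1-byte characters: constant-time table lookup.
template <>
class DelimSet<char>
{
public:
    explicit DelimSet(const std::string& delims)
    {
        for (std::string::size_type i = 0; i < delims.size(); ++i)
            m_table.set(static_cast<unsigned char>(delims[i]));
    }

    bool Contains(char ch) const
    {
        return m_table.test(static_cast<unsigned char>(ch));
    }

private:
    std::bitset<256> m_table;
};

// Usage: the same call works for narrow and wide delimiters.
inline void DelimSetExample()
{
    DelimSet<char> narrow("|,");
    DelimSet<wchar_t> wide(L"|\x011F");
    assert(narrow.Contains('|') && !narrow.Contains('a'));
    assert(wide.Contains(L'\x011F') && !wide.Contains(L'a'));
}
```

In an MFC project the character type would presumably follow TCHAR, so the choice between the two versions would track the UNICODE setting automatically.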
The nice thing about C++ is that only your friends can handle your private parts.
|
|
|
|
 |
 | Bug!  |  | Anonymous | 23:51 23 Sep '01 |
|
 |
When parsing strings that contain international letters (or characters in some range, I don't know which), the tokenizer recognizes them as separators. Clearly a bug. What's worse is that this behavior differs from computer to computer. Do the regional settings have anything to do with it?
For instance, on one computer, with | as the separator, the following string fails to parse:
abcød|abcd
is tokenized to
abc
d
abcd
instead of
abcød
abcd
It seems that both | and ø are treated as separators... Compiled on one computer it works, but not on the other, despite the fact that they are identical except for regional settings.
Is there a fix?
|
|
|
|
 |
|
 |
I have the same bug: most characters at 127 and over fail, but not all. Strange.
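The thread never states what the actual cause was, so the following is only a guess at one common source of this symptom: indexing a 256-entry table with a plain char. Where char is signed, a character such as ø (0xF8 in Latin-1) has a negative value and the lookup reads outside the table. MakeDelimTable and IsDelim are hypothetical names used for this sketch.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Hypothetical helper: build a 256-entry delimiter table.
inline std::bitset<256> MakeDelimTable(const std::string& delims)
{
    std::bitset<256> table;
    for (std::string::size_type i = 0; i < delims.size(); ++i)
        table.set(static_cast<unsigned char>(delims[i]));
    return table;
}

// Hypothetical helper: test one character against the table.
inline bool IsDelim(const std::bitset<256>& table, char ch)
{
    // Converting to unsigned char first maps every possible char value
    // into 0..255, so characters above 127 become valid indexes instead
    // of negative ones.
    return table[static_cast<unsigned char>(ch)];
}

// Usage: a character above 127 is not mistaken for a separator.
inline void HighCharExample()
{
    const std::bitset<256> table = MakeDelimTable("|");
    const char oSlash = static_cast<char>(0xF8);   // 'ø' in Latin-1
    assert(IsDelim(table, '|'));
    assert(!IsDelim(table, oSlash));
}
```

An out-of-range read returns whatever happens to be in neighbouring memory, which would be consistent with the results differing between machines, but that link is an assumption here.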
|
|
|
|
 |
|
 |
Has anyone got round this bug yet?
|
|
|
|
 |
|
 |
Is this a Unicode build? If it is, CTokenizer won't work at all, because the algorithm assumes 256 possible characters (1 byte each). If it's not a Unicode build, it should work. I'll check it out.
Foot-and-Mouth disease is believed to be the first virus unable to spread through Microsoft Outlook
|
|
|
|
 |
|
 |
Just to confirm: this occurs in a standard (non-Unicode) build.
|
|
|
|
 |
|
 |
OK. I'll check it out, but I'm pretty busy right now. I'll keep you informed.
|
|
|
|
 |
|
 |
Found it. I'll be posting the updated source code soon.
|
|
|
|
 |
|
 |
The updated source code can be found here: Updated CTokenizer
|
|
|
|
 |
|
|
 |
|
 |
Just what I needed!
Good work!
Furor fit laesa saepius patientia
|
|
|
|
 |
|
 |
I can't seem to download the file for this article.
It seems that it can't be found on codeproject.com.
If anyone has it, could they PLEASE send it to me? It would be extremely appreciated (email address: gmesystems@primus.com.au).
|
|
|
|
 |