Simple string parsing in nested loops

14 Dec 2004
Fast string parsing in nested loops.

Introduction

Parsing strings is a simple operation, and it can be done with strtok, a function of the C run-time library. It finds the tokens of a string quickly and simply, like this:

#define DELIMITERS    " \r\n\t!@#$%^&*()_+-={}|\\:\"'?¿/.,<>’¡º×÷‘"

char string[] = "A string\tof ,,tokens\nand some  more tokens";
char* token = strtok(string, DELIMITERS);
while(token != NULL)
{    // While there are tokens in "string"
    // ...
    // do something with token
    // ...
    // Get next token
    token = strtok(NULL, DELIMITERS);
}

Problems

However, this function has several problems:

  1. You can't get the delimiter character that ended a token, because strtok overwrites it with '\0' at the end of the token, which also means the input string is modified.
  2. You can't use this function in nested loops, because strtok uses a static variable to hold its position between calls, as you can see in the help note:

    Note: Each function uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings, and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.

  3. You can't parse strings that are delimited by a sequence of characters, that is, a delimiter made of several characters that must appear together and in order (for example "\r\n").
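
The second problem is easy to reproduce. The following is a small sketch (the function name CountRowsWithNestedStrtok and the sample input are made up for illustration) that splits a buffer into rows on ',' and each row into words on ' ', using strtok for both levels. The inner loop overwrites the single saved position, so the outer loop loses its place:

```cpp
#include <cstring>

// Hypothetical helper for illustration: counts how many "rows" the outer
// strtok loop sees when an inner strtok loop runs inside it.
int CountRowsWithNestedStrtok(char* buffer)
{
    int rows = 0;
    char* row = std::strtok(buffer, ",");
    while (row != NULL)
    {
        ++rows;
        // The inner loop replaces strtok's saved position with its own.
        char* word = std::strtok(row, " ");
        while (word != NULL)
            word = std::strtok(NULL, " ");
        // This continues from the inner loop's position, not the outer one.
        row = std::strtok(NULL, ",");
    }
    return rows;
}
```

With a conforming strtok, calling this on the buffer "a b,c d,e f" returns 1 rather than 3: once the inner loop has run to the end of the first row, the outer call finds nothing left to scan.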

Solution

So I built the class CStrTok to solve all of these problems, especially the second one, the use of a static variable: the state is simply encapsulated in a class, like this:

class CStrTok
{
public:
    CStrTok();
    ~CStrTok();
public:
    LPSTR m_lpszNext;
    char m_chDelimiter;
    // ... some attributes
public:
    LPSTR GetFirst(LPSTR lpsz, LPCSTR lpcszDelimiters);
    LPSTR GetNext(LPCSTR lpcszDelimiters);
    // ... some functions
};

The variable m_lpszNext holds the next token to be parsed, and the variable m_chDelimiter holds the delimiter that ended the current token, to be returned after the next call to GetNext. Because this state lives in each object instead of in a static variable, the class can be used in nested loops without any problems, as you can see:

CStrTok Usage

// code to parse tab delimited text files
CStrTok StrTok[3];
StrTok[0].m_bDelimitersInSequence = true; // for "\r\n"
// parse file buffer for rows and columns
char* pRow = StrTok[0].GetFirst(pFileBuffer, "\r\n");
while(pRow)
{
    // parse the row
    char* pCol = StrTok[1].GetFirst(pRow, "\t");
    while(pCol)
    {
        // parse the col
        char* pToken = StrTok[2].GetFirst(pCol, " ,;");
        while(pToken)
        {
            // ... using pToken
            pToken = StrTok[2].GetNext(" ,;");
        }
        // get next column
        pCol = StrTok[1].GetNext("\t");
    }
    // get next row
    pRow = StrTok[0].GetNext("\r\n");
}

I think you will find it very easy to use.
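
For readers who want to see how per-object state makes nesting safe, here is a minimal sketch. It is not the CStrTok source: the names MiniTok, inSequence, delimiter and CountCells are hypothetical, and the code only assumes the behaviour described above (a saved position, the remembered delimiter, and an optional "delimiters in sequence" mode).

```cpp
#include <cstring>

// Hypothetical sketch, not the CStrTok source: all parsing state is kept in
// members, so every instance tokenizes independently of the others.
class MiniTok
{
public:
    bool inSequence;   // true: the delimiter string must match as a whole
    char delimiter;    // character that ended the last token ('\0' at end of text)

    MiniTok() : inSequence(false), delimiter('\0'), next(NULL) {}

    char* GetFirst(char* text, const char* delimiters)
    {
        next = text;
        return GetNext(delimiters);
    }

    char* GetNext(const char* delimiters)
    {
        if (next == NULL)
            return NULL;
        return inSequence ? NextBySequence(delimiters) : NextBySet(delimiters);
    }

private:
    char* next;        // where the following token search starts

    // Any single character of "delimiters" ends a token (strtok-like).
    char* NextBySet(const char* delimiters)
    {
        char* token = next + std::strspn(next, delimiters);   // skip leading delimiters
        if (*token == '\0') { next = NULL; return NULL; }
        char* end = token + std::strcspn(token, delimiters);
        delimiter = *end;                                      // remember what ended the token
        if (*end != '\0') { *end = '\0'; next = end + 1; }
        else              { next = NULL; }
        return token;
    }

    // The whole string "delimiters" (for example "\r\n") ends a token.
    char* NextBySequence(const char* delimiters)
    {
        const std::size_t length = std::strlen(delimiters);
        if (length == 0)
            return NextBySet(delimiters);      // empty sequence: the rest is one token
        char* token = next;
        while (std::strncmp(token, delimiters, length) == 0)
            token += length;                   // skip leading sequences
        if (*token == '\0') { next = NULL; return NULL; }
        char* end = std::strstr(token, delimiters);
        if (end != NULL) { delimiter = *end; *end = '\0'; next = end + length; }
        else             { delimiter = '\0'; next = NULL; }
        return token;
    }
};

// Nested use: rows separated by "\r\n", columns separated by tabs.
int CountCells(char* buffer)
{
    MiniTok rows, cols;
    rows.inSequence = true;
    int cells = 0;
    for (char* row = rows.GetFirst(buffer, "\r\n"); row; row = rows.GetNext("\r\n"))
        for (char* col = cols.GetFirst(row, "\t"); col; col = cols.GetNext("\t"))
            ++cells;
    return cells;
}
```

Each level of the nested loop owns one object, so the inner loop cannot disturb the outer one. In sequence mode this sketch records only the first character of the matched sequence in delimiter; a fuller implementation might keep more.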

Source code files

StrTok.cpp, StrTok.h

Thanks to...

I owe a lot to my colleagues for helping me in implementing and testing this code. (JAK)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Software Developer (Senior)
Egypt

Comments and Discussions

 
strtok() as lifted from the C standard lib
mef526, 28-Dec-04 19:35
Thanks for this article. I was faced with the same problem but approached the solution differently. On Windows, this function uses TLS (Thread Local Storage) to keep the static variable. What I did was port the code from the C lib source code that came with VC++ and make the static variable a parameter in the function call. Here's the code:

/***
*strtok.c - tokenize a pcSearchStr with given delimiters
*
* Copyright (c) 1989-2001, Microsoft Corporation. All rights reserved.
*
*Purpose:
* defines strtok() - breaks pcSearchStr into series of token
* via repeated calls.
*
*******************************************************************************/

#if defined(__cplusplus) || defined(__cplusplus__)
extern "C" {
#endif

char * __cdecl mmStrtok
    ( char * pcSearchStr
    , const char * pcDelim
    , char ** ppcNextToken )
{
    unsigned char *str;
    const unsigned char *ctrl = (const unsigned char *)pcDelim;

    unsigned char map[32];   /* one bit per possible character value */
    int count;

    /* Clear pcDelim map */
    for (count = 0; count < 32; count++)
        map[count] = 0;

    /* Set bits in delimiter table */
    do {
        map[*ctrl >> 3] |= (1 << (*ctrl & 7));
    } while (*ctrl++);

    /* Initialize str. If pcSearchStr is NULL, set str to the saved
     * pointer (i.e., continue breaking tokens out of the pcSearchStr
     * from the last mmStrtok call) */
    if (pcSearchStr)
        str = (unsigned char *)pcSearchStr;
    else
        str = (unsigned char *)*ppcNextToken;

    /* Find beginning of token (skip over leading delimiters). Note that
     * there is no token iff this loop sets str to point to the terminal
     * null (*str == '\0') */
    while ( (map[*str >> 3] & (1 << (*str & 7))) && *str )
        str++;

    pcSearchStr = (char *)str;

    /* Find the end of the token. If it is not the end of the pcSearchStr,
     * put a null there. */
    for ( ; *str ; str++ )
        if ( map[*str >> 3] & (1 << (*str & 7)) ) {
            *str++ = '\0';
            break;
        }

    /* Update *ppcNextToken, the caller-owned saved position that replaces
     * the field in the per-thread data structure */
    *ppcNextToken = (char *)str;

    /* Determine if a token has been found. */
    if ( pcSearchStr == (char *)str )
        return NULL;
    else
        return pcSearchStr;
} /* mmStrtok() */
#if defined(__cplusplus) || defined(__cplusplus__)
}
#endif
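
Passing the saved position as an extra parameter is the same idea behind POSIX strtok_r and Microsoft's strtok_s. As a rough illustration of how such a function is used in nested loops, here is a portable sketch. TokR and CountWords are hypothetical names, and TokR is built on strspn and strcspn rather than on the bit map used above, so it is a stand-in with the same calling convention, not a copy of that code.

```cpp
#include <cstring>

// Hypothetical stand-in with the same calling convention as the function
// above: the caller owns the saved position instead of a static variable.
char* TokR(char* text, const char* delimiters, char** saved)
{
    if (text == NULL)
        text = *saved;                        // continue from the caller's saved position
    text += std::strspn(text, delimiters);    // skip leading delimiters
    if (*text == '\0')
    {
        *saved = text;
        return NULL;                          // no more tokens
    }
    char* end = text + std::strcspn(text, delimiters);
    if (*end != '\0')
        *end++ = '\0';                        // terminate the token, step past the delimiter
    *saved = end;
    return text;
}

// Nested loops: each level keeps its own saved pointer.
int CountWords(char* buffer)
{
    int words = 0;
    char* rowState = NULL;
    char* wordState = NULL;
    for (char* row = TokR(buffer, ",", &rowState); row != NULL;
         row = TokR(NULL, ",", &rowState))
    {
        for (char* word = TokR(row, " ", &wordState); word != NULL;
             word = TokR(NULL, " ", &wordState))
        {
            ++words;
        }
    }
    return words;
}
```

Because rowState and wordState are separate variables, the inner loop no longer disturbs the outer one; on the buffer "a b,c d,e f" CountWords visits all three rows and returns 6.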
