STL Split String

By Paul J. Weiss, 1 Feb 2006
 

Description

Below is a function I created and have found extremely useful for splitting strings on a particular delimiter. The implementation requires only the STL, which makes it easy to port to any OS that supports the STL. The function is fairly lightweight, although I haven't done extensive performance testing.

The delimiter can be any number of characters, passed as a string. The parts of the input between delimiters are appended to a vector of strings. The class StringUtils contains one static function, SplitString. The int returned is the number of delimiters found in the input string. If the input or the delimiter is empty, or the delimiter does not occur, the function returns 0 and leaves the results vector untouched.

I used this utility mainly for parsing strings that were being passed across platform boundaries. Whether you are using raw sockets or middleware such as TIBCO®, passing string data is straightforward. I found it more efficient to pass delimited string data than to make repeated calls or send repeated messages. Another place I used this was in passing BSTRs back and forth between a Visual Basic client and an ATL COM DLL. It proved to be easier than passing a SAFEARRAY as an [in] or [out] parameter. This was also beneficial when I did not want the added overhead of MFC and hence could not use CString.

Implementation

The SplitString function uses the STL string functions find and substr to iterate through the input string. The hardest part was figuring out how to get the substring of the input string based on the offsets of the delimiter, not forgetting to take into account the length of the delimiter. Another hurdle was making sure not to call substr with an offset greater than the length of the input string.
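
As a rough illustration of the offset arithmetic described above, here is a minimal single-pass sketch that uses only find and substr. It is not the SplitString function from this article: the name splitSketch is made up for the example, it always keeps empty fields, and it returns the pieces instead of a delimiter count.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of StringUtils): splits `input` on a
// possibly multi-character `delimiter` using std::string::find and substr.
std::vector<std::string> splitSketch(const std::string& input,
                                     const std::string& delimiter)
{
    std::vector<std::string> parts;
    if (delimiter.empty())
    {
        // An empty delimiter would match everywhere; return the input whole.
        parts.push_back(input);
        return parts;
    }

    std::string::size_type start = 0;
    std::string::size_type hit = input.find(delimiter, start);
    while (hit != std::string::npos)
    {
        // The field runs from `start` up to, but not including, the delimiter.
        parts.push_back(input.substr(start, hit - start));
        // Skip the whole delimiter, not just one character.
        start = hit + delimiter.size();
        hit = input.find(delimiter, start);
    }
    // `start` never exceeds input.size(), so this substr call cannot throw.
    parts.push_back(input.substr(start));
    return parts;
}
```

Because start is only ever advanced to the end of a delimiter that was actually found, it stays within the input, which is one way to avoid calling substr with an offset past the end of the string.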

Header

#ifndef __STRINGUTILS_H_
#define __STRINGUTILS_H_

#include <string>
#include <vector>

using namespace std;

class StringUtils
{

public:

    static int SplitString(const string& input, 
        const string& delimiter, vector<string>& results, 
        bool includeEmpties = true);

};

#endif

Source

int StringUtils::SplitString(const string& input, 
       const string& delimiter, vector<string>& results, 
       bool includeEmpties)
{
    int iPos = 0;
    int newPos = -1;
    int sizeS2 = (int)delimiter.size();
    int isize = (int)input.size();

    if( 
        ( isize == 0 )
        ||
        ( sizeS2 == 0 )
    )
    {
        return 0;
    }

    vector<int> positions;

    newPos = input.find (delimiter, 0);

    if( newPos < 0 )
    { 
        return 0; 
    }

    int numFound = 0;

    while( newPos >= iPos )
    {
        numFound++;
        positions.push_back(newPos);
        iPos = newPos;
        newPos = input.find (delimiter, iPos+sizeS2);
    }

    if( numFound == 0 )
    {
        return 0;
    }

    for( int i=0; i <= (int)positions.size(); ++i )
    {
        string s("");
        if( i == 0 ) 
        { 
            // Text before the first delimiter.
            s = input.substr( 0, positions[0] ); 
        }
        else
        {
            // positions[i-1] is only valid when i > 0.
            int offset = positions[i-1] + sizeS2;
            if( offset < isize )
            {
                if( i == (int)positions.size() )
                {
                    // Text after the last delimiter.
                    s = input.substr(offset);
                }
                else
                {
                    s = input.substr( offset, 
                          positions[i] - positions[i-1] - sizeS2 );
                }
            }
        }
        if( includeEmpties || ( s.size() > 0 ) )
        {
            results.push_back(s);
        }
    }
    return numFound;
}

Output using demo project

main.exe "|mary|had|a||little|lamb||" "|"

int SplitString(
        const string& input,
        const string& delimiter,
        vector<string>& results,
        bool includeEmpties = true
)

-------------------------------------------------------
input           = |mary|had|a||little|lamb||
delimiter       = |
return value    = 8 // Number of delimiters found
results.size()  = 9
results[0]      = ''
results[1]      = 'mary'
results[2]      = 'had'
results[3]      = 'a'
results[4]      = ''
results[5]      = 'little'
results[6]      = 'lamb'
results[7]      = ''
results[8]      = ''

int SplitString(
        const string& input,
        const string& delimiter,
        vector<string>& results,
        bool includeEmpties = false
)

-------------------------------------------------------
input           = |mary|had|a||little|lamb||
delimiter       = |
return value    = 8 // Number of delimiters found
results.size()  = 5
results[0]      = 'mary'
results[1]      = 'had'
results[2]      = 'a'
results[3]      = 'little'
results[4]      = 'lamb'

MFC version

For those of you who absolutely cannot use STL and are committed to MFC, I made a few minor changes to the above implementation. It uses CString instead of std::string and a CStringArray instead of a std::vector. Unlike the STL version, it always drops empty strings, and it needs Afxtempl.h for CArray:

//------------------------
// SplitString in MFC
//------------------------
int StringUtils::SplitString(const CString& input, 
  const CString& delimiter, CStringArray& results)
{
  int iPos = 0;
  int newPos = -1;
  int sizeS2 = delimiter.GetLength();
  int isize = input.GetLength();

  CArray<INT, int> positions;

  newPos = input.Find (delimiter, 0);

  if( newPos < 0 ) { return 0; }

  int numFound = 0;

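  // An empty delimiter can never advance the search.
  if( sizeS2 == 0 ) { return 0; }

  // Use >= so a delimiter at position 0 is not missed, and resume right
  // after the previous delimiter so adjacent delimiters are found.
  while( newPos >= iPos )
  {
    numFound++;
    positions.Add(newPos);
    iPos = newPos;
    newPos = input.Find (delimiter, iPos+sizeS2);
  }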

  for( int i=0; i <= positions.GetSize(); i++ )
  {
    CString s;
    if( i == 0 )
      s = input.Mid( i, positions[i] );
    else
    {
      int offset = positions[i-1] + sizeS2;
      if( offset < isize )
      {
        if( i == positions.GetSize() )
          s = input.Mid(offset);
        else if( i > 0 )
          s = input.Mid( positions[i-1] + sizeS2, 
                 positions[i] - positions[i-1] - sizeS2 );
      }
    }
    if( s.GetLength() > 0 )
      results.Add(s);
  }
  return numFound;
}

String neutral version

I added this version in case you need to use it with any type of string. The only requirement is that the string class has a constructor that takes a char* (in practice, a const char*). Apart from the C library functions strstr, strlen and memcpy, the code depends only on the STL vector. I also added the option to not include empty strings in the results, which occur when delimiters are adjacent:

//-----------------------------------------------------------
// StrT:    Type of string to be constructed
//          Must have char* ctor.
// str:     String to be parsed.
// delim:   Pointer to delimiter.
// results: Vector of StrT for strings between delimiter.
// empties: Include empty strings in the results. 
//-----------------------------------------------------------
template< typename StrT >
int split(const char* str, const char* delim, 
     vector<StrT>& results, bool empties = true)
{
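  const char* pstr = str;           // no const_cast needed; the input is only read
  size_t dlen = strlen(delim);      // strstr, strlen and memcpy need <cstring>
  if( dlen == 0 ) return 0;         // an empty delimiter would loop forever
  const char* r = strstr(pstr, delim);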
  while( r != NULL )
  {
    char* cp = new char[(r-pstr)+1];
    memcpy(cp, pstr, (r-pstr));
    cp[(r-pstr)] = '\0';
    if( strlen(cp) > 0 || empties )
    {
      StrT s(cp);
      results.push_back(s);
    }
    delete[] cp;
    pstr = r + dlen;
    r = strstr(pstr, delim);
  }
  if( strlen(pstr) > 0 || empties )
  {
    results.push_back(StrT(pstr));
  }
  return results.size();
}

String neutral usage

// using CString
//------------------------------------------
int i = 0;
vector<CString> results;
split("a-b-c--d-e-", "-", results);
for( i=0; i < results.size(); ++i )
{
  cout << results[i].GetBuffer(0) << endl;
  results[i].ReleaseBuffer();
}

// using std::string
//------------------------------------------
vector<string> stdResults;
split("a-b-c--d-e-", "-", stdResults);
for( i=0; i < stdResults.size(); ++i )
{
  cout << stdResults[i].c_str() << endl;
}

// using std::string without empties
//------------------------------------------
stdResults.clear();
split("a-b-c--d-e-", "-", stdResults, false);
for( i=0; i < stdResults.size(); ++i )
{
  cout << stdResults[i].c_str() << endl;
}

Conclusion

Hope you find this as useful as I did. Feel free to let me know of any bugs or enhancements. Enjoy ;)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Paul J. Weiss
Web Developer
United States


Comments and Discussions

 
Question: Quick split using vector as base class (Dennis Lang, 21 Feb '12 - 10:22)
Quick version using vector as base class giving access to all vector's operators.
Does not 'trim' values.
 
// Split string into parts.
class Split : public std::vector<std::string>
{
public:
    // const char* so that string literals can be passed without a warning
    Split(const std::string& str, const char* delimList)
    {
        size_t lastPos = 0;
        size_t pos = str.find_first_of(delimList);
 
        while (pos != std::string::npos)
        {
            if (pos != lastPos)
                push_back(str.substr(lastPos, pos-lastPos));
            lastPos = pos + 1;
            pos = str.find_first_of(delimList, lastPos);
        }
        if (lastPos < str.length())
            push_back(str.substr(lastPos, pos-lastPos));
    }
};
Example use to populate an STL set:
 
std::set<std::string> words;
Split split("hello,world", ",");
words.insert(split.begin(), split.end());

General: Bug SplitString("1,2",",",....) (richard sancenot, 16 Dec '10 - 23:40)
The std::string version of SplitString fails with the following parameters:
input: "1,2"
separator: ","
Any program whose reliability depends on humans is not reliable

Question: License for StringUtils::SplitString (Tom Toft, 31 Mar '10 - 9:59)
Just want clarification on the license for using this code. There's a generic remark about there being no specific license unless mentioned elsewhere, and in fact I don't see any mentioned, but I just want to be sure. Can this be treated like Public Domain, or are there more restrictions? I only ask because my company has a policy of not using source with no license (I guess for fear that a license can be applied later), and we'd really like to use this.
 
Thanks.
 
Tom
General: My vote of 2 (zagzzig_o, 16 Mar '10 - 15:52)
The code doesn't even work! The author needs to do at least one test run before putting it online.
See the comments under "Handling i-0 properly?"
General: Bug: When no delimiter is found nothing is returned (Florian Rittmeier, 10 May '07 - 8:47)
Hello,

When a string like "Jane" is given and the delimiter " " cannot be found, the results vector is empty. I would expect the results vector to contain "Jane" in this case.
 
Greets Florian
General: Re: Bug: When no delimiter is found nothing is returned (ekey, 14 May '07 - 22:13)
Yeah, maybe the bug is here:
if( i == 0 )
{
s = input.substr( i, positions[i] );
}
int offset = positions[i-1] + sizeS2;
Here, when i is 0, positions[i-1] is out of range, so it will throw an exception (or crash).
 
Best wish,
Yun
General: output iterator (Joergen Sigvardsson, 28 Oct '06 - 5:31)
May I suggest that you replace the results vector with an output iterator instead? That way, you don't have to write a new version of the function if you want the results in a list, set, output stream, or whatever.
 
--
Not a substitute for human interaction

Question: Handling i-0 properly? (miker2069, 16 Oct '06 - 21:20)
Hi, I don't think your STL code is handling i=0 properly. Look at what happens at the following statement.
 
int offset = positions[i-1] + sizeS2;
 
Obviously you'll get a vector exception thrown. I assume you wanted an 'else' block right after your if(i==0) block. I put it in and it seems to work. Just posting so that someone else who simply copies and pastes won't go crazy wondering why it doesn't work :)
 

 
Mike

Answer: Re: Handling i-0 properly? (domini_harling, 26 Apr '07 - 11:11)
That's exactly what I did. I copied and pasted and it failed immediately. I suppose it's a good starting point for a string splitter, but the bug needs to be fixed. :)
 
Dom
Answer: Re: Handling i-0 properly? (mikecline, 27 Mar '09 - 8:09)
I ditto this again.
 
What is up with putting code online that does not run?
General: Truly neutral version (elbertlev, 8 Jun '06 - 9:39)
In essence the neutral version allows two types of strings, MFC and STL (I know that templates allow more, but who uses other strings?). But the container used is vector<>. I believe that for MFC, CStringArray is a better choice.
 
Lev Elbert
General: Boost alternatives (MattyT, 8 Feb '06 - 14:24)
Nice article. :)
 
For completeness it's worthwhile noting that there are solutions to this problem in the Boost library.
 
The string_algo library has a section on splitting strings using a couple of templated methods. In particular, split() will do a similar job to your function.
 
Tokenizer may also be useful, allowing you to iterate over a split string based on a token.
 
I've used these functions in production code and can attest that they work very well... :)
General: No reason for position array (Martin Richter, 6 Feb '06 - 23:46)
Why are you collecting the positions first? There is no reason to do that. It's just wasting time. Once you have a position and the next position, you can store the result.

General: Updated version (Paul J. Weiss, 2 Feb '06 - 13:49)
I updated the code to handle cases such as ";mary;had;;a;little;lamb;;;" where there could be empty strings in between the delimiters. I also added another argument to the function which is a boolean to include the empty strings as part of the resulting vector. If includeEmpties is false then only strings of size greater than zero will be included in the results. I updated the source and the demo project.
 
Enjoy
:cool:
 
Paul J. Weiss
General: Hmmm, may be I'm wrong (Andreas Tirok, 24 Jan '06 - 10:54)
Hi, I tried to use this ...
 
with the string "foo;boo" I got always numcount eq 1 from SplitString ...
I modified SplitString and with this source it works quite well ...
 
call:
 
std::vector<std::string> Colums;
int nCols = SplitString(LinkList, ";", Colums);

Please ignore std:: :-(

int SplitString(const std::string& input, const std::string& delimiter, std::vector<std::string>& results)
{
int iPos = 0;
int newPos = -1;
int sizeS2 = delimiter.size();
int isize = input.size();
 
std::vector<int> positions;
 
newPos = input.find (delimiter, 0);
 
if( newPos < 0 )
{
return 0;
}
 
int numFound = 0;
 
while( newPos > iPos )
{
//numFound++;
positions.push_back(newPos);
iPos = newPos;
newPos = input.find (delimiter, iPos + sizeS2 + 1);
}
 
for( int i=0; i <= positions.size(); i++ )
{
std::string s;
if( i == 0 )
{
s = input.substr( i, positions[i] );
}
int offset = positions[i-1] + sizeS2;
if( offset < isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
else if( i > 0 )
{
s = input.substr( positions[i-1] + sizeS2, positions[i] - positions[i-1] - sizeS2 );
}
}
if( s.size() > 0 )
{
results.push_back(s);
numFound++;
}
}
return numFound;
}
 
Regards
 
Andy
General: Small modifications for patterns like ;; (khrl, 20 Dec '05 - 3:45)
In CSV files you often have to parse strings like
xx;yy;;zz
where ;; means that the entry is empty.
The class cannot detect this kind of pattern;
it returns
xx
yy
;zz

To change this behaviour, the following changes have to be made:
 
while( newPos > iPos )
{
numFound++;
positions.push_back(newPos);
iPos = newPos;
// newPos = input.find (delimiter, iPos+sizeS2+1);
newPos = input.find(delimiter,iPos + 1);
}
 
for( int i=0; i <= positions.size(); i++ )
{
string s;
if( i == 0 ) { s = input.substr( i, positions[i] ); }
int offset = positions[i-1] + sizeS2;
if( offset < isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
else if( i > 0 )
{
s = input.substr( positions[i-1] + sizeS2,
positions[i] - positions[i-1] - sizeS2 );
}
}
//if( s.size() > 0 )
//{
results.push_back(s);
//}
}

regards
karl-heinz
General: the final MFC version (dis1411, 24 Jul '05 - 13:31)
My solution handles the issues that came up (delimiter at the front or back, delimiter repeated in the input string). Beer26's version almost got it, except that it copies the input string over and over (albeit slightly shorter each time). Mine doesn't copy the [entire] string at all. It can split a 2MB list of words by "\r\n" in 0.3 seconds, which is a lot faster than anything else on this page :)
 
void CyourMFCClassDlg::split(const CString& str, const CString& delimiter, CStringArray& CStrArray)
{
long start = 0,
delim = str.Find(delimiter),
delimLen = delimiter.GetLength(),
elemCnt = 1; // the ACTUAL number of items, there'll be at least 1
 
// counting the elements, setting the size and then filling the array
// is much faster than doing .Add for each new element
 
while (delim > -1)
{
elemCnt++;
start = delim + delimLen;
delim = str.Find(delimiter, start);
}

// manually going through and finding each delimiter again is faster than
// keeping track of the positions from the last loop.. because doing .Add over and over
// to the position array would be such a bottleneck
 
start = 0;
delim = str.Find(delimiter);
CStrArray.SetSize(elemCnt); // now we don't have to use .Add, saving tons of cpu cycles
elemCnt = -1;
 
while (delim > -1)
{
elemCnt++;
CStrArray[elemCnt] = str.Mid(start, delim-start);
start = delim + delimLen;
delim = str.Find(delimiter, start);
}

if (start < str.GetLength())
CStrArray[elemCnt+1] = str.Mid(start);
else
CStrArray[elemCnt+1] = "";
}

General: Thnx! (muff99, 9 Jul '05 - 16:20)
Exactly what I was looking for! I needed to include Afxtempl.h in order to get the MFC version up and running.
General: Yet another version (Alexis Smirnov, 18 Mar '05 - 12:20)
This version templates the output container and assumes elements are to be added to the end.
 
template<typename _Cont>
void split(const string& str, _Cont& _container, const string& delim=",")
{
    string::size_type lpos = 0;
    string::size_type pos = str.find_first_of(delim, lpos);
    while(lpos != string::npos)
    {
		_container.insert(_container.end(), str.substr(lpos,pos - lpos));
 
		lpos = ( pos == string::npos ) ?  string::npos : pos + 1;
        pos = str.find_first_of(delim, lpos);
    }
}
 
Alexis
 
http://weblog.smirnov.ca
General: Re: Yet another version (Alexis Smirnov, 18 Mar '05 - 12:26)
The comment poster ate the angle brackets in the earlier version. Use this one instead:
 
template<typename _Cont>
void split(const string& str, _Cont& _container, const string& delim=",")
{
      string::size_type lpos = 0;
      string::size_type pos = str.find_first_of(delim, lpos);
      while(lpos != string::npos)
      {
          _container.insert(_container.end(), str.substr(lpos,pos - lpos));
 
          lpos = ( pos == string::npos ) ?   string::npos : pos + 1;
            pos = str.find_first_of(delim, lpos);
      }
}
General: Caveat programmer (David 'dex' Schwartz, 8 Dec '09 - 0:59)
This algorithm always returns any blank fields (a sensible default to be sure, but not always what one wants) and yes the caller could then choose to throw those empty values away.
It also assumes delim is a set of possible delimiters but the original poster uses std::string::find not find_first_of so the meaning is quite different.
Returning the size is helpful for repeated use when appending to the same container.
This allows the caller to write cleaner code when field counts are of interest.
 
Please eschew the use of underscores at the start of names, that's really for standard library implementers and compiler vendors etc., not we mere mortal dev types.
 
Keep it simple
dex
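
To make the find versus find_first_of point above concrete, here is a small sketch. The two wrapper names are invented for this example; they only forward to the standard std::string members so the two results can be compared side by side.

```cpp
#include <string>

// Position of the first match when `delim` is treated as one whole token,
// which is how the article's SplitString interprets its delimiter.
std::string::size_type firstAsToken(const std::string& s,
                                    const std::string& delim)
{
    return s.find(delim);
}

// Position of the first match when `delim` is treated as a set of
// characters, any one of which ends a field.
std::string::size_type firstAsCharSet(const std::string& s,
                                      const std::string& delim)
{
    return s.find_first_of(delim);
}
```

For the input "a>b=>c" and the delimiter "=>", the token search stops at index 3, while the character-set search already stops at index 1 because '>' alone counts as a delimiter.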

General: bug (_vin_, 10 Aug '03 - 10:56)
When the delimiter is in the first position, your function crashes.
General: Re: bug (_vin_, 10 Aug '03 - 11:11)
Or in the last position.

For example, try to split " teststring" by " ", or "teststring " by " ".
General: Re: bug (hiso7, 20 Sep '07 - 22:18)
Agreed.

I've figured out that the problem comes from this line:
(1) int offset = positions[i-1] + sizeS2;
 
But changing it to:
(2) int offset = positions[i] + sizeS2;
 
only shifts the problem to the last word which is skipped.
 
My input is:
const string& input="mary|had|a||little|lamb|dsqdsqd|";

If I use (1), the program crashes with the error message "vector subscript out of range".
But if I use (2), I get the same message when reaching the last word "dsqdsqd".

Can someone explain or, even better, correct that portion of the code to make it work?
General: Re: bug (hiso7, 20 Sep '07 - 22:40)
Never mind. I have solved it.
 
The change is as follows:
 
int offset = positions[i] + sizeS2;
if( offset <= isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
General: Re: bug (SkyDiver, 10 Sep '08 - 21:44)
No it's not - it still causes problems.
 
The solution is as follows:
 
if( i == 0 ) 
{ 
	s = input.substr( i, positions[i] ); 
}
else
{
	int offset = positions[i-1] + sizeS2;
	if( offset < isize )
	{
		if( i == positions.size() )
		{
			s = input.substr(offset);
		}
		else if( i > 0 )
		{
			s = input.substr( positions[i-1] + sizeS2, 
				positions[i] - positions[i-1] - sizeS2 );
		}
	}
}

General: This is my version! (Anonymous, 4 Mar '03 - 1:41)
template<typename _Outit>
void split(const string& str, _Outit _Where, const string& delim=",")
{
string::size_type lpos = 0;
string::size_type pos = str.find_first_of(delim, lpos);
do
{
*_Where = str.substr(lpos,pos - lpos);
// front_inserter, back_inserter and inserter will do
// nothing with operator++
++_Where;
lpos = ( pos == string::npos ) ? string::npos : pos + 1;
pos = str.find_first_of(delim, lpos);
}
while(lpos != string::npos);
}

General: Re: This is my version! (Anonymous, 4 Mar '03 - 1:42)
template<typename _Outit>
void split(const string& str, _Outit _Where, const string& delim=",")
{
     string::size_type lpos = 0;
     string::size_type pos = str.find_first_of(delim, lpos);
     do
     {
          *_Where = str.substr(lpos,pos - lpos);
          // front_inserter, back_inserter and inserter will do
          // nothing with operator++
          ++_Where;
          lpos = ( pos == string::npos ) ?   string::npos : pos + 1;
          pos = str.find_first_of(delim, lpos);
     }
     while(lpos != string::npos);
}

General: Re: This is my version! (Anonymous, 4 Mar '03 - 1:47)
This will support both 'back_inserter', 'front_inserter' and 'inserter'
General: Re: This is my version! (Anonymous, 21 Apr '03 - 12:06)
Close but not quite!
 
Don't use do..while. Just use while. What if the incoming string contains no delimiters?...you will modify the destination. Using while() also means you need only invoke find_first_of() in one place.
 

General: But STL function 'getline' can be used in this sample, can't it? (-) (vasily, 4 Nov '02 - 21:15)
getline (_Istr, _Str, _Delim)
_Istr - The input stream from which a string is to be extracted.
_Str - The string into which are read the characters from the input stream.
_Delim - The line delimiter.
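
A minimal sketch of the getline approach mentioned above, assuming a single-character delimiter. The function name splitWithGetline is invented for this example. Note two differences from SplitString: getline cannot take a multi-character delimiter, and a trailing delimiter does not produce a final empty field.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on one character by reading
// delimiter-terminated "lines" from a string stream.
std::vector<std::string> splitWithGetline(const std::string& input, char delim)
{
    std::vector<std::string> parts;
    std::istringstream stream(input);
    std::string field;
    // getline extracts up to (and discards) the next `delim`; the loop
    // ends when nothing more can be extracted.
    while (std::getline(stream, field, delim))
    {
        parts.push_back(field);
    }
    return parts;
}
```

With this sketch, "a-b--c" yields four fields (the third is empty), while "a-b-" yields only two.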

General: Make out parameter more generic (Thomas Freudenberg, 4 Nov '02 - 13:08)
I suggest making the out parameter results more generic:
template <typename outit>
static int SplitString(const string& input, const string& delimiter, outit results);
and change all
results.push_back(s);
to
*(results++) = s;
Then you are able to use any container you want with SplitString:
vector<string> results1;
int num1 = StringUtils::SplitString(in, delim, std::back_inserter(results1));
 
list<string> results2;
int num2 = StringUtils::SplitString(in, delim, std::back_inserter(results2));

 
Regards
Thomas
 
Sonork id: 100.10453 Thömmi

Disclaimer:
Because of heavy processing requirements, we are currently using some of your unused brain capacity for backup processing. Please ignore any hallucinations, voices or unusual dreams you may experience. Please avoid concentration-intensive tasks until further notice. Thank you.

General: Re: Make out parameter more generic (Yap Chun Wei, 4 Nov '02 - 21:32)
Changing the function to a template is a good idea. However, I don't see the point in changing the push_back part. There are only 3 STL containers that are suitable for storing the results, vector, list and deque and all three of them have the push_back function. Any new containers developed by other developers are supposed to have push_back function also if they wish to be compatible with STL algorithms. So there is no need to change the push_back part to *(result++) which results in the awkward usage of back_inserter.
General: Re: Make out parameter more generic (Anonymous, 15 May '04 - 1:51)
Well, I agree with Thomas, the modification makes sense
if you take container to a broader context. With this
modification, one can also use C/C++ Arrays as containers:
 

int v[10];
stringSplit("10,20,30,40", ",", v);

 
PS: For Thomas' modification to work you have to change
the code so that it returns the number of fields found,
because "results.size()" won't work anymore.
General: using the container map in C++ (hiso7, 21 Sep '07 - 2:26)
Hi,

How do you use the map<> container in C++? Here's what I want to do: count the number of occurrences of each word in a sentence, incrementing the number associated with a word each time it appears. How do you do that with a map where the string is the word (the key) and the int is the count associated with it?
 
thx.
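
One possible answer to the question above, as a minimal sketch: the word is the map key and the int is the running count. The function name countWords is made up for this example, and words are taken to be whitespace-separated, which is an assumption (punctuation is not stripped).

```cpp
#include <map>
#include <sstream>
#include <string>

// Counts how often each whitespace-separated word occurs in `sentence`.
std::map<std::string, int> countWords(const std::string& sentence)
{
    std::map<std::string, int> counts;
    std::istringstream words(sentence);
    std::string word;
    while (words >> word)
    {
        // operator[] inserts the word with a count of 0 the first time
        // it is seen, so a plain increment is enough.
        ++counts[word];
    }
    return counts;
}
```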
General: Some small optimizations :-) (Pit M., 4 Nov '02 - 0:02)
1. Define your variables just before you need them (not in the good old C manner :) )
2. Do you really need to define the string s, each time you enter the for loop ?
 

 

General: String neutral version (Paul J. Weiss, 2 Nov '02 - 11:52)
I added a string neutral version which does not depend on CString or std::string. The only functions it uses are strstr, memcpy and vector::push_back. The empties parameter will solve the adjacent delimiters problem previously mentioned. The default behavior will push back empty strings when delimiters are adjacent. Passing in false for the empties parameter will prevent empty strings from being in the results vector.
 
Paul J. Weiss
General: Re: String neutral version (Beer26, 3 Nov '02 - 16:10)
In a true split, it should return an empty string at the end on an expression such as "a-b-c-"
 
that should return { "a", "b", "c", "" } because of the delimiter on the end
 
I just updated my MFC compact split in the post below entitled "MFC compact split" to include that standard, I noticed that your previous versions did not work that way, the way perl splits.
 
I'm not sure about your new neutral code, I'll have to try it out later.
 
I don't get what people have against MFC either, oh well
 
MSIL sucks
General: MFC compact split (Beer26, 31 Oct '02 - 17:39)
:cool: I tried this splitter class, and was disappointed when I found that it could not do a Perl-type split of a string such as "a-b-c--d-e"

It could not deal with two consecutive delimiters :(
 
so I whipped this little MFC split function up to solve the problem. I like to call it "split"
 
###################### FUNCTION FUNC.H ####################
public:
static int split(const CString& delimiter, const CString& str, CStringArray& CStrArray);
 

###################### FUNCTION FUNC.CPP ####################
 
#include "func.h"
 
int func::split(const CString& delimiter, const CString& str, CStringArray& CStrArray)
{
CString strtmp = str;
while (strtmp.Find(delimiter) != -1) {
CStrArray.Add(strtmp.Mid(0, strtmp.Find(delimiter)));
if ((strtmp.Mid(0, strtmp.Find(delimiter)).GetLength() + delimiter.GetLength()) == strtmp.GetLength()) CStrArray.Add("");
CString n = strtmp; strtmp = n.Mid(n.Find(delimiter) + delimiter.GetLength(), n.GetLength()); }
if (strtmp.GetLength() > 0) { CStrArray.Add(strtmp); }
return CStrArray.GetSize();
}
 
###################### USAGE ####################
 
#include "func.h"
 
CStringArray arry;
func::split("-", "a-b-c", arry);
for (int i = 0; i < arry.GetSize(); i++) {
MessageBox(NULL, arry[i], NULL, MB_OK); }
 
MSIL sucks
General: Re: MFC compact split (Christian Graus, 31 Oct '02 - 18:06)
That's great, for people who are stuck with MFC. A lot of us don't use it, which I guess is why the original author did not use it, either.

 
Christian
 
No offense, but I don't really want to encourage the creation of another VB developer. - Larry Antram 22 Oct 2002
 
Hey, at least Logo had, at it's inception, a mechanical turtle. VB has always lacked even that... - Shog9 04-09-2002
 
During last 10 years, with invention of VB and similar programming environments, every ill-educated moron became able to develop software. - Alex E. - 12-Sept-2002

General: Re: MFC compact split (Beer26, 31 Oct '02 - 18:18)
The original author has an MFC version listed also, it's right above the feedback near the bottom of this page.
 
That's why I thought that my version may be helpful to viewers of this page.
 
I don't see MFC as anything less than a godsend. I was used to heavily using the WFC classes with Java, but now that MS has pretty much stated that it's phasing out java and the JVM, I've switched to C++ and MFC.
 
As Jeff Prosise says in his MS Press book, Programming Windows with MFC, there are thousands of prewritten lines of code you can use to expedite development with MFC. I like it a lot.

At any rate, I hope this code can help somebody out! :)
 
MSIL sucks
General: splitting with ""'s (Todd Smith, 24 Oct '01 - 7:30)
I wanted to split a string containing quotes using a , as the delimiter
 
std::string input = "\"\",blah,\"blah,blah\",blah,\",blah\",";
 
which should return
 
1 : ""
2 : blah
3 : "blah,blah"
4 : blah
5 : ",blah"
6 :
 
Here's what I did
 
	int iPos = -1;
	int newPos = -1;
	int sizeS2 = delimiter.size();
	int isize = input.size();
 
	std::vector<int> positions;
 
	if (isize == 0)
	{
		results.push_back(input);
		return 0; 
	}
 
	if (input[0] == '"')
	{
		newPos = input.find("\"", 1);
		newPos = input.find(delimiter, newPos);
	}
	else
	{
		newPos = input.find(delimiter, 0);
	}
 
	if (newPos < 0)
	{ 
		results.push_back(input);
		return 0; 
	}
 
	while (newPos > iPos)
	{
		numFound++;
		positions.push_back(newPos);
 
		// if char immediately after delimiter is a " then search for 
		// next closing " and then continue search for delimiter
		if (input[newPos+1] == '"')
		{
			newPos = input.find("\"", newPos+2);
		}
 
		iPos = newPos;
		newPos = input.find(delimiter, iPos+sizeS2);
	}
 
	for (int i=0; i <= positions.size(); i++)
.
.
.
.
.
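
For comparison, a quote-aware split can also be sketched as a single character-by-character pass. This is not the code from the post above. The name splitQuoted is hypothetical, and the sketch assumes a single-character delimiter and no escaped quotes inside quoted text. Quotes are kept in the output, matching the expected fields listed in the post.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on `delim` but ignores delimiters that sit
// between a pair of double quotes. Quotes are kept in the output fields.
std::vector<std::string> splitQuoted(const std::string& input, char delim)
{
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;
    for (std::string::size_type i = 0; i < input.size(); ++i)
    {
        const char c = input[i];
        if (c == '"')
        {
            inQuotes = !inQuotes;      // entering or leaving a quoted run
            current += c;
        }
        else if (c == delim && !inQuotes)
        {
            fields.push_back(current); // field ends at an unquoted delimiter
            current.clear();
        }
        else
        {
            current += c;
        }
    }
    fields.push_back(current);         // last field, possibly empty
    return fields;
}
```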

General: Check out Boost solutions (William E. Kempf, 17 Oct '01 - 11:58)
There are two solutions for the same sort of problem in the Boost libraries (http://www.boost.org). There's a Boost.Tokenizer library which pretty much covers the functionality here, and a Boost.Regex library that gives you a more powerful regular expression split routine.
 
William E. Kempf
General: Re: Check out Boost solutions (Nemanja Trifunovic, 17 Oct '01 - 12:07)
Boost is great. But when working with the Unicode version of Regex, there is a nasty bug. Actually, the bug is in VC++, not Boost, but because of this bug, Boost fails.
 
Try this:
 
wcout << L'A' << endl;
 
The output should be A. But in Visual C++, it is 65, because VC++ treats wchar_t as a typedef for unsigned short X|

I vote pro drink [beer]
General: splitting a string that starts with the delimiter (lein, 17 Sep '01 - 0:53)
Hi all
If a string's first character is the delimiter, SplitString returns 0. I "solved" this by setting int iPos = -1 on the first line of SplitString.
 
Thanks for getting me going Paul!
 

General: Accepting more than one delimiter (Diller, 23 Aug '01 - 20:51)

Header:

//Split input into parts according to the delimiter; put the parts into a sequence
#include <vector>
#include <string>
 
typedef std::string StringC;
typedef std::vector<std::string> StringSeqC;
 
int SplitString(const StringC & input, 
                const StringC & delimiter, 
                StringSeqC & results)
{
   StringC part;
   StringC::size_type seppos = 0;
   StringC::size_type old_seppos = 0;
   while (seppos != StringC::npos)
   {
      seppos = input.find(delimiter, old_seppos);
      part = input.substr(old_seppos, seppos - old_seppos);
      if(!part.empty())
         results.push_back(part);
      old_seppos = seppos + delimiter.size();    // skip the whole delimiter
   }
 
   return results.size();
}
 
 
Here is another shot at splitting strings. It works fine when the input string has more than one delimiter like the example below :
#pragma warning( disable : 4786)
StringC input = "Splitting strings  can     be very easy    ";
StringC delimiter = " ";
StringSeqC result;
SplitString(input, delimiter, result);
// result = {Splitting;strings;can;be;very;easy;}
 
Sincerely Coding
/PJ
General: minor change (Ben Burnett, 23 May '01 - 15:27)
I made a template version of the function so that I could switch between string and wstring.
 
template < class TStringType >
int SplitString ( const TStringType & input, const TStringType & delimiter, vector < TStringType > & results )
{
	int iPos = 0;
	int newPos = -1;
	int sizeS2 = delimiter.size();
	int isize = input.size();
 
	vector < int > positions;
 
	newPos = input.find (delimiter, 0);
 
	if( newPos < 0 ) { return 0; }
 
	int numFound = 0;
 
	while( newPos > iPos )
	{
		numFound++;
		positions.push_back(newPos);
		iPos = newPos;
		newPos = input.find (delimiter, iPos+sizeS2+1);
	}
 
	for( int i=0; i <= positions.size(); i++ )
	{
		TStringType s;
		if( i == 0 ) { s = input.substr( i, positions[i] ); }
		int offset = positions[i-1] + sizeS2;
		if( offset < isize )
		{
			if( i == positions.size() )
			{
				s = input.substr(offset);
			}
			else if( i > 0 )
			{
				s = input.substr( positions[i-1] + sizeS2, positions[i] - positions[i-1] - sizeS2 );
			}
		}
		if( s.size() > 0 )
		{
			results.push_back(s);
		}
	}
	return numFound;
}
 
That's it.
 
Have a good one,
-Ben
 
"Its funny when you stop doing things not because they’re wrong, but because you might get caught." - Unknown
General: Namespaces (James Curran, 21 May '01 - 2:28)
You don't seem to quite understand namespaces.

There seems little reason for making StringUtils a class, other than to group SplitString with other functions like it, which is precisely the purpose of a namespace.

Further, injecting the ENTIRE std library into the global namespace in a UTILITY function HEADER file is an unpardonable sin.
 
StringUtils.h should be written as:
#ifndef __STRINGUTILS_H_
#define __STRINGUTILS_H_
 
#include <string>
#include <vector>
 
namespace StringUtils
{
using std::string;
using std::vector;
int SplitString(const string& input, const string& delimiter, vector<string> & results);
};
 
#endif
 

Truth,
James
General: Re: Namespaces (Jim Barry, 28 May '01 - 14:04)
> StringUtils.h should be written as:
> #ifndef __STRINGUTILS_H_
> #define __STRINGUTILS_H_
 
I disagree! Names containing a double underscore (or beginning with an underscore followed by a letter) are reserved for use by the compiler vendor (see subclause 17.4.3.1 of ISO C++).
 
- Jim
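
Combining the two suggestions in this thread, a header with a non-reserved include guard might look like the sketch below. The guard name STRINGUTILS_H_INCLUDED is only an example. Any name that avoids a double underscore and does not start with an underscore followed by an uppercase letter would do. Only the declaration is shown, with fully qualified std names so that nothing is injected into the global namespace.

```cpp
#ifndef STRINGUTILS_H_INCLUDED
#define STRINGUTILS_H_INCLUDED

#include <string>
#include <vector>

namespace StringUtils
{
    // Declaration only; the definition would live in the .cpp file.
    int SplitString(const std::string& input,
                    const std::string& delimiter,
                    std::vector<std::string>& results,
                    bool includeEmpties = true);
}

#endif // STRINGUTILS_H_INCLUDED
```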
General: vector without template params (David Scambler, 16 May '01 - 17:15)
How does 'vector results;' compile without template params? e.g. vector<string> :confused:

Last Updated 1 Feb 2006
Article Copyright 2001 by Paul J. Weiss
Everything else Copyright © CodeProject, 1999-2013