
STL Split String

By Paul J. Weiss, 1 Feb 2006
 

Description

Below is a function I created and have found extremely useful for splitting strings on a particular delimiter. The implementation requires only the STL, which makes it easy to port to any OS that supports the STL. The function is fairly lightweight, although I haven't done extensive performance testing.

The delimiter can be any number of characters, passed as a string. The parts of the input string between delimiters are put into a vector of strings. The class StringUtils contains one static function, SplitString. The int returned is the number of delimiters found within the input string. If the input or the delimiter is empty, or the delimiter never occurs, the function returns 0 and adds nothing to results.

I used this utility mainly for parsing strings that were being passed across platform boundaries. Whether you are using raw sockets or middleware such as TIBCO®, it is uncomplicated to pass string data. I found it more efficient to pass delimited string data versus making repeated calls or sending repeated messages. Another place I used this was in passing BSTRs back and forth between a Visual Basic client and an ATL COM DLL. It proved to be easier than passing a SAFEARRAY as an [in] or [out] parameter. This was also beneficial when I did not want the added overhead of MFC and hence could not use CString.

Implementation

The SplitString function uses the STL string functions find and substr to iterate through the input string. The hardest part was figuring out how to get the substring of the input string based on the offsets of the delimiter, not forgetting to take into account the length of the delimiter. Another hurdle was making sure not to call substr with an offset greater than the length of the input string.
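
As a rough illustration of the find/substr iteration described above, the sketch below splits in a single pass instead of collecting delimiter offsets first. The name splitSinglePass is made up for this example and is not part of StringUtils. It assumes the same return convention as SplitString (the number of delimiters found).

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of StringUtils): splits in one pass,
// without first collecting the delimiter positions.
// Returns the number of delimiters found, like SplitString.
int splitSinglePass(const std::string& input,
                    const std::string& delimiter,
                    std::vector<std::string>& results,
                    bool includeEmpties = true)
{
    if (input.empty() || delimiter.empty())
        return 0;

    int numFound = 0;
    std::string::size_type start = 0;  // first character of the current piece
    std::string::size_type pos = input.find(delimiter, start);

    while (pos != std::string::npos)
    {
        ++numFound;
        // the piece runs from start up to, but not including, the delimiter
        std::string piece = input.substr(start, pos - start);
        if (includeEmpties || !piece.empty())
            results.push_back(piece);
        start = pos + delimiter.size();  // skip the whole delimiter
        pos = input.find(delimiter, start);
    }

    // start never exceeds input.size(), so this substr call cannot throw
    std::string tail = input.substr(start);
    if (includeEmpties || !tail.empty())
        results.push_back(tail);

    return numFound;
}
```

One difference worth noting: when the delimiter never occurs, this sketch stores the whole input as a single result, whereas SplitString returns 0 and leaves results untouched.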

Header

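// Names containing a double underscore are reserved for the implementation,
// so the include guard uses a plain identifier.
#ifndef STRINGUTILS_H
#define STRINGUTILS_H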

#include <string>
#include <vector>

using namespace std;

class StringUtils
{

public:

    static int SplitString(const string& input, 
        const string& delimiter, vector<string>& results, 
        bool includeEmpties = true);

};

#endif

Source

#include "StringUtils.h"  // assumes the header above is saved under this name

int StringUtils::SplitString(const string& input, 
       const string& delimiter, vector<string>& results, 
       bool includeEmpties)
{
    int iPos = 0;
    int newPos = -1;
    int sizeS2 = (int)delimiter.size();
    int isize = (int)input.size();

    if( 
        ( isize == 0 )
        ||
        ( sizeS2 == 0 )
    )
    {
        return 0;
    }

    vector<int> positions;

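    // find returns string::npos when there is no match;
    // on common platforms the cast to int yields -1
    newPos = (int)input.find (delimiter, 0);

    if( newPos < 0 )
    { 
        return 0; 
    }

    int numFound = 0;

    // record the offset of every delimiter
    while( newPos >= iPos )
    {
        numFound++;
        positions.push_back(newPos);
        iPos = newPos;
        newPos = (int)input.find (delimiter, iPos+sizeS2);
    }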

    if( numFound == 0 )
    {
        return 0;
    }

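    for( int i=0; i <= (int)positions.size(); ++i )
    {
        string s("");
        if( i == 0 ) 
        { 
            // text before the first delimiter
            s = input.substr( 0, positions[0] ); 
        }
        else
        {
            // positions[i-1] is only valid for i > 0, so the offset
            // must be computed inside this branch
            int offset = positions[i-1] + sizeS2;
            if( offset < isize )
            {
                if( i == (int)positions.size() )
                {
                    // text after the last delimiter
                    s = input.substr(offset);
                }
                else
                {
                    // text between delimiter i-1 and delimiter i
                    s = input.substr( offset, positions[i] - offset );
                }
            }
        }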
        if( includeEmpties || ( s.size() > 0 ) )
        {
            results.push_back(s);
        }
    }
    return numFound;
}

Output using demo project

main.exe "|mary|had|a||little|lamb||" "|"

int SplitString(
        const string& input,
        const string& delimiter,
        vector<string>& results,
        bool includeEmpties = true
)

-------------------------------------------------------
input           = |mary|had|a||little|lamb||
delimiter       = |
return value    = 8 // Number of delimiters found
results.size()  = 9
results[0]      = ''
results[1]      = 'mary'
results[2]      = 'had'
results[3]      = 'a'
results[4]      = ''
results[5]      = 'little'
results[6]      = 'lamb'
results[7]      = ''
results[8]      = ''

int SplitString(
        const string& input,
        const string& delimiter,
        vector<string>& results,
        bool includeEmpties = false
)

-------------------------------------------------------
input           = |mary|had|a||little|lamb||
delimiter       = |
return value    = 8 // Number of delimiters found
results.size()  = 5
results[0]      = 'mary'
results[1]      = 'had'
results[2]      = 'a'
results[3]      = 'little'
results[4]      = 'lamb'

MFC version

For those of you who absolutely cannot use STL and are committed to MFC, I made a few minor changes to the above implementation. It uses CString instead of std::string and a CStringArray instead of a std::vector. Note that this version always drops empty strings from the results:

//------------------------
// SplitString in MFC
//------------------------
int StringUtils::SplitString(const CString& input, 
  const CString& delimiter, CStringArray& results)
{
  int iPos = 0;
  int newPos = -1;
  int sizeS2 = delimiter.GetLength();
  int isize = input.GetLength();

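  // an empty input or delimiter has nothing to split
  if( isize == 0 || sizeS2 == 0 ) { return 0; }

  CArray<INT, int> positions;  // CArray is declared in <afxtempl.h>

  newPos = input.Find (delimiter, 0);

  if( newPos < 0 ) { return 0; }

  int numFound = 0;

  // >= so that a delimiter at offset 0 is recorded too
  while( newPos >= iPos )
  {
    numFound++;
    positions.Add(newPos);
    iPos = newPos;
    // resume right after the delimiter so adjacent delimiters are found
    newPos = input.Find (delimiter, iPos+sizeS2);
  }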

  for( int i=0; i <= positions.GetSize(); i++ )
  {
    CString s;
    if( i == 0 )
      s = input.Mid( i, positions[i] );
    else
    {
      int offset = positions[i-1] + sizeS2;
      if( offset < isize )
      {
        if( i == positions.GetSize() )
          s = input.Mid(offset);
        else if( i > 0 )
          s = input.Mid( positions[i-1] + sizeS2, 
                 positions[i] - positions[i-1] - sizeS2 );
      }
    }
    if( s.GetLength() > 0 )
      results.Add(s);
  }
  return numFound;
}

String neutral version

I added this version in case you might need to use it with any type of string. The only requirement is the string class must have a constructor that takes a char*. The code only depends on the STL vector. I also added the option to not include empty strings in the results, which will occur if delimiters are adjacent:

#include <cstring>
#include <vector>

using namespace std;

//-----------------------------------------------------------
// StrT:    Type of string to be constructed
//          Must have char* ctor.
// str:     String to be parsed.
// delim:   Pointer to delimiter.
// results: Vector of StrT for strings between delimiter.
// empties: Include empty strings in the results. 
// Returns the number of strings in results.
//-----------------------------------------------------------
template< typename StrT >
int split(const char* str, const char* delim, 
     vector<StrT>& results, bool empties = true)
{
  const char* pstr = str;
  size_t dlen = strlen(delim);
  // an empty delimiter would never advance pstr, so bail out early
  if( dlen == 0 )
    return (int)results.size();
  const char* r = strstr(pstr, delim);
  while( r != NULL )
  {
    // copy the piece [pstr, r) into a NUL-terminated buffer;
    // the vector frees itself even if the StrT constructor throws
    vector<char> cp(pstr, r);
    cp.push_back('\0');
    if( cp.size() > 1 || empties )
    {
      results.push_back(StrT(&cp[0]));
    }
    pstr = r + dlen;
    r = strstr(pstr, delim);
  }
  // whatever follows the last delimiter
  if( *pstr != '\0' || empties )
  {
    results.push_back(StrT(pstr));
  }
  return (int)results.size();
}

String neutral usage

// using CString
//------------------------------------------
size_t i = 0;  // size_t matches the type returned by vector::size()
vector<CString> results;
split("a-b-c--d-e-", "-", results);
for( i=0; i < results.size(); ++i )
{
  cout << results[i].GetBuffer(0) << endl;
  results[i].ReleaseBuffer();
}

// using std::string
//------------------------------------------
vector<string> stdResults;
split("a-b-c--d-e-", "-", stdResults);
for( i=0; i < stdResults.size(); ++i )
{
  cout << stdResults[i].c_str() << endl;
}

// using std::string without empties
//------------------------------------------
stdResults.clear();
split("a-b-c--d-e-", "-", stdResults, false);
for( i=0; i < stdResults.size(); ++i )
{
  cout << stdResults[i].c_str() << endl;
}
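
To make the "constructor that takes a char*" requirement concrete, here is a small self-contained sketch. TinyString is a made-up type for illustration only, and the split shown is a condensed restatement of the template (it uses std::string as its scratch buffer, which is an assumption of this sketch rather than the article's approach) so that the block compiles on its own.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical minimal string type: a constructor taking const char*
// is the only thing the template asks of StrT.
struct TinyString
{
    std::string text;
    TinyString(const char* s) : text(s) {}
};

// Condensed restatement of the split template so this block compiles alone.
template <typename StrT>
int split(const char* str, const char* delim,
          std::vector<StrT>& results, bool empties = true)
{
    const std::size_t dlen = std::strlen(delim);
    if (dlen == 0)  // an empty delimiter would never advance
        return static_cast<int>(results.size());

    const char* pstr = str;
    for (const char* r = std::strstr(pstr, delim); r != NULL;
         r = std::strstr(pstr, delim))
    {
        std::string piece(pstr, r);  // characters in [pstr, r)
        if (!piece.empty() || empties)
            results.push_back(StrT(piece.c_str()));
        pstr = r + dlen;             // continue after the delimiter
    }
    // whatever follows the last delimiter
    if (*pstr != '\0' || empties)
        results.push_back(StrT(pstr));
    return static_cast<int>(results.size());
}
```

With the input from the usage above, "a-b-c--d-e-" split on "-" gives seven pieces when empties are kept (a, b, c, an empty string, d, e, and a trailing empty string) and five when they are dropped.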

Conclusion

Hope you find this as useful as I did. Feel free to let me know of any bugs or enhancements. Enjoy ;)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Paul J. Weiss
Web Developer
United States

Article Copyright 2001 by Paul J. Weiss