|
|
Comments and Discussions
|
|
 |

|
Quick version using vector as base class giving access to all vector's operators.
Does not 'trim' values.
class Split : public std::vector<std::string>
{
public:
Split(const std::string& str, const char* delimList)
{
size_t lastPos = 0;
size_t pos = str.find_first_of(delimList);
while (pos != std::string::npos)
{
if (pos != lastPos)
push_back(str.substr(lastPos, pos-lastPos));
lastPos = pos + 1;
pos = str.find_first_of(delimList, lastPos);
}
if (lastPos < str.length())
push_back(str.substr(lastPos, pos-lastPos));
}
};
Example use to populate an STL set.
std::set<std::string> words;
Split split("hello,world", ",");
words.insert(split.begin(), split.end());
|
|
|
|

|
The std::string version of SplitString fails with the following parameters:
input : "1,2"
separator : ","
Any program whose reliability depends on humans is not reliable
|
|
|
|

|
Just want clarification on the license for using this code. There's a generic remark about there being no specific license unless mentioned elsewhere, and in fact I don't see any mentioned, but I just want to be sure. Can this be treated like Public Domain, or are there more restrictions? I only ask because my company has a policy of not using source with no license (I guess for fear that a license can be applied later), and we'd really like to use this.
Thanks.
Tom
|
|
|
|

|
The code doesn't even work! The author needs to do at least one test run before putting it online!
See comments "Handling i-0 properly?"
|
|
|
|

|
Hello,
When a string like "Jane" is given and the delimiter " " cannot be found, the results vector is empty. I would expect the results vector to contain "Jane" in this case.
Greets Florian
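A minimal sketch of the behaviour Florian describes, assuming a hypothetical helper named splitOrWhole (it is not the article's SplitString): the tail after the last delimiter is always considered, so an input with no delimiter at all comes back as a single element.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not the article's SplitString.
// Splits on a whole-string delimiter and skips empty pieces.
std::vector<std::string> splitOrWhole(const std::string& input,
                                      const std::string& delimiter)
{
    std::vector<std::string> results;
    if (delimiter.empty())
    {
        // Nothing to split on: return the input as-is (if there is any).
        if (!input.empty())
            results.push_back(input);
        return results;
    }

    std::string::size_type start = 0;
    std::string::size_type pos = input.find(delimiter);
    while (pos != std::string::npos)
    {
        if (pos > start)
            results.push_back(input.substr(start, pos - start));
        start = pos + delimiter.size();
        pos = input.find(delimiter, start);
    }
    // Tail after the last delimiter; for "Jane" this is the whole input.
    if (start < input.size())
        results.push_back(input.substr(start));
    return results;
}
```

With this sketch, splitting "Jane" by " " gives one element, and the "1,2" case reported earlier gives two.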
|
|
|
|

|
Yeah, maybe the bug is here:
if( i == 0 )
{
s = input.substr( i, positions[i] );
}
int offset = positions[i-1] + sizeS2;
If there is only one element in positions, positions[i-1] will throw an exception.
Best wishes,
Yun
|
|
|
|

|
May I suggest that you replace the results vector with an output iterator instead? That way, you don't have to write a new version of the function if you want the results in a list, set, output stream, or whatever.
--
Not a substitute for human interaction
|
|
|
|

|
Hi, I don't think your STL code is handling i=0 properly. Look at what happens at the following statement:
int offset = positions[i-1] + sizeS2;
With i == 0 this reads positions[i-1], which is out of range, so you'll get a vector exception thrown. I assume you wanted an 'else' block right after your if(i==0) block. I put it in and it seems to work. Just posting so that anyone else who simply copies and pastes won't go crazy wondering why it doesn't work.
Mike
|
|
|
|

|
That's exactly what I did. I copied and pasted and it failed immediately. I suppose it's a good starting point for a string splitter, but the bug needs to be fixed.
Dom
|
|
|
|

|
I ditto this again.
What is up with putting code online that does not run?
|
|
|
|

|
In essence the neutral version allows 2 types of strings, MFC and STL (I know that templates allow more, but who uses other strings?). But the container used is vector<>. I believe that for MFC, CStringArray is a better choice.
Lev Elbert
|
|
|
|
|

|
Why are you collecting the positions first? There is no reason to do that. It's just wasting time. Once you have a position and the next position, you can store the result right away.
|
|
|
|

|
I updated the code to handle cases such as ";mary;had;;a;little;lamb;;;" where there could be empty strings in between the delimiters. I also added another argument to the function which is a boolean to include the empty strings as part of the resulting vector. If includeEmpties is false then only strings of size greater than zero will be included in the results. I updated the source and the demo project.
Enjoy
Paul J. Weiss
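A rough sketch of how such a flag can work. Only the includeEmpties parameter name comes from the post above; the function name splitWithEmpties and the body are assumptions for illustration, not the updated article source.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: only the includeEmpties flag comes from the post;
// the function name and body are assumptions.
// Returns the number of strings appended to results.
int splitWithEmpties(const std::string& input,
                     const std::string& delimiter,
                     std::vector<std::string>& results,
                     bool includeEmpties = true)
{
    int added = 0;
    if (delimiter.empty())
        return added;                       // nothing to split on

    std::string::size_type start = 0;
    for (;;)
    {
        const std::string::size_type pos = input.find(delimiter, start);
        // Piece between the previous delimiter and this one (or the end).
        const std::string piece = (pos == std::string::npos)
            ? input.substr(start)
            : input.substr(start, pos - start);
        if (includeEmpties || !piece.empty())
        {
            results.push_back(piece);
            ++added;
        }
        if (pos == std::string::npos)
            break;
        start = pos + delimiter.size();
    }
    return added;
}
```

For ";mary;had;;a;little;lamb;;;" split by ";" this sketch appends 10 strings with includeEmpties set to true (nine delimiters, so ten fields) and 5 with it set to false.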
|
|
|
|

|
Hi, I tried to use this ...
With the string "foo;boo" I always got a count of 1 from SplitString ...
I modified SplitString and with this source it works quite well ...
call:
std::vector<std::string> Colums;
int nCols = SplitString(LinkList, ";", Colums);
Please ignore std::
int SplitString(const std::string& input, const std::string& delimiter, std::vector<std::string>& results)
{
int iPos = 0;
int newPos = -1;
int sizeS2 = delimiter.size();
int isize = input.size();
std::vector<int> positions;
newPos = input.find (delimiter, 0);
if( newPos < 0 )
{
return 0;
}
int numFound = 0;
while( newPos > iPos )
{
//numFound++;
positions.push_back(newPos);
iPos = newPos;
newPos = input.find (delimiter, iPos + sizeS2 + 1);
}
for( int i=0; i <= positions.size(); i++ )
{
std::string s;
if( i == 0 )
{
s = input.substr( i, positions[i] );
}
int offset = positions[i-1] + sizeS2;
if( offset < isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
else if( i > 0 )
{
s = input.substr( positions[i-1] + sizeS2, positions[i] - positions[i-1] - sizeS2 );
}
}
if( s.size() > 0 )
{
results.push_back(s);
numFound++;
}
}
return numFound;
}
Regards
Andy
|
|
|
|

|
In CSV files you often have to parse strings like
xx;yy;;zz
where ;; means that this entry is empty.
The class cannot detect this kind of pattern;
it returns
xx
yy
;zz
To change this behaviour, the following changes have to be applied:
while( newPos > iPos )
{
numFound++;
positions.push_back(newPos);
iPos = newPos;
// newPos = input.find (delimiter, iPos+sizeS2+1);
newPos = input.find(delimiter,iPos + 1);
}
for( int i=0; i <= positions.size(); i++ )
{
string s;
if( i == 0 ) { s = input.substr( i, positions[i] ); }
int offset = positions[i-1] + sizeS2;
if( offset < isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
else if( i > 0 )
{
s = input.substr( positions[i-1] + sizeS2,
positions[i] - positions[i-1] - sizeS2 );
}
}
//if( s.size() > 0 )
//{
results.push_back(s);
//}
}
regards
karl-heinz
|
|
|
|

|
My solution handles the issues that came up (delimiter at the front or back, delimiter repeated in the input string). Beer26's version almost got it, except that he was copying the input string over and over (albeit slightly shorter each time). Mine doesn't copy the [entire] string at all. It can split a 2MB list of words by "\r\n" in 0.3 seconds, which is a lot faster than anything else on this page.
void CyourMFCClassDlg::split(const CString& str, const CString& delimiter, CStringArray& CStrArray)
{
long start = 0,
delim = str.Find(delimiter),
delimLen = delimiter.GetLength(),
elemCnt = 1; // the ACTUAL number of items, there'll be at least 1
// counting the elements, setting the size and then filling the array
// is much faster than doing .Add for each new element
while (delim > -1)
{
elemCnt++;
start = delim + delimLen;
delim = str.Find(delimiter, start);
}
// manually going through and finding each delimiter again is faster than
// keeping track of the positions from the last loop.. because doing .Add over and over
// to the position array would be such a bottleneck
start = 0;
delim = str.Find(delimiter);
CStrArray.SetSize(elemCnt); // now we don't have to use .Add, saving tons of cpu cycles
elemCnt = -1;
while (delim > -1)
{
elemCnt++;
CStrArray[elemCnt] = str.Mid(start, delim-start);
start = delim + delimLen;
delim = str.Find(delimiter, start);
}
if (start < str.GetLength())
CStrArray[elemCnt+1] = str.Mid(start);
else
CStrArray[elemCnt+1] = "";
}
|
|
|
|

|
Exactly what I was looking for! I needed to include Afxtempl.h in order to get the MFC version up 'n' running.
|
|
|
|

|
This version templates the output container and assumes elements are to be added to the end.
template<typename _Cont>
void split(const string& str, _Cont& _container, const string& delim=",")
{
string::size_type lpos = 0;
string::size_type pos = str.find_first_of(delim, lpos);
while(lpos != string::npos)
{
_container.insert(_container.end(), str.substr(lpos,pos - lpos));
lpos = ( pos == string::npos ) ? string::npos : pos + 1;
pos = str.find_first_of(delim, lpos);
}
}
Alexis
http://weblog.smirnov.ca
|
|
|
|

|
comment poster ate angle brackets in the earlier version. Use this one instead:
template<typename _Cont>
void split(const string& str, _Cont& _container, const string& delim=",")
{
string::size_type lpos = 0;
string::size_type pos = str.find_first_of(delim, lpos);
while(lpos != string::npos)
{
_container.insert(_container.end(), str.substr(lpos,pos - lpos));
lpos = ( pos == string::npos ) ? string::npos : pos + 1;
pos = str.find_first_of(delim, lpos);
}
}
|
|
|
|

|
This algorithm always returns any blank fields (a sensible default to be sure, but not always what one wants) and yes the caller could then choose to throw those empty values away.
It also assumes delim is a set of possible delimiters but the original poster uses std::string::find not find_first_of so the meaning is quite different.
Returning the size is helpful for repeated use when appending to the same container.
This allows the caller to write cleaner code when field counts are of interest.
Please eschew the use of underscores at the start of names, that's really for standard library implementers and compiler vendors etc., not we mere mortal dev types.
Keep it simple
dex
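To make the find versus find_first_of point concrete, here is a small sketch. The two wrapper names are hypothetical and exist only so the two calls can be compared side by side; the underlying std::string members are standard.

```cpp
#include <string>

// Hypothetical wrappers, named only for this comparison.

// delim is treated as ONE separator that must match in full.
std::string::size_type firstWholeDelimiter(const std::string& s,
                                           const std::string& delim)
{
    return s.find(delim);
}

// delim is treated as a SET of single-character separators.
std::string::size_type firstAnyDelimiter(const std::string& s,
                                         const std::string& delim)
{
    return s.find_first_of(delim);
}
```

For the input "a,b;c,;d" and the delimiter ",;", the first call reports position 5 (the only place where "," is directly followed by ";"), while the second reports position 1 (the first "," on its own). A splitter built on one will produce different fields from a splitter built on the other.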
|
|
|
|

|
When the delimiter is in the first position, your function crashes.
|
|
|
|

|
Or in the last position.
For example try to split " teststring" by " ", or split "teststring " by " ".
|
|
|
|

|
Agreed.
I've figured out that the problem comes from this line:
(1) int offset = positions[i-1] + sizeS2;
But changing it to:
(2) int offset = positions[i] + sizeS2;
only shifts the problem to the last word, which is skipped.
My input is:
const string& input="mary|had|a||little|lamb|dsqdsqd|";
If I use (1), the program crashes with the error message "vector subscript out of range".
But if I use (2), I get the same message when reaching the last word "dsqdsqd".
Can someone explain or, even better, correct this portion of the code to make it work?
|
|
|
|

|
Never mind. I have solved it.
The change is as follows:
int offset = positions[i] + sizeS2;
if( offset <= isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
|
|
|
|

|
No it's not - it still causes problems.
The solution is as follows:
if( i == 0 )
{
s = input.substr( i, positions[i] );
}
else
{
int offset = positions[i-1] + sizeS2;
if( offset < isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
else if( i > 0 )
{
s = input.substr( positions[i-1] + sizeS2,
positions[i] - positions[i-1] - sizeS2 );
}
}
}
|
|
|
|

|
template
void split(const string& str, _Outit _Where, const string& delim=",")
{
string::size_type lpos = 0;
string::size_type pos = str.find_first_of(delim, lpos);
do
{
*_Where = str.substr(lpos,pos - lpos);
// front_inserter, back_inserter and inserter will do
// nothing with operator++
++_Where;
lpos = ( pos == string::npos ) ? string::npos : pos + 1;
pos = str.find_first_of(delim, lpos);
}
while(lpos != string::npos);
}
|
|
|
|

|
template<typename _Outit>
void split(const string& str, _Outit _Where, const string& delim=",")
{
string::size_type lpos = 0;
string::size_type pos = str.find_first_of(delim, lpos);
do
{
*_Where = str.substr(lpos,pos - lpos);
// front_inserter, back_inserter and inserter will do
// nothing with operator++
++_Where;
lpos = ( pos == string::npos ) ? string::npos : pos + 1;
pos = str.find_first_of(delim, lpos);
}
while(lpos != string::npos);
}
|
|
|
|

|
This will support 'back_inserter', 'front_inserter' and 'inserter'.
|
|
|
|

|
Close but not quite!
Don't use do..while. Just use while. What if the incoming string contains no delimiters?...you will modify the destination. Using while() also means you need only invoke find_first_of() in one place.
|
|
|
|

|
getline (_Istr, _Str, _Delim)
_Istr - The input stream from which a string is to be extracted.
_Str - The string into which are read the characters from the input stream.
_Delim - The line delimiter.
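As a sketch of how std::getline with a delimiter can be used for splitting, assuming a single-character delimiter and a hypothetical helper name splitWithGetline:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on ONE character using std::getline.
std::vector<std::string> splitWithGetline(const std::string& input, char delim)
{
    std::vector<std::string> results;
    std::istringstream stream(input);
    std::string field;
    // Each call reads up to (and consumes) the next delim, or to the end.
    while (std::getline(stream, field, delim))
        results.push_back(field);
    return results;
}
```

Note the trade-offs: empty fields between adjacent delimiters are kept, a single trailing delimiter does not produce a final empty field, and multi-character delimiters are not supported this way.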
|
|
|
|

|
I suggest making the out parameter results more generic:
template <typename outit>
static int SplitString(const string& input, const string& delimiter, outit results);
and changing every results.push_back(s); to *(results++) = s; Then you can use any container you want with SplitString:
vector<string> results1;
int num1 = StringUtils::SplitString(in, delim, std::back_inserter(results1));
list<string> results2;
int num2 = StringUtils::SplitString(in, delim, std::back_inserter(results2));
Regards
Thomas
Sonork id: 100.10453 Thömmi
Disclaimer: Because of heavy processing requirements, we are currently using some of your unused brain capacity for backup processing. Please ignore any hallucinations, voices or unusual dreams you may experience. Please avoid concentration-intensive tasks until further notice. Thank you.
|
|
|
|

|
Changing the function to a template is a good idea. However, I don't see the point in changing the push_back part. There are only 3 STL containers that are suitable for storing the results, vector, list and deque and all three of them have the push_back function. Any new containers developed by other developers are supposed to have push_back function also if they wish to be compatible with STL algorithms. So there is no need to change the push_back part to *(result++) which results in the awkward usage of back_inserter.
|
|
|
|

|
Well, I agree with Thomas, the modification makes sense
if you take container to a broader context. With this
modification, one can also use C/C++ Arrays as containers:
int v[10];
stringSplit("10,20,30,40", ",", v);
PS: For Thomas' modification to work you have to change
the code so that it returns the number of fields found,
because "results.size()" won't work anymore.
|
|
|
|

|
Hi,
How do you guys use the map<> container in C++? Here's what I want to do: count the number of occurrences of each word in a sentence, incrementing the number associated with a word each time it appears in the sentence. How do you do that with a map where the string is the word and the int is the count associated with the word?
thx.
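One possible sketch, assuming words are separated by whitespace and using a hypothetical function name countWords. The word is the map key and the count is the mapped value; operator[] creates a missing entry with a count of 0 before it is incremented.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words in a sentence.
std::map<std::string, int> countWords(const std::string& sentence)
{
    std::map<std::string, int> counts;
    std::istringstream words(sentence);
    std::string word;
    while (words >> word)
        ++counts[word];     // a missing key starts at 0, then becomes 1
    return counts;
}
```

For "the cat and the hat" this gives a count of 2 for "the" and 1 for each of the other three words. Punctuation and letter case are not normalised here.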
|
|
|
|

|
1. Define your variables just before you need them (not in the good old C manner )
2. Do you really need to define the string s, each time you enter the for loop ?
|
|
|
|

|
I added a string neutral version which does not depend on CString or std::string. The only functions it uses are strstr, memcpy and vector::push_back. The empties parameter will solve the adjacent delimiters problem previously mentioned. The default behavior will push back empty strings when delimiters are adjacent. Passing in false for the empties parameter will prevent empty strings from being in the results vector.
Paul J. Weiss
|
|
|
|

|
In a true split, it should return an empty string at the end on an expression such as "a-b-c-"
that should return { "a", "b", "c", "" } because of the delimiter on the end
I just updated my MFC compact split in the post below entitled "MFC compact split" to include that standard, I noticed that your previous versions did not work that way, the way perl splits.
I'm not sure about your new neutral code, I'll have to try it out later.
I don't get what people have against MFC either, oh well
MSIL sucks
|
|
|
|

|
I tried this splitter class, and was disappointed when I found that it could not do a Perl-type split of a string such as "a-b-c--d-e".
It could not deal with two consecutive delimiters,
so I whipped up this little MFC split function to solve the problem. I like to call it "split".
###################### FUNCTION FUNC.H ####################
public:
static int split(const CString& delimiter, const CString& str, CStringArray& CStrArray);
###################### FUNCTION FUNC.CPP ####################
#include "func.h"
int func::split(const CString& delimiter, const CString& str, CStringArray& CStrArray)
{
CString strtmp = str;
while (strtmp.Find(delimiter) != -1) {
CStrArray.Add(strtmp.Mid(0, strtmp.Find(delimiter)));
if ((strtmp.Mid(0, strtmp.Find(delimiter)).GetLength() + delimiter.GetLength()) == strtmp.GetLength()) CStrArray.Add("");
CString n = strtmp; strtmp = n.Mid(n.Find(delimiter) + delimiter.GetLength(), n.GetLength()); }
if (strtmp.GetLength() > 0) { CStrArray.Add(strtmp); }
return CStrArray.GetSize();
}
###################### USAGE ####################
#include "func.h"
CStringArray arry;
func::split("-", "a-b-c", arry);
for (int i = 0; i < arry.GetSize(); i++) {
MessageBox(NULL, arry[i], NULL, MB_OK); }
MSIL sucks
|
|
|
|

|
That's great, for people who are stuck with MFC. A lot of us don't use it, which I guess is why the original author did not use it, either.
Christian
No offense, but I don't really want to encourage the creation of another VB developer. - Larry Antram 22 Oct 2002
Hey, at least Logo had, at it's inception, a mechanical turtle. VB has always lacked even that... - Shog9 04-09-2002
During last 10 years, with invention of VB and similar programming environments, every ill-educated moron became able to develop software. - Alex E. - 12-Sept-2002
|
|
|
|

|
The original author has an MFC version listed also, it's right above the feedback near the bottom of this page.
That's why I thought that my version may be helpful to viewers of this page.
I don't see MFC as anything less than a godsend. I was used to heavily using the WFC classes with Java, but now that MS has pretty much stated that it's phasing out Java and the JVM, I've switched to C++ and MFC.
As Jeff Prosise says in his MS Press book, Programming Windows with MFC, there are thousands of prewritten lines of code you can use to expedite development with MFC. I like it a lot.
At any rate, I hope this code can help somebody out!
MSIL sucks
|
|
|
|

|
I wanted to split a string containing quotes using a , as the delimiter
std::string input = "\"\",blah,\"blah,blah\",blah,\",blah\",";
which should return
1 : ""
2 : blah
3 : "blah,blah"
4 : blah
5 : ",blah"
6 :
Here's what I did
int iPos = -1;
int newPos = -1;
int sizeS2 = delimiter.size();
int isize = input.size();
std::vector<int> positions;
if (isize == 0)
{
results.push_back(input);
return 0;
}
if (input[0] == '"')
{
newPos = input.find("\"", 1);
newPos = input.find(delimiter, newPos);
}
else
{
newPos = input.find(delimiter, 0);
}
if (newPos < 0)
{
results.push_back(input);
return 0;
}
while (newPos > iPos)
{
numFound++;
positions.push_back(newPos);
if (input[newPos+1] == '"')
{
newPos = input.find("\"", newPos+2);
}
iPos = newPos;
newPos = input.find(delimiter, iPos+sizeS2);
}
for (int i=0; i <= positions.size(); i++)
.
.
.
.
.
|
|
|
|

|
There are two solutions for the same sort of problem in the Boost libraries (http://www.boost.org). There's a Boost.Tokenizer library which pretty much covers the functionality here, and a Boost.Regex library that gives you a more powerful regular expression split routine.
William E. Kempf
|
|
|
|
|

|
Hi all
If a string's first character is the delimiter, SplitString returns 0. I "solved" this by setting int iPos = -1 on the first line of SplitString.
Thanks for getting me going Paul!
|
|
|
|

|
Header:
#include <vector>
#include <string>
typedef std::string StringC;
typedef std::vector<std::string> StringSeqC;
int SplitString(const StringC & input,
const StringC & delimiter,
StringSeqC & results)
{
StringC part;
StringC::size_type seppos = 0;
StringC::size_type old_seppos = 0;
// skip the whole delimiter after each match, not just one character
const StringC::size_type step = delimiter.empty() ? 1 : delimiter.size();
while (seppos != StringC::npos)
{
seppos = input.find(delimiter, old_seppos);
part = input.substr(old_seppos, seppos - old_seppos);
if(!part.empty())
results.push_back(part);
old_seppos = seppos + step;
}
return results.size();
}
Here is another shot at splitting strings. It works fine when the input string contains more than one delimiter, as in the example below:
#pragma warning( disable : 4786)
StringC input = "Splitting strings can be very easy ";
StringC delimiter = " ";
StringSeqC result;
SplitString(input, delimiter, result);
Sincerely Coding
/PJ
|
|
|
|

|
I made a template version of the function so that I could switch between string and wstring.
template < class TStringType >
int SplitString ( const TStringType & input, const TStringType & delimiter, vector < TStringType > & results )
{
int iPos = 0;
int newPos = -1;
int sizeS2 = delimiter.size();
int isize = input.size();
vector < int > positions;
newPos = input.find (delimiter, 0);
if( newPos < 0 ) { return 0; }
int numFound = 0;
while( newPos > iPos )
{
numFound++;
positions.push_back(newPos);
iPos = newPos;
newPos = input.find (delimiter, iPos+sizeS2+1);
}
for( int i=0; i <= positions.size(); i++ )
{
TStringType s;
if( i == 0 ) { s = input.substr( i, positions[i] ); }
int offset = positions[i-1] + sizeS2;
if( offset < isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
else if( i > 0 )
{
s = input.substr( positions[i-1] + sizeS2, positions[i] - positions[i-1] - sizeS2 );
}
}
if( s.size() > 0 )
{
results.push_back(s);
}
}
return numFound;
}
That's it.
Have a good one,
-Ben
"Its funny when you stop doing things not because they’re wrong, but because you might get caught." - Unknown
|
|
|
|

|
You don't seem to quite understand namespaces.
There seems little reason for making StringUtils a class, other than to group SplitString with other functions like it, which is precisely the purpose of a namespace.
Further, injecting the ENTIRE std library into the global namespace in a UTILITY function HEADER file is an unpardonable sin.
StringUtils.h should be written as:
#ifndef __STRINGUTILS_H_
#define __STRINGUTILS_H_
#include <string>
#include <vector>
namespace StringUtils
{
using std::string;
using std::vector;
int SplitString(const string& input, const string& delimiter, vector<string> & results);
};
#endif
Truth,
James
|
|
|
|

|
> StringUtils.h should be written as:
> #ifndef __STRINGUTILS_H_
> #define __STRINGUTILS_H_
I disagree! Names containing a double underscore (or beginning with an underscore followed by a letter) are reserved for use by the compiler vendor (see subclause 17.4.3.1 of ISO C++).
- Jim
|
|
|
|

|
How does 'vector results;' compile without template params? e.g. vector<string>
|
|
|
|
 |
|
|
|
A function that will split an input string based on a string delimiter.
| Type | Article |
| Licence | |
| First Posted | 14 May 2001 |
| Views | 220,969 |
| Bookmarked | 38 times |
|
|