The comment poster ate the angle brackets in the earlier version. Use this one instead:
template<typename _Cont>
void split(const string& str, _Cont& _container, const string& delim=",")
{
    string::size_type lpos = 0;
    string::size_type pos = str.find_first_of(delim, lpos);
    while(lpos != string::npos)
    {
        _container.insert(_container.end(), str.substr(lpos,pos - lpos));
        lpos = ( pos == string::npos ) ? string::npos : pos + 1;
        pos = str.find_first_of(delim, lpos);
    }
}
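Since this version only calls insert(end(), value), it should work with vector, deque and list, and also with std::set, where end() merely acts as a position hint (duplicates then collapse). A sketch of the same loop with the leading underscores dropped from the names (split_into is a made-up name):

```cpp
#include <set>     // std::set, used in the usage example
#include <string>
#include <vector>  // std::vector, used in the usage example

// Same algorithm as the template above: delim is a *set* of delimiter
// characters (find_first_of), and every field, including empty ones,
// is inserted at the end of the container.
template <typename Container>
void split_into(const std::string& str, Container& container,
                const std::string& delim = ",")
{
    std::string::size_type lpos = 0;
    std::string::size_type pos = str.find_first_of(delim, lpos);
    while (lpos != std::string::npos) {
        // insert(end(), x) exists on vector, deque and list; on set, end() is only a hint
        container.insert(container.end(), str.substr(lpos, pos - lpos));
        lpos = (pos == std::string::npos) ? std::string::npos : pos + 1;
        pos = str.find_first_of(delim, lpos);
    }
}
```

For example, splitting "a,,b" into a std::vector<std::string> gives three fields (the middle one empty), while splitting "b,a,b" into a std::set<std::string> leaves just "a" and "b".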
This algorithm always returns blank fields (a sensible default, to be sure, but not always what one wants), although the caller could then choose to throw those empty values away.
It also treats delim as a set of possible delimiter characters, whereas the original poster uses std::string::find, not find_first_of, so the meaning is quite different: find matches the whole delimiter string.
Returning the number of fields would be helpful for repeated use when appending to the same container.
It lets the caller write cleaner code when field counts are of interest.
Please avoid starting names with an underscore: names such as _Cont (an underscore followed by a capital letter) are reserved for standard library implementers and compiler vendors, not for us mere mortal developers.
Keep it simple
dex
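A minimal sketch of those suggestions, assuming the container interface from the template above: it returns the number of fields it appended and takes a hypothetical keepEmpty flag for dropping blank fields. It still uses find_first_of, so delims remains a set of characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>  // std::vector, used in the usage example

// Appends the fields of str to container and returns how many were appended,
// so repeated calls on the same container can still report per-call counts.
// delims is treated as a set of single-character delimiters (find_first_of);
// keepEmpty is a hypothetical option for dropping blank fields.
template <typename Container>
std::size_t split(const std::string& str, Container& container,
                  const std::string& delims = ",", bool keepEmpty = true)
{
    std::size_t added = 0;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = str.find_first_of(delims, start);
        const std::string field = str.substr(start, pos - start);
        if (keepEmpty || !field.empty()) {
            container.insert(container.end(), field);
            ++added;
        }
        if (pos == std::string::npos)
            break;        // that was the last field
        start = pos + 1;  // skip the single delimiter character
    }
    return added;
}
```

Calling it twice on the same std::vector returns each call's own count, while the vector's size keeps growing.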

bug | _vin_ | 10:56 10 Aug '03
When the delimiter is in the first position, your function crashes.
...or in the last position.
For example, try splitting " teststring" by " ", or "teststring " by " ".
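A sketch of a split that cannot index past either end, assuming the delimiter is a whole string matched with std::string::find as in the article (split_on is a hypothetical name, not the article's function). A leading or trailing delimiter just yields an empty first or last field:

```cpp
#include <string>
#include <vector>

// Splits on a whole delimiter *string* (std::string::find), not on a set of
// characters. A delimiter at the very start or end simply produces an empty
// first or last field, so no index can run off either end.
std::vector<std::string> split_on(const std::string& input,
                                  const std::string& delimiter)
{
    std::vector<std::string> fields;
    if (delimiter.empty()) {  // nothing to split on
        fields.push_back(input);
        return fields;
    }
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = input.find(delimiter, start)) != std::string::npos) {
        fields.push_back(input.substr(start, pos - start));
        start = pos + delimiter.size();
    }
    fields.push_back(input.substr(start));  // text after the last delimiter
    return fields;
}
```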
Agreed.
I've figured out that the problem comes from this line:
(1) int offset = positions[i-1] + sizeS2;
But changing it to:
(2) int offset = positions[i] + sizeS2;
only shifts the problem to the last word, which is skipped.
My input is:
const string& input="mary|had|a||little|lamb|dsqdsqd|";
If I use (1), the program crashes with the error message "vector subscript out of range".
But if I use (2), I get the same message when reaching the last word, "dsqdsqd".
Can someone explain, or even better, correct this portion of the code so that it works?
Never mind. I have solved it.
The change is as follows:
int offset = positions[i] + sizeS2;
if( offset <= isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
No, it isn't; it still causes problems.
The solution is as follows:
if( i == 0 )
{
s = input.substr( i, positions[i] );
}
else
{
int offset = positions[i-1] + sizeS2;
if( offset < isize )
{
if( i == positions.size() )
{
s = input.substr(offset);
}
else if( i > 0 )
{
s = input.substr( positions[i-1] + sizeS2,
positions[i] - positions[i-1] - sizeS2 );
}
}
}
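If you want to keep the positions vector, one way (a sketch, not the article's code; fields_from_positions is a hypothetical name) is to track where the current field starts and treat the last field, after the final delimiter, as a separate case, so positions[i] is only read while i < positions.size():

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Field i spans [start, positions[i]) for i < n; the final field is whatever
// follows the last delimiter. positions[i] is never read when i == n, so
// there is no "vector subscript out of range" at either end.
std::vector<std::string> fields_from_positions(const std::string& input,
                                               const std::string& delimiter)
{
    std::vector<std::string::size_type> positions;
    if (!delimiter.empty()) {
        for (std::string::size_type p = input.find(delimiter);
             p != std::string::npos;
             p = input.find(delimiter, p + delimiter.size()))
            positions.push_back(p);
    }

    std::vector<std::string> results;
    std::string::size_type start = 0;  // where the current field begins
    for (std::size_t i = 0; i <= positions.size(); ++i) {
        if (i < positions.size()) {
            results.push_back(input.substr(start, positions[i] - start));
            start = positions[i] + delimiter.size();
        } else {
            results.push_back(input.substr(start));  // last field, may be empty
        }
    }
    return results;
}
```

For "mary|had|a||little|lamb|dsqdsqd|" this gives eight fields, including the empty one between "a" and "little" and an empty last field.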
template<typename _Outit>
void split(const string& str, _Outit _Where, const string& delim=",")
{
string::size_type lpos = 0;
string::size_type pos = str.find_first_of(delim, lpos);
do
{
*_Where = str.substr(lpos,pos - lpos);
// front_inserter, back_inserter and inserter will do
// nothing with operator++
++_Where;
lpos = ( pos == string::npos ) ? string::npos : pos + 1;
pos = str.find_first_of(delim, lpos);
}
while(lpos != string::npos);
}
template<typename _Outit>
void split(const string& str, _Outit _Where, const string& delim=",")
{
    string::size_type lpos = 0;
    string::size_type pos = str.find_first_of(delim, lpos);
    do
    {
        *_Where = str.substr(lpos,pos - lpos);
        // front_inserter, back_inserter and inserter will do
        // nothing with operator++
        ++_Where;
        lpos = ( pos == string::npos ) ? string::npos : pos + 1;
        pos = str.find_first_of(delim, lpos);
    }
    while(lpos != string::npos);
}
This will support back_inserter, front_inserter and inserter.
Close but not quite!
Don't use do..while. Just use while. What if the incoming string contains no delimiters? You will modify the destination. Using while() also means you need to call find_first_of() in only one place.
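A sketch of the output-iterator version rewritten along those lines: a plain while loop with find_first_of called in one place. What an input without delimiters should produce is a design choice; this sketch (an assumption, not something the thread settled) writes the whole string as a single field.

```cpp
#include <iterator>  // back_inserter, front_inserter, inserter (usage example)
#include <list>      // used in the usage example
#include <set>       // used in the usage example
#include <string>

// Output-iterator split: delims is a set of characters, find_first_of is
// called in exactly one place, and the iterator past the last field written
// is returned, in the style of std::copy.
template <typename OutIt>
OutIt split(const std::string& str, OutIt out, const std::string& delims = ",")
{
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = str.find_first_of(delims, start)) != std::string::npos) {
        *out = str.substr(start, pos - start);
        ++out;  // a no-op for back_inserter, front_inserter and inserter
        start = pos + 1;
    }
    *out = str.substr(start);  // field after the last delimiter
    ++out;
    return out;
}
```

With std::front_inserter on a std::list the fields come out in reverse order, since each one is inserted at the front; std::inserter into a std::set drops duplicates.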
getline(_Istr, _Str, _Delim)
_Istr - The input stream from which a string is to be extracted.
_Str - The string into which the characters from the input stream are read.
_Delim - The delimiter character at which extraction stops (a single character, not a string).
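A minimal sketch of splitting with std::getline over a std::istringstream (split_getline is a hypothetical name). Because _Delim is a single character, this only covers single-character delimiters, and a trailing delimiter does not produce a final empty field:

```cpp
#include <sstream>
#include <string>
#include <vector>

// getline stops at each delim and discards it. An empty field between two
// delimiters is kept, but once no characters are left getline fails, so
// "a,b," yields only "a" and "b".
std::vector<std::string> split_getline(const std::string& input, char delim)
{
    std::vector<std::string> fields;
    std::istringstream stream(input);
    std::string field;
    while (std::getline(stream, field, delim))
        fields.push_back(field);
    return fields;
}
```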
I suggest making the output parameter more generic:
template <typename outit>
static int SplitString(const string& input, const string& delimiter, outit results);
and replacing every results.push_back(s); with *(results++) = s; then you can use any container you want with SplitString:
vector<string> results1;
int num1 = StringUtils::SplitString(in, delim, std::back_inserter(results1));
list<string> results2;
int num2 = StringUtils::SplitString(in, delim, std::back_inserter(results2));
Regards
Thomas
Sonork id: 100.10453 Thömmi
Disclaimer: Because of heavy processing requirements, we are currently using some of your unused brain capacity for backup processing. Please ignore any hallucinations, voices or unusual dreams you may experience. Please avoid concentration-intensive tasks until further notice. Thank you.
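A minimal sketch of that suggestion, assuming the delimiter is a whole string matched with std::string::find. The StringUtils struct here only stands in for the article's class, and the body is not the article's code; it returns the field count because results.size() is no longer available inside the function:

```cpp
#include <iterator>  // std::back_inserter (usage example)
#include <list>      // used in the usage example
#include <string>
#include <vector>    // used in the usage example

struct StringUtils {
    // Writes each field through the output iterator and returns how many
    // fields were written (an empty delimiter yields the whole input).
    template <typename OutIt>
    static int SplitString(const std::string& input, const std::string& delimiter,
                           OutIt results)
    {
        int count = 0;
        std::string::size_type start = 0;
        std::string::size_type pos = 0;
        while (!delimiter.empty() &&
               (pos = input.find(delimiter, start)) != std::string::npos) {
            *(results++) = input.substr(start, pos - start);
            ++count;
            start = pos + delimiter.size();
        }
        *(results++) = input.substr(start);  // the last field
        return count + 1;
    }
};
```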
Changing the function to a template is a good idea. However, I don't see the point in changing the push_back part. There are only three STL containers suitable for storing the results (vector, list and deque), and all three of them have a push_back function. Any new containers written by other developers are also supposed to provide push_back if they want to be compatible with STL algorithms. So there is no need to change the push_back part to *(results++), which leads to the awkward use of back_inserter.
Well, I agree with Thomas; the modification makes sense if you take "container" in a broader sense. With this modification, one can also use plain C/C++ arrays as containers (the elements must be strings, and the array must be large enough, since a raw pointer does no bounds checking):
std::string v[10];
stringSplit("10,20,30,40", ",", v);
PS: For Thomas' modification to work, you have to change the code so that it returns the number of fields found, because "results.size()" won't work anymore.
Hi,
How do you use the map<> container in C++? Here's what I want to do: count the number of occurrences of each word in a sentence, incrementing the number associated with a word each time it appears. How do you do that with a map where the string (the word) is the key and the int (its count) is the value?
Thanks.
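A minimal sketch, assuming words are separated by whitespace (count_words is a hypothetical name). std::map::operator[] value-initialises the int to 0 the first time a word is seen, so ++counts[word] handles new and repeated words alike:

```cpp
#include <map>
#include <sstream>
#include <string>

// The word is the map's key and its count is the mapped value.
// Words are split on whitespace; punctuation is not stripped.
std::map<std::string, int> count_words(const std::string& sentence)
{
    std::map<std::string, int> counts;
    std::istringstream in(sentence);
    std::string word;
    while (in >> word)
        ++counts[word];  // inserts 0 for a new word, then increments
    return counts;
}
```

To print the results, iterate over the map: for (const auto& kv : counts) gives each word (kv.first) with its count (kv.second), in alphabetical order.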
1. Define your variables just before you need them (not in the good old C manner).
2. Do you really need to define the string s each time you enter the for loop?
I added a string-neutral version which does not depend on CString or std::string. The only functions it uses are strstr, memcpy and vector::push_back. The empties parameter solves the adjacent-delimiters problem mentioned earlier. By default, empty strings are pushed back when delimiters are adjacent; passing false for the empties parameter keeps empty strings out of the results vector.
Paul J. Weiss
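The posted code is not reproduced here, but a sketch of the same idea might look like this: strstr finds each delimiter and each field is built from a pointer and a length (in place of an explicit memcpy), with an empties flag that defaults to true. split_neutral is a hypothetical name, and the element type is assumed to be constructible from (const char*, length), which holds for std::string:

```cpp
#include <cstring>
#include <string>  // std::string, used in the usage example
#include <vector>

// Splits a C string on a delimiter string using strstr. With empties ==
// false, fields produced by adjacent (or leading/trailing) delimiters are
// skipped.
template <typename StringT>
void split_neutral(const char* input, const char* delimiter,
                   std::vector<StringT>& results, bool empties = true)
{
    const std::size_t dlen = std::strlen(delimiter);
    const char* start = input;
    const char* hit = dlen ? std::strstr(start, delimiter) : nullptr;
    while (hit) {
        if (empties || hit != start)
            results.push_back(StringT(start, static_cast<std::size_t>(hit - start)));
        start = hit + dlen;
        hit = std::strstr(start, delimiter);
    }
    if (empties || *start != '\0')  // text after the last delimiter
        results.push_back(StringT(start, std::strlen(start)));
}
```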
In a true split, an expression such as "a-b-c-" should return { "a", "b", "c", "" }, with an empty string at the end because of the trailing delimiter.
I just updated my compact split in the post below, entitled "MFC compact split", to follow that convention. I noticed that your previous versions did not work that way. (Perl's split, incidentally, drops trailing empty fields by default and keeps them only when given a negative limit.)
I'm not sure about your new neutral code, I'll have to try it out later.
I don't get what people have against MFC either, oh well
MSIL sucks
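The two conventions differ only in what happens to empty fields at the end. A sketch with a hypothetical keepTrailing switch (split_fields is also a made-up name), assuming a whole-string delimiter:

```cpp
#include <string>
#include <vector>

// keepTrailing == true keeps trailing empty fields ("a-b-c-" -> 4 fields);
// false drops them, like Perl's split with no limit. Empty fields in the
// middle ("c--d") are kept either way.
std::vector<std::string> split_fields(const std::string& s, const std::string& delim,
                                      bool keepTrailing = true)
{
    std::vector<std::string> out;
    std::string::size_type start = 0;
    std::string::size_type pos = 0;
    while (!delim.empty() && (pos = s.find(delim, start)) != std::string::npos) {
        out.push_back(s.substr(start, pos - start));
        start = pos + delim.size();
    }
    out.push_back(s.substr(start));
    if (!keepTrailing)
        while (!out.empty() && out.back().empty())
            out.pop_back();
    return out;
}
```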
I tried this splitter class and was disappointed to find that it could not do a Perl-style split of a string such as "a-b-c--d-e":
it could not deal with two consecutive delimiters,
so I whipped up this little MFC split function to solve the problem. I like to call it "split".
###################### FUNCTION FUNC.H ####################
#include <afxcoll.h>  // CString, CStringArray
class func
{
public:
    static int split(const CString& delimiter, const CString& str, CStringArray& CStrArray);
};
###################### FUNCTION FUNC.CPP ####################
#include "func.h"
int func::split(const CString& delimiter, const CString& str, CStringArray& CStrArray)
{
    CString strtmp = str;
    while (strtmp.Find(delimiter) != -1) {
        // the field before the next delimiter
        CStrArray.Add(strtmp.Mid(0, strtmp.Find(delimiter)));
        // if only the delimiter remains after it, add the trailing empty field
        if ((strtmp.Mid(0, strtmp.Find(delimiter)).GetLength() + delimiter.GetLength()) == strtmp.GetLength())
            CStrArray.Add("");
        // drop the field and its delimiter, keep the rest
        CString n = strtmp;
        strtmp = n.Mid(n.Find(delimiter) + delimiter.GetLength(), n.GetLength());
    }
    if (strtmp.GetLength() > 0) {
        CStrArray.Add(strtmp);
    }
    return CStrArray.GetSize();
}
###################### USAGE ####################
#include "func.h"
CStringArray arry;
func::split("-", "a-b-c", arry);
for (int i = 0; i < arry.GetSize(); i++) {
    MessageBox(NULL, arry[i], NULL, MB_OK);
}
MSIL sucks
That's great, for people who are stuck with MFC. A lot of us don't use it, which I guess is why the original author did not use it, either.
Christian
No offense, but I don't really want to encourage the creation of another VB developer. - Larry Antram 22 Oct 2002
Hey, at least Logo had, at its inception, a mechanical turtle. VB has always lacked even that... - Shog9 04-09-2002
During last 10 years, with invention of VB and similar programming environments, every ill-educated moron became able to develop software. - Alex E. - 12-Sept-2002
The original author also has an MFC version listed; it's right above the feedback, near the bottom of this page.
That's why I thought my version might be helpful to viewers of this page.
I see MFC as nothing less than a godsend. I used to rely heavily on the WFC classes with Java, but now that MS has more or less stated that it's phasing out Java and the JVM, I've switched to C++ and MFC.
As Jeff Prosise says in his MS Press book Programming Windows with MFC, there are thousands of prewritten lines of code you can use to expedite development with MFC. I like it a lot.
At any rate, I hope this code can help somebody out!
MSIL sucks
I wanted to split a string containing quotes, using a comma as the delimiter but without splitting on commas inside quotes:
std::string input = "\"\",blah,\"blah,blah\",blah,\",blah\",";
which should return
1 : ""
2 : blah
3 : "blah,blah"
4 : blah
5 : ",blah"
6 :
Here's what I did
int iPos = -1;
int newPos = -1;
int sizeS2 = delimiter.size();
int isize = input.size();
std::vector<int> positions;
if (isize == 0)
{
results.push_back(input);
return 0;
}
if (input[0] == '"')
{
newPos = input.find("\"", 1);
newPos = input.find(delimiter, newPos);
}
else
{
newPos = input.find(delimiter, 0);
}
if (newPos < 0)
{
results.push_back(input);
return 0;
}
while (newPos > iPos)
{
numFound++;
positions.push_back(newPos);
if (input[newPos+1] == '"')
{
newPos = input.find("\"", newPos+2);
}
iPos = newPos;
newPos = input.find(delimiter, iPos+sizeS2);
}
for (int i=0; i <= positions.size(); i++)
.
.
.
.
.
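A simpler route than chaining find() calls is to scan character by character and treat a comma as a delimiter only when outside double quotes. A sketch (split_quoted is a hypothetical name) that keeps the quotes in the fields, matching the expected output listed above:

```cpp
#include <string>
#include <vector>

// Toggles an "inside quotes" flag on every '"' and splits on delim only when
// the flag is off. Quote characters are copied into the field as-is; they
// are not stripped or unescaped.
std::vector<std::string> split_quoted(const std::string& input, char delim = ',')
{
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;
    for (char ch : input) {
        if (ch == '"')
            inQuotes = !inQuotes;
        if (ch == delim && !inQuotes) {
            fields.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    fields.push_back(current);  // last field (empty after a trailing comma)
    return fields;
}
```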
There are two solutions for the same sort of problem in the Boost libraries (http://www.boost.org). There's a Boost.Tokenizer library which pretty much covers the functionality here, and a Boost.Regex library that gives you a more powerful regular expression split routine.
William E. Kempf
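Boost itself is not shown here. As an illustration of regex-based splitting, the standard <regex> header (C++11, which grew out of Boost.Regex via TR1) can do it with std::sregex_token_iterator; passing -1 as the submatch index yields the text between matches. regex_split is a hypothetical name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the pieces of input that lie between matches of sep.
// sep must outlive the iterators, which it does here (it is a parameter).
std::vector<std::string> regex_split(const std::string& input, const std::regex& sep)
{
    std::sregex_token_iterator first(input.begin(), input.end(), sep, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

For example, regex_split("a, b;c", std::regex("[,;]\\s*")) splits on a comma or semicolon plus any following whitespace.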
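Boost is great. But when working with the Unicode (wchar_t) version of Boost.Regex, there is a nasty bug. Actually, the bug is in VC++, not Boost, but because of this bug, Boost fails.
Try this:
wcout << L'A' << endl;
The output should be A, but in Visual C++ it is 65: VC++ treats wchar_t as a typedef for unsigned short rather than as a distinct built-in type, so the integer overload of operator<< is chosen. (Later versions of VC++ provide the /Zc:wchar_t compiler option, which makes wchar_t a native type.)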
I vote pro drink
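A small sketch of the difference and a workaround: streaming a one-character std::wstring selects the string overload of operator<< whether or not wchar_t is a distinct type. The static_assert is a compile-time check that fails when wchar_t is merely unsigned short; format_wchar is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Fails to compile where wchar_t is only a typedef for unsigned short
// (old VC++ without /Zc:wchar_t).
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t should be a distinct built-in type");

// Formats a single wide character as text. Wrapping it in a wstring uses the
// string overload of operator<<, never the integer one.
std::wstring format_wchar(wchar_t ch)
{
    std::wostringstream out;
    out << std::wstring(1, ch);
    return out.str();
}
```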