Hi, I have a string:

"CRUDE OIL, LIGHT SWEET - ICE FUTURES EUROPE" ,110517 ,2011-05-17,067411, "(CONTRACTS OF 1,000 BARRELS)"

I am looking for a regular expression that splits the string at the commas but keeps the characters within double quotes together.

Something like ...regex.split(str, ", [^\"])") won't work.

Can anyone please help me out?

Best
Mho
Comments
Sergey Alexandrovich Kryukov 25-May-11 16:22pm    
Tag the language and platform!
--SA

I wrote an article for this, but it doesn't use Regex.

Persistent String Parser
Comments
mhogli 30-May-11 15:41pm    
OK, but that doesn't parse a typical CSV string such as 1,2,3,4, "5, 78" very well.

Try this regex: ,(?!(?<=(?:^|,)\s*\x22(?:[^\x22]|\x22\x22|\\\x22)*,)(?:[^\x22]|\x22\x22|\\\x22)*\x22\s*(?:,|$))

C# example:
using System;
using System.Text.RegularExpressions;

string input = "\"CRUDE OIL, LIGHT SWEET - ICE FUTURES EUROPE\" ,110517 ,2011-05-17,067411, \"(CONTRACTS OF 1,000 BARRELS)\"";
string[] strings = Regex.Split(input, @",(?!(?<=(?:^|,)\s*\x22(?:[^\x22]|\x22\x22|\\\x22)*,)(?:[^\x22]|\x22\x22|\\\x22)*\x22\s*(?:,|$))");
foreach (string s in strings) Console.WriteLine(s.Trim()); // Trim() drops the blanks left around the separators


Output is:

"CRUDE OIL, LIGHT SWEET - ICE FUTURES EUROPE"
110517
2011-05-17
067411
"(CONTRACTS OF 1,000 BARRELS)"
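The one-line pattern is hard to read, so here it is assembled from commented pieces. This is a sketch: the class name `CommentedSplit` and the `Trim` step are illustrative additions, not part of the answer, but the concatenated string is identical to the pattern above. A comma is skipped only when, looking back, it sits after an opening quote at the start of a field and, looking ahead, the quoted text closes before the next comma or the end of the line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative wrapper around the pattern from the answer; the class name is hypothetical.
public static class CommentedSplit
{
    // Text allowed inside a quoted field: any non-quote character, a doubled quote ("")
    // or a backslash-escaped quote (\"). \x22 is the double-quote character.
    private const string QuotedContent = @"(?:[^\x22]|\x22\x22|\\\x22)*";

    public const string Pattern =
        @","                                              // split at a comma...
        + @"(?!"                                          // ...unless all of the following holds:
        + @"(?<=(?:^|,)\s*\x22" + QuotedContent + @",)"   // behind it: field start, blanks, opening quote, quoted text up to this comma
        + QuotedContent                                   // ahead of it: more quoted text,
        + @"\x22\s*(?:,|$)"                               // then the closing quote, blanks, and a comma or the end of input
        + @")";

    // Splits one line and trims the blanks left around each separator (an added step).
    public static string[] Split(string line)
    {
        return Regex.Split(line, Pattern).Select(f => f.Trim()).ToArray();
    }
}
```

The pattern contains only non-capturing groups, so `Regex.Split` does not insert any captured text into the result array.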
Comments
Manfred Rudolf Bihy 26-May-11 7:57am    
Wow! Me's speechless :)
Take a 5, it's all I can give.
Kim Togo 26-May-11 8:23am    
Thanks Manfred :-)
mhogli 26-May-11 17:19pm    
That's brilliant, and it works! Thank you very much, Kim. Five points from Germany to Denmark.
Kim Togo 27-May-11 10:11am    
Thanks mhogli, and you are welcome. If it has solved your problem, then please press "Accept Answer". This will help other CP members to find the right solution.
What type of language is it?
Kim Togo 31-May-11 9:12am    
Hi mhogli,
Did the solution solve your problem?

This is a case where regular expressions do not fit well. You need a function that splits by ',' and blank space. On .NET the best way is to use string.Split; on other platforms, something like that is still the best option.

I could give more details if you tagged your platform/language. Do it now!

[EDIT per discussion, below]
For an example solution to a similar problem, please see my article Enumeration-based Command Line Utility.

In the code for this article, I needed a good simulator to test the command line in tricky situations where quotation marks are used to pass command-line arguments containing blank spaces. .NET parses a raw single-string command line into an array of strings in a clever way: it splits the line at blank spaces but preserves the blank spaces inside quoted fragments of the command line. It even splits more or less reasonably when the user makes mistakes in balancing the quotation marks. I needed to simulate this in order to speed up testing.
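A much-simplified sketch of that kind of splitting, written for illustration only: `CommandLineSketch.SplitCommandLine` is a hypothetical helper, not the code from the article. It only knows plain double quotes and ignores the backslash-escaping rules of the real Windows/.NET command-line parser; an unbalanced quote simply extends to the end of the line.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not the article's code): simplified, no backslash-escape rules.
public static class CommandLineSketch
{
    // Splits at runs of whitespace, except inside double quotes.
    // The quote characters themselves are dropped: copy "my file.txt" gives [copy] [my file.txt].
    public static List<string> SplitCommandLine(string line)
    {
        var args = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool inArg = false; // true once the current argument has started, so "" yields an empty argument

        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes; // toggle quoted mode; an unbalanced quote runs to the end of the line
                inArg = true;
            }
            else if (char.IsWhiteSpace(c) && !inQuotes)
            {
                if (inArg) // close the current argument; repeated blanks are skipped
                {
                    args.Add(current.ToString());
                    current.Length = 0; // reset the buffer (StringBuilder.Clear needs .NET 4.0)
                    inArg = false;
                }
            }
            else
            {
                current.Append(c); // ordinary character, or a blank inside quotes
                inArg = true;
            }
        }
        if (inArg) args.Add(current.ToString()); // last argument has no trailing blank
        return args;
    }
}
```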

Section "6. Testing" of the article explains this problem and the code. Look at this section to find out where this algorithm is implemented in my demo/test code. I analyzed the problem from different standpoints and concluded that regular expressions would not really help, so I ended up with direct string calculations.
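Applied to the CSV line from the question, direct string calculation might look like this sketch (hypothetical helper `CsvSketch.SplitCsvLine`, not taken from the article): walk the characters once, remember whether we are inside quotes, and cut only at commas outside them. Quote characters are kept and blanks around each field are trimmed, so the result has the same form as the output shown for the regex answer. It assumes balanced quotes and does not handle quoted fields that span several lines.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not the article's code): single pass over the characters.
public static class CsvSketch
{
    // Splits a line at commas that are outside double quotes.
    // Quote characters are kept; blanks around each field are trimmed.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                // Toggle quoted mode. A doubled quote ("") inside a field toggles twice,
                // so the field stays quoted without a special case.
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString().Trim()); // an unquoted comma ends the field
                current.Length = 0;                    // reset the buffer (works on .NET 3.5 too)
            }
            else
            {
                current.Append(c); // ordinary character, or a comma inside quotes
            }
        }
        fields.Add(current.ToString().Trim()); // the last field has no trailing comma
        return fields;
    }
}
```

The same loop also handles the string 1,2,3,4, "5, 78" from the comment thread, giving five fields with "5, 78" kept together.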

Hope it can be useful.

--SA
Comments
mhogli 26-May-11 3:20am    
Thank you for your support. I use VB.NET and .NET Framework 3.5.

Why do you think regex is not an option? For example, the expression "\sa[^ub].*?\s" finds all words that begin with an "a", up to the next blank space, but skips words where the "a" is followed by a "u" or "b". That's why I thought regex would be the best solution for splitting such strings.
Sergey Alexandrovich Kryukov 28-May-11 2:14am    
I think this is an unnecessary complication. You devise a more or less tricky regex and then need to traverse all the matches anyway. I solved a very similar task recently, considered different options, and ended up using string manipulations. If you use C#, I would give you the reference...
--SA
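To illustrate the "match, then traverse" route described here: the regex matches the fields instead of the separators, and a loop over the `MatchCollection` collects them. This is only a sketch with an assumed pattern (a quoted run, or a run of characters that are neither quotes, commas nor blanks), and `MatchFieldsSketch.Fields` is a hypothetical name. It silently drops empty fields and would split an unquoted field that contains spaces.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: matches fields rather than separators.
public static class MatchFieldsSketch
{
    // Assumed pattern for illustration: a quoted run "..." or a run of characters
    // that are not quotes, commas or whitespace. Regex text: "[^"]*"|[^",\s]+
    private static readonly Regex Field = new Regex("\"[^\"]*\"|[^\",\\s]+");

    public static List<string> Fields(string line)
    {
        var result = new List<string>();
        foreach (Match m in Field.Matches(line)) // traverse every match, left to right
            result.Add(m.Value);
        return result;
    }
}
```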
mhogli 30-May-11 3:09am    
Although I use VB.NET, I am very interested in your statement. How did you finally solve this challenge with string manipulations?
Sergey Alexandrovich Kryukov 30-May-11 14:17pm    
Sure. See the updated solution and my article; find the source code the way I explained.
--SA
Espen Harlinn 30-May-11 19:00pm    
Good point: regex seems like overkill. My 5.

I've been using Expresso to help me with regex.
Comments
Sergey Alexandrovich Kryukov 25-May-11 16:25pm    
OK, and do you know a good pattern? No! This is not a task for regex.
Please see my answer.
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)