Licence CPOL
First Posted 28 Mar 2010

Rive

By PIEBALDconsult | 28 Mar 2010 | Article
An improved string split method.

Introduction

This is an extension method that splits a string into substrings much like String.Split, only better: when requested, it will not split on delimiters that are within quotes or that have been "escaped" with a backslash.

Background

Often when I have to split a string -- from a CSV file or a command line perhaps -- I can't use Split because the values may contain the delimiter character. Many years ago, I wrote a string splitter function in C and later ported it to C#, but I haven't been very happy with it. This week I decided to begin afresh and write a new version. This version doesn't have all the features of the old one, but it is easier to read and has more flexibility than Split does.

Option enumeration

As with Split, an enumeration controls which features to use during the split operation; however, Rive supports more options, specifically the ability to ignore delimiters within quotes or apostrophes. I also threw in the ability to escape characters so they won't be treated as delimiters or quotes.

/**
<summary>
    Options for use with Rive.
</summary>
*/
[System.FlagsAttribute()]
public enum Option
{
    /**
    <summary>
        No options.
    </summary>
    */
    None = 0
,
    /**
    <summary>
        Do not include empty substrings.
    </summary>
    */
    RemoveEmptyEntries = 1
,
    /**
    <summary>
        Treat a special character following a backslash (\) as a regular character.
    </summary>
    */
    HonorEscapes = 2
,
    /**
    <summary>
        Do not split on delimiters within quotes (").
    </summary>
    */
    HonorQuotes = 4
,
    /**
    <summary>
        Do not split on delimiters within apostrophes (').
    </summary>
    */
    HonorApostrophes = 8
}
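
Because Option is marked with FlagsAttribute, callers combine options with the bitwise OR operator, and DoRive tests each one with a mask. The following minimal sketch repeats the enumeration values (without the doc comments) and wraps that same mask test in a helper; the OptionDemo class and its IsSet method are hypothetical names used only for illustration.

```csharp
using System;

// Copy of the article's Option values, without the doc comments.
[Flags]
public enum Option
{
    None               = 0,
    RemoveEmptyEntries = 1,
    HonorEscapes       = 2,
    HonorQuotes        = 4,
    HonorApostrophes   = 8
}

public static class OptionDemo
{
    // Hypothetical helper performing the same test DoRive uses, e.g.
    // ( Options & Option.HonorQuotes ) == Option.HonorQuotes
    public static bool IsSet(Option options, Option flag)
    {
        return (options & flag) == flag;
    }
}
```

For example, Option.HonorQuotes | Option.HonorEscapes has the value 6; IsSet reports true for either of those two flags and false for HonorApostrophes. On .NET 4 and later, Enum.HasFlag performs the same test.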

Rive

The public Rive methods (there are several overloads, so calling code needn't specify every parameter) validate their arguments and then hand off to the private DoRive method. The overload that takes every parameter is:

public static System.Collections.Generic.IList<string>
Rive
(
    this string   Subject
,
    int           Count
,
    Option        Options
,
    params char[] Delimiters
)
{
    if ( Subject == null )
    {
        throw ( new System.ArgumentNullException
            ( "Subject" , "Subject must not be null" ) ) ;
    }

    if ( Count < 0 )
    {
        throw ( new System.ArgumentOutOfRangeException
            ( "Count" , "Count must not be negative" ) ) ;
    }

    return ( DoRive ( Subject , Count , Options , Delimiters ) ) ;
}
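
The other overloads are not shown in the article. A minimal sketch of how such front-ends could look follows: each one fills in a default and forwards to the four-parameter version. The defaults used here (int.MaxValue for Count, Option.None for Options) are assumptions, the RiveOverloadSketch class name is hypothetical, and the DoRive in this sketch is only a placeholder that delegates to String.Split (ignoring the quote and escape options) so that the overloads compile on their own.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum Option
{
    None = 0, RemoveEmptyEntries = 1, HonorEscapes = 2, HonorQuotes = 4, HonorApostrophes = 8
}

public static class RiveOverloadSketch
{
    // Assumed default: unlimited Count, no options.
    public static IList<string> Rive(this string Subject, params char[] Delimiters)
    {
        return Rive(Subject, int.MaxValue, Option.None, Delimiters);
    }

    // Assumed default: no options.
    public static IList<string> Rive(this string Subject, int Count, params char[] Delimiters)
    {
        return Rive(Subject, Count, Option.None, Delimiters);
    }

    // Assumed default: unlimited Count.
    public static IList<string> Rive(this string Subject, Option Options, params char[] Delimiters)
    {
        return Rive(Subject, int.MaxValue, Options, Delimiters);
    }

    // Same argument validation as the article's four-parameter Rive.
    public static IList<string> Rive(this string Subject, int Count, Option Options, params char[] Delimiters)
    {
        if (Subject == null)
        {
            throw new ArgumentNullException("Subject", "Subject must not be null");
        }

        if (Count < 0)
        {
            throw new ArgumentOutOfRangeException("Count", "Count must not be negative");
        }

        return DoRive(Subject, Count, Options, Delimiters);
    }

    // PLACEHOLDER, not the article's DoRive: delegates to String.Split and
    // honors only RemoveEmptyEntries, so the overloads above can be exercised.
    private static IList<string> DoRive(string Subject, int Count, Option Options, char[] Delimiters)
    {
        StringSplitOptions splitOptions =
            (Options & Option.RemoveEmptyEntries) == Option.RemoveEmptyEntries
                ? StringSplitOptions.RemoveEmptyEntries
                : StringSplitOptions.None;

        return Array.AsReadOnly(Subject.Split(Delimiters, Count, splitOptions));
    }
}
```

With these overloads, "a,b,c".Rive(',') yields three substrings and "a,b,c".Rive(2, ',') yields "a" and "b,c". Although char converts implicitly to int, Rive(',') does not bind ',' to Count: overload resolution prefers the exact char match of the params-only overload.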

DoRive

DoRive behaves much like Split, except that it returns a read-only IList<string> rather than a string[] and has some additional features. Its basic operation is:

  • If no delimiters are specified, the default delimiters are the white-space characters, as documented for String.Split.
  • If Count is zero (0), then an empty collection is returned.
  • If Count is one (1), then a collection containing only the original, unchanged string is returned.
  • Otherwise, the string is scanned character by character, checking for delimiters and, as requested, for escapes and quotes.
  • Once Count-1 substrings have been produced, the rest of the string becomes the final substring.
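
DoRive falls back on a defaultdelimiters collection whose declaration the article does not show. String.Split documents that, when no separators are given, white-space characters (as defined by Char.IsWhiteSpace) are used, so one way to build an equivalent set is to test every char value. The sketch below assumes that definition; the DefaultDelimiterSketch class and BuildWhiteSpace method are hypothetical names. The Count rules above can also be confirmed against Split itself: "a,b,c".Split(new[] { ',' }, 0) returns an empty array, a count of 1 returns the whole string as the only element, and a count of 2 returns "a" and "b,c".

```csharp
using System;
using System.Collections.Generic;

public static class DefaultDelimiterSketch
{
    // Assumption: the defaults are every UTF-16 code unit for which
    // Char.IsWhiteSpace is true, matching String.Split's documented behavior.
    public static readonly char[] defaultdelimiters = BuildWhiteSpace();

    private static char[] BuildWhiteSpace()
    {
        List<char> result = new List<char>();

        // Use an int counter so the loop can terminate after char.MaxValue.
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            if (char.IsWhiteSpace((char)i))
            {
                result.Add((char)i);
            }
        }

        return result.ToArray();
    }
}
```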

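The additional features are straightforward:

  • If HonorEscapes is specified and a backslash (\) is encountered, then the backslash and the character following it are copied intact, so that character is not treated as a delimiter or quote.
  • If HonorQuotes is specified and a quote (") is encountered, then everything up to and including the next unescaped quote is copied intact; delimiters within the quotes are not split on, and the quotes themselves are kept.
  • If HonorApostrophes is specified and an apostrophe (') is encountered, then everything up to and including the next unescaped apostrophe is copied intact in the same way.
  • Backslashes, quotes, and apostrophes may themselves be escaped (when HonorEscapes is specified), including within quotes.
  • If the closing quote or apostrophe is missing, the quoted section extends to the end of the string.
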
private static System.Collections.Generic.IList<string>
DoRive
(
    string Subject
,
    int    Count
,
    Option Options
,
    char[] Delimiters
)
{
    System.Collections.Generic.List<string> result =
        new System.Collections.Generic.List<string>() ;

    if ( Count > 1 )
    {
        System.Text.StringBuilder temp =
            new System.Text.StringBuilder() ;

        System.Collections.Generic.HashSet<char> delims =
            new System.Collections.Generic.HashSet<char>() ;

        if ( Delimiters != null )
        {
            delims.UnionWith ( Delimiters ) ;
        }

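        if ( delims.Count == 0 )
        {
            // No delimiters supplied: fall back to defaultdelimiters, a static
            // collection (declaration not shown) of the white-space characters
            // that String.Split uses by default.
            delims.UnionWith ( defaultdelimiters ) ;
        }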

        bool remove = ( Options & Option.RemoveEmptyEntries ) == Option.RemoveEmptyEntries ;
        bool escape = ( Options & Option.HonorEscapes       ) == Option.HonorEscapes       ;
        bool quote  = ( Options & Option.HonorQuotes        ) == Option.HonorQuotes        ;
        bool apos   = ( Options & Option.HonorApostrophes   ) == Option.HonorApostrophes   ;

        char ch  ;
        int  pos = 0 ;
        int  len = Subject.Length ;

        while ( pos < len )
        {
            ch = Subject [ pos++ ] ;

            if ( delims.Contains ( ch ) )
            {
                if ( ( temp.Length > 0 ) || !remove )
                {
                    result.Add ( temp.ToString() ) ;

                    temp.Length = 0 ;

                    if
                    (
                        ( result.Count == Count - 1 )
                    &&
                        ( pos < len )
                    )
                    {
                        temp.Append ( Subject.Substring ( pos ) ) ;

                        pos = len ;
                    }
                }
            }
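            else
            {
                if ( escape && ( ch == '\\' ) && ( pos < len ) )
                {
                    // Keep the backslash; the escaped character is appended below.
                    temp.Append ( ch ) ;

                    ch = Subject [ pos++ ] ;
                }
                else if
                (
                    ( ( quote && ( ch == '\"' ) ) || ( apos && ( ch == '\'' ) ) )
                &&
                    ( pos < len )
                )
                {
                    // Copy the opening quote (or apostrophe) and everything up to the
                    // matching closing one, or to the end of the string if it is
                    // missing. Delimiters in this span are not split on; the closing
                    // character itself is appended below.
                    char close = ch ;

                    temp.Append ( ch ) ;

                    ch = Subject [ pos++ ] ;

                    while ( ( ch != close ) && ( pos < len ) )
                    {
                        if ( escape && ( ch == '\\' ) )
                        {
                            // Keep the backslash and take the escaped character,
                            // which therefore cannot close the quoted span.
                            temp.Append ( ch ) ;

                            ch = Subject [ pos++ ] ;

                            if ( pos == len )
                            {
                                // The escaped character is the last one in the
                                // string; stop so it is appended below instead
                                // of reading past the end of the string.
                                break ;
                            }
                        }

                        temp.Append ( ch ) ;

                        ch = Subject [ pos++ ] ;
                    }
                }

                temp.Append ( ch ) ;
            }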
        }

        if ( ( temp.Length > 0 ) || !remove )
        {
            result.Add ( temp.ToString() ) ;
        }
    }
    else if ( Count == 1 )
    {
        result.Add ( Subject ) ;
    }

    return ( result.AsReadOnly() ) ;
}
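
To see the quote and escape rules in isolation, here is a stripped-down sketch of the same scanning idea for a single delimiter, with quotes and backslash escapes always honored and no Count or RemoveEmptyEntries handling. It is not the author's code, and QuoteAwareSplit and SplitOutsideQuotes are hypothetical names; it tracks the quoted state with a flag rather than scanning ahead to the closing quote. Like DoRive, it keeps quotes and backslashes in the output rather than stripping them.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplit
{
    // Splits on 'delimiter' except inside "..." or right after a backslash.
    // Quotes and backslashes are copied to the output unchanged.
    public static List<string> SplitOutsideQuotes(string subject, char delimiter)
    {
        List<string> result = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int pos = 0; pos < subject.Length; pos++)
        {
            char ch = subject[pos];

            if (ch == '\\' && pos + 1 < subject.Length)
            {
                // Escape: copy the backslash and the next character intact.
                current.Append(ch);
                current.Append(subject[++pos]);
            }
            else if (ch == '"')
            {
                // Enter or leave a quoted section; the quote itself is kept.
                inQuotes = !inQuotes;
                current.Append(ch);
            }
            else if (ch == delimiter && !inQuotes)
            {
                // Unquoted, unescaped delimiter: finish the current substring.
                result.Add(current.ToString());
                current.Length = 0;
            }
            else
            {
                current.Append(ch);
            }
        }

        // The final substring (possibly empty, as with Split).
        result.Add(current.ToString());
        return result;
    }
}
```

For the input a,"b,c",d\,e with a comma delimiter, this yields three substrings: a, "b,c" and d\,e. An unterminated quote simply runs to the end of the string, as described for DoRive.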

History

  • 2010-03-26: First submitted.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

PIEBALDconsult

Software Developer (Senior)

United States

BSCS 1992 Wentworth Institute of Technology

Originally from the Boston (MA) area. Lived in SoCal for a while. Now in the Phoenix (AZ) area.

OpenVMS enthusiast, ISO 8601 evangelist, photographer, opinionated SOB

Last Updated 28 Mar 2010
Article Copyright 2010 by PIEBALDconsult
Everything else Copyright © CodeProject, 1999-2012