
Rive

PIEBALDconsult, 28 Mar 2010 CPOL
An improved string split method.

Introduction

This is an extension method that splits a string into substrings much like String.Split, only better -- when requested, it will not split on delimiters that are within quotes or that have been "escaped".

Background

Often when I have to split a string -- from a CSV file or a command line perhaps -- I can't use Split because the values may contain the delimiter character. Many years ago, I wrote a string splitter function in C and later ported it to C#, but I haven't been very happy with it. This week I decided to begin afresh and write a new version. This version doesn't have all the features of the old one, but it is easier to read and has more flexibility than Split does.
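The problem described above can be reproduced with String.Split alone. The snippet below is only an illustration; the sample record and the class name SplitProblemDemo are made up for this example and are not part of the article's code.

```csharp
using System;

public static class SplitProblemDemo
{
    public static void Main()
    {
        // One CSV record with three logical fields; the middle field contains a comma.
        string record = "a,\"b,c\",d";

        // String.Split has no notion of quoting, so it cuts the quoted field in two.
        string[] parts = record.Split(',');

        Console.WriteLine(parts.Length);   // 4, not the 3 fields intended

        foreach (string part in parts)
        {
            Console.WriteLine(part);       // the quoted field arrives as two pieces
        }
    }
}
```

The quoted field is broken at its embedded comma, which is the behavior the HonorQuotes option is meant to avoid.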

Option enumeration

As with Split, an enumeration controls which features to use during the split operation; however, Rive supports more options -- specifically the ability to ignore delimiters within quotes. I also threw in the ability to escape characters so they won't be treated as delimiters or quotes.

/**
<summary>
    Options for use with Rive.
</summary>
*/
[System.FlagsAttribute()]
public enum Option
{
    /**
    <summary>
        No options.
    </summary>
    */
    None = 0
,
    /**
    <summary>
        Do not include empty substrings.
    </summary>
    */
    RemoveEmptyEntries = 1
,
    /**
    <summary>
        Treat a special character following a backslash (\) as a regular character.
    </summary>
    */
    HonorEscapes = 2
,
    /**
    <summary>
        Do not split on delimiters within quotes (").
    </summary>
    */
    HonorQuotes = 4
,
    /**
    <summary>
        Do not split on delimiters within apostrophes (').
    </summary>
    */
    HonorApostrophes = 8
}
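Because the enumeration is marked with FlagsAttribute, its values can be combined with bitwise OR and tested by masking. The following is a small sketch of that pattern; the enumeration is repeated in compact form so the snippet stands alone, and the class name OptionDemo is hypothetical.

```csharp
using System;

[Flags]
public enum Option
{
    None = 0,
    RemoveEmptyEntries = 1,
    HonorEscapes = 2,
    HonorQuotes = 4,
    HonorApostrophes = 8
}

public static class OptionDemo
{
    public static void Main()
    {
        // Combine several options with bitwise OR.
        Option options = Option.RemoveEmptyEntries | Option.HonorQuotes;

        // Test a single option by masking and comparing.
        bool quote = (options & Option.HonorQuotes) == Option.HonorQuotes;
        bool escape = (options & Option.HonorEscapes) == Option.HonorEscapes;

        Console.WriteLine(options);   // RemoveEmptyEntries, HonorQuotes
        Console.WriteLine(quote);     // True
        Console.WriteLine(escape);    // False
    }
}
```

Each value is a distinct power of two, so any combination of options maps to a unique bit pattern.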

Rive

The public Rive methods (there are overloads, so the calling code needn't specify every parameter) are just front-ends to the DoRive method.

public static System.Collections.Generic.IList<string>
Rive
(
    this string   Subject
,
    int           Count
,
    Option        Options
,
    params char[] Delimiters
)
{
    if ( Subject == null )
    {
        throw ( new System.ArgumentNullException
            ( "Subject" , "Subject must not be null" ) ) ;
    }

    if ( Count < 0 )
    {
        throw ( new System.ArgumentOutOfRangeException
            ( "Count" , "Count must not be negative" ) ) ;
    }

    return ( DoRive ( Subject , Count , Options , Delimiters ) ) ;
}

DoRive

DoRive behaves much like Split except that it returns an IList<string> rather than a string[], and has additional features.

  • The default delimiters are as documented for String.Split.
  • If Count is zero (0), then an empty collection is returned.
  • If Count is one (1), then a collection containing only the original string, unchanged, is returned.
  • Otherwise, the string is iterated, checking for delimiters and other characters as requested.
  • If Count-1 substrings have been produced, then the rest of the string becomes the final substring.
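The Count rules listed above mirror those of String.Split, so they can be observed with the framework method itself. The sketch below uses only String.Split(char[], int), not Rive; the sample string and the class name CountDemo are made up for illustration.

```csharp
using System;

public static class CountDemo
{
    public static void Main()
    {
        char[] delims = { ',' };

        string[] none = "a,b,c,d".Split(delims, 0);   // count 0: nothing is returned
        string[] one  = "a,b,c,d".Split(delims, 1);   // count 1: the whole string
        string[] two  = "a,b,c,d".Split(delims, 2);   // count 2: one split, then the rest

        Console.WriteLine(none.Length);   // 0
        Console.WriteLine(one[0]);        // a,b,c,d
        Console.WriteLine(two[0]);        // a
        Console.WriteLine(two[1]);        // b,c,d
    }
}
```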

The additional features are straightforward:

  • If HonorEscapes is specified and a backslash (\) is encountered, then the backslash and the following character are copied intact.
  • If HonorQuotes is specified and a quote (") is encountered, then the characters from that quote through the next quote, including both quotes, are copied intact.
  • If HonorApostrophes is specified and an apostrophe (') is encountered, then the characters from that apostrophe through the next apostrophe, including both apostrophes, are copied intact.
  • Backslashes, quotes, and apostrophes may be escaped.

private static System.Collections.Generic.IList<string>
DoRive
(
    string Subject
,
    int    Count
,
    Option Options
,
    char[] Delimiters
)
{
    System.Collections.Generic.List<string> result =
        new System.Collections.Generic.List<string>() ;

    if ( Count > 1 )
    {
        System.Text.StringBuilder temp =
            new System.Text.StringBuilder() ;

        System.Collections.Generic.HashSet<char> delims =
            new System.Collections.Generic.HashSet<char>() ;

        if ( Delimiters != null )
        {
            delims.UnionWith ( Delimiters ) ;
        }

        if ( delims.Count == 0 )
        {
            delims.UnionWith ( defaultdelimiters ) ;
        }

        bool remove = ( Options & Option.RemoveEmptyEntries ) == Option.RemoveEmptyEntries ;
        bool escape = ( Options & Option.HonorEscapes       ) == Option.HonorEscapes       ;
        bool quote  = ( Options & Option.HonorQuotes        ) == Option.HonorQuotes        ;
        bool apos   = ( Options & Option.HonorApostrophes   ) == Option.HonorApostrophes   ;

        char ch  ;
        int  pos = 0 ;
        int  len = Subject.Length ;

        while ( pos < len )
        {
            ch = Subject [ pos++ ] ;

            if ( delims.Contains ( ch ) )
            {
                if ( ( temp.Length > 0 ) || !remove )
                {
                    result.Add ( temp.ToString() ) ;

                    temp.Length = 0 ;

                    if
                    (
                        ( result.Count == Count - 1 )
                    &&
                        ( pos < len )
                    )
                    {
                        temp.Append ( Subject.Substring ( pos ) ) ;

                        pos = len ;
                    }
                }
            }
            else
            {
                if ( escape && ( ch == '\\' ) && ( pos < len ) )
                {
                    temp.Append ( ch ) ;

                    ch = Subject [ pos++ ] ;
                }
                else if ( quote && ( ch == '\"' ) && ( pos < len ) )
                {
                    do
                    {
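                        if ( escape && ( ch == '\\' ) )
                        {
                            temp.Append ( ch ) ;

                            ch = Subject [ pos++ ] ;

                            // Escaped character is the last one: stop before reading past the end.
                            if ( pos == len )
                            {
                                break ;
                            }
                        }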

                        temp.Append ( ch ) ;

                        ch = Subject [ pos++ ] ;
                    }
                    while ( ( pos < len ) && ( ch != '\"' ) ) ;
                }
                else if ( apos && ( ch == '\'' ) && ( pos < len ) )
                {
                    do
                    {
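                        if ( escape && ( ch == '\\' ) )
                        {
                            temp.Append ( ch ) ;

                            ch = Subject [ pos++ ] ;

                            // Escaped character is the last one: stop before reading past the end.
                            if ( pos == len )
                            {
                                break ;
                            }
                        }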

                        temp.Append ( ch ) ;

                        ch = Subject [ pos++ ] ;
                    }
                    while ( ( pos < len ) && ( ch != '\'' ) ) ;
                }

                temp.Append ( ch ) ;
            }
        }

        if ( ( temp.Length > 0 ) || !remove )
        {
            result.Add ( temp.ToString() ) ;
        }
    }
    else if ( Count == 1 )
    {
        result.Add ( Subject ) ;
    }

    return ( result.AsReadOnly() ) ;
}
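The quote handling can also be looked at in isolation. The following is a simplified sketch, not the author's implementation: it tracks a single "inside quotes" flag instead of using DoRive's inner loop, and it ignores Count, escapes, apostrophes, and RemoveEmptyEntries. The helper SplitHonoringQuotes and the class QuoteSplitDemo are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteSplitDemo
{
    // Hypothetical helper: splits on one delimiter, except inside double quotes.
    public static List<string> SplitHonoringQuotes(string subject, char delimiter)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char ch in subject)
        {
            if (ch == '"')
            {
                // Toggle the quoted state and keep the quote, as DoRive keeps it.
                inQuotes = !inQuotes;
                current.Append(ch);
            }
            else if (ch == delimiter && !inQuotes)
            {
                // An unquoted delimiter ends the current substring.
                result.Add(current.ToString());
                current.Length = 0;
            }
            else
            {
                current.Append(ch);
            }
        }

        // Whatever remains after the last delimiter is the final substring.
        result.Add(current.ToString());
        return result;
    }

    public static void Main()
    {
        foreach (string s in SplitHonoringQuotes("a,\"b,c\",d", ','))
        {
            Console.WriteLine(s);   // a, then "b,c", then d
        }
    }
}
```

With the same sample record that String.Split cuts into four pieces, this sketch yields three substrings and leaves the quoted field whole, quotes included.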

History

  • 2010-03-26: First submitted.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

PIEBALDconsult
Software Developer (Senior)
United States
BSCS 1992 Wentworth Institute of Technology
 
Originally from the Boston (MA) area. Lived in SoCal for a while. Now in the Phoenix (AZ) area.
 
OpenVMS enthusiast, ISO 8601 evangelist, photographer, opinionated SOB
 

Last Updated 28 Mar 2010
Article Copyright 2010 by PIEBALDconsult