
Rive

28 Mar 2010 · CPOL · 2 min read
An improved string split method.

Introduction

This is an extension method that splits a string into substrings, much like String.Split, only better: when requested, it will not split on delimiters that are within quotes or that have been "escaped".

Background

Often when I have to split a string -- from a CSV file or a command line perhaps -- I can't use Split because the values may contain the delimiter character. Many years ago, I wrote a string splitter function in C and later ported it to C#, but I haven't been very happy with it. This week I decided to begin afresh and write a new version. This version doesn't have all the features of the old one, but it is easier to read and has more flexibility than Split does.
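
To make the problem concrete, here is a small sketch of String.Split breaking a quoted CSV field apart. The sample line and the names SplitLimitation and NaiveSplit are made up for illustration; they are not part of Rive.

```csharp
using System;

public static class SplitLimitation
{
    // Hypothetical helper: splits a line with plain String.Split,
    // which knows nothing about quotes.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }

    public static void Main()
    {
        // The second field is quoted and contains a comma.
        string line = "1,\"Smith, John\",Boston";

        string[] parts = NaiveSplit(line);

        // Three fields were intended, but Split produces four pieces.
        Console.WriteLine(parts.Length); // prints 4

        foreach (string part in parts)
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

The quoted name comes back as two pieces, which is the case a quote-aware splitter is meant to handle.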

Option enumeration

As with Split, an enumeration controls which features to use during the split operation; however, Rive supports more options -- specifically the ability to ignore delimiters within quotes. I also threw in the ability to escape characters so they won't be treated as delimiters or quotes.

C#
/**
<summary>
    Options for use with Rive.
</summary>
*/
[System.FlagsAttribute()]
public enum Option
{
    /**
    <summary>
        No options.
    </summary>
    */
    None = 0
,
    /**
    <summary>
        Do not include empty substrings.
    </summary>
    */
    RemoveEmptyEntries = 1
,
    /**
    <summary>
        Treat a special character following a backslash (\) as a regular character.
    </summary>
    */
    HonorEscapes = 2
,
    /**
    <summary>
        Do not split on delimiters within quotes (").
    </summary>
    */
    HonorQuotes = 4
,
    /**
    <summary>
        Do not split on delimiters within apostrophes (').
    </summary>
    */
    HonorApostrophes = 8
}
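
Because the enumeration is marked with FlagsAttribute and its values are distinct powers of two, options can be combined with bitwise OR and tested with bitwise AND. A minimal sketch follows; the enum is repeated (without its doc comments) only so the block compiles on its own, and the IsSet helper is a hypothetical name, not something the article defines.

```csharp
using System;

// Copy of the article's enumeration, repeated so this sketch is self-contained.
[Flags]
public enum Option
{
    None = 0,
    RemoveEmptyEntries = 1,
    HonorEscapes = 2,
    HonorQuotes = 4,
    HonorApostrophes = 8
}

public static class OptionDemo
{
    // Hypothetical helper: true when every bit of 'flag' is set in 'options'.
    public static bool IsSet(Option options, Option flag)
    {
        return (options & flag) == flag;
    }

    public static void Main()
    {
        Option options = Option.RemoveEmptyEntries | Option.HonorQuotes;

        Console.WriteLine(options);                             // prints RemoveEmptyEntries, HonorQuotes
        Console.WriteLine((int)options);                        // prints 5
        Console.WriteLine(IsSet(options, Option.HonorQuotes));  // prints True
        Console.WriteLine(IsSet(options, Option.HonorEscapes)); // prints False
    }
}
```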

Rive

The public Rive methods (there are overloads, so the calling code needn't specify every parameter) are just front-ends to the DoRive method.

C#
public static System.Collections.Generic.IList<string>
Rive
(
    this string   Subject
,
    int           Count
,
    Option        Options
,
    params char[] Delimiters
)
{
    if ( Subject == null )
    {
        throw ( new System.ArgumentNullException
            ( "Subject" , "Subject must not be null" ) ) ;
    }

    if ( Count < 0 )
    {
        throw ( new System.ArgumentOutOfRangeException
            ( "Count" , "Count must not be negative" ) ) ;
    }

    return ( DoRive ( Subject , Count , Options , Delimiters ) ) ;
}

DoRive

DoRive behaves much like Split except that it returns an IList<string> rather than a string[], and has additional features.

  • The default delimiters are as documented for String.Split.
  • If Count is zero (0), then an empty collection is returned.
  • If Count is one (1), then the result contains the original string, unchanged, as its only element.
  • Otherwise, iterate the string, checking for delimiters and other characters as requested.
  • If Count-1 substrings have been produced, then the rest of the string becomes the final substring.
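
The Count rules above mirror those of String.Split(char[], int), so they can be illustrated with the framework method itself. This is only a sketch of the Split behaviour that DoRive imitates; CountDemo and SplitWithCount are hypothetical names.

```csharp
using System;

public static class CountDemo
{
    // Hypothetical wrapper around String.Split(char[], int).
    public static string[] SplitWithCount(string subject, int count)
    {
        return subject.Split(new[] { ',' }, count);
    }

    public static void Main()
    {
        string subject = "a,b,c,d";

        Console.WriteLine(SplitWithCount(subject, 0).Length); // prints 0
        Console.WriteLine(SplitWithCount(subject, 1)[0]);     // prints a,b,c,d
        Console.WriteLine(SplitWithCount(subject, 2)[1]);     // prints b,c,d
        Console.WriteLine(SplitWithCount(subject, 9).Length); // prints 4
    }
}
```

With a count of 2, the first substring is "a" and everything after the first delimiter becomes the final substring.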

The additional features are straightforward:

  • If HonorEscapes is specified and a backslash (\) is encountered, then the backslash and the character that follows it are both copied intact.
  • If HonorQuotes is specified and a quote (") is encountered, then the characters up to the next quote are copied intact, quotes included.
  • If HonorApostrophes is specified and an apostrophe (') is encountered, then the characters up to the next apostrophe are copied intact, apostrophes included.
  • Backslashes, quotes, and apostrophes may be escaped.
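
Taken in isolation, the quote rule can be sketched in a few lines. This is a deliberately simplified illustration, not the article's method: it handles only double quotes and a single delimiter, has no Count, escape, or RemoveEmptyEntries handling, and the names QuoteAwareSketch and Split are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSketch
{
    // Splits on 'delimiter' except where it appears between a pair of
    // double quotes. The quotes themselves are kept in the output.
    public static List<string> Split(string subject, char delimiter)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char ch in subject)
        {
            if (ch == '"')
            {
                inQuotes = !inQuotes;   // entering or leaving a quoted run
                current.Append(ch);
            }
            else if (ch == delimiter && !inQuotes)
            {
                result.Add(current.ToString());
                current.Length = 0;     // start the next substring
            }
            else
            {
                current.Append(ch);
            }
        }

        result.Add(current.ToString()); // final substring, possibly empty
        return result;
    }

    public static void Main()
    {
        foreach (string part in Split("1,\"Smith, John\",Boston", ','))
        {
            Console.WriteLine("[" + part + "]");
        }
        // prints [1], ["Smith, John"] and [Boston] on separate lines
    }
}
```

The full method below takes a different route (it copies a quoted run in an inner loop rather than tracking a flag), but the observable effect on a well-formed quoted field is the same.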
C#
private static System.Collections.Generic.IList<string>
DoRive
(
    string Subject
,
    int    Count
,
    Option Options
,
    char[] Delimiters
)
{
    System.Collections.Generic.List<string> result =
        new System.Collections.Generic.List<string>() ;

    if ( Count > 1 )
    {
        System.Text.StringBuilder temp =
            new System.Text.StringBuilder() ;

        System.Collections.Generic.HashSet<char> delims =
            new System.Collections.Generic.HashSet<char>() ;

        if ( Delimiters != null )
        {
            delims.UnionWith ( Delimiters ) ;
        }

        if ( delims.Count == 0 )
        {
            delims.UnionWith ( defaultdelimiters ) ;
        }

        bool remove = ( Options & Option.RemoveEmptyEntries ) == Option.RemoveEmptyEntries ;
        bool escape = ( Options & Option.HonorEscapes       ) == Option.HonorEscapes       ;
        bool quote  = ( Options & Option.HonorQuotes        ) == Option.HonorQuotes        ;
        bool apos   = ( Options & Option.HonorApostrophes   ) == Option.HonorApostrophes   ;

        char ch  ;
        int  pos = 0 ;
        int  len = Subject.Length ;

        while ( pos < len )
        {
            ch = Subject [ pos++ ] ;

            if ( delims.Contains ( ch ) )
            {
                if ( ( temp.Length > 0 ) || !remove )
                {
                    result.Add ( temp.ToString() ) ;

                    temp.Length = 0 ;

                    if
                    (
                        ( result.Count == Count - 1 )
                    &&
                        ( pos < len )
                    )
                    {
                        temp.Append ( Subject.Substring ( pos ) ) ;

                        pos = len ;
                    }
                }
            }
            else
            {
                if ( escape && ( ch == '\\' ) && ( pos < len ) )
                {
                    temp.Append ( ch ) ;

                    ch = Subject [ pos++ ] ;
                }
                else if ( quote && ( ch == '\"' ) && ( pos < len ) )
                {
                    do
                    {
                        if ( escape && ( ch == '\\' ) )
                        {
                            temp.Append ( ch ) ;

                            ch = Subject [ pos++ ] ;

                            /* The escaped character is the last one; the Append after the loop copies it. */
                            if ( pos >= len )
                            {
                                break ;
                            }
                        }

                        temp.Append ( ch ) ;

                        ch = Subject [ pos++ ] ;
                    }
                    while ( ( pos < len ) && ( ch != '\"' ) ) ;
                }
                else if ( apos && ( ch == '\'' ) && ( pos < len ) )
                {
                    do
                    {
                        if ( escape && ( ch == '\\' ) )
                        {
                            temp.Append ( ch ) ;

                            ch = Subject [ pos++ ] ;

                            /* The escaped character is the last one; the Append after the loop copies it. */
                            if ( pos >= len )
                            {
                                break ;
                            }
                        }

                        temp.Append ( ch ) ;

                        ch = Subject [ pos++ ] ;
                    }
                    while ( ( pos < len ) && ( ch != '\'' ) ) ;
                }

                temp.Append ( ch ) ;
            }
        }

        if ( ( temp.Length > 0 ) || !remove )
        {
            result.Add ( temp.ToString() ) ;
        }
    }
    else if ( Count == 1 )
    {
        result.Add ( Subject ) ;
    }

    return ( result.AsReadOnly() ) ;
}
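
The last line of DoRive returns result.AsReadOnly(), so callers receive an IList<string> that cannot be modified. The sketch below shows what that means in practice using a plain List<string>; ReadOnlyDemo, MakeReadOnly, and RejectsAdd are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ReadOnlyDemo
{
    // Wraps a list with List<T>.AsReadOnly, the same call DoRive uses.
    public static IList<string> MakeReadOnly(List<string> items)
    {
        return items.AsReadOnly();
    }

    // Returns true if adding to the list is rejected.
    public static bool RejectsAdd(IList<string> list)
    {
        try
        {
            list.Add("x");
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        IList<string> parts = MakeReadOnly(new List<string> { "a", "b" });

        Console.WriteLine(parts.Count);       // prints 2
        Console.WriteLine(parts.IsReadOnly);  // prints True
        Console.WriteLine(RejectsAdd(parts)); // prints True
    }
}
```

A caller that needs a mutable copy can pass the result to the List<string> constructor.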

History

  • 2010-03-26: First submitted.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
United States
BSCS 1992 Wentworth Institute of Technology

Originally from the Boston (MA) area. Lived in SoCal for a while. Now in the Phoenix (AZ) area.

OpenVMS enthusiast, ISO 8601 evangelist, photographer, opinionated SOB, acknowledged pedant and contrarian
