Split Function that Supports Text Qualifiers

By LSteinle

Create a Split function that supports text qualifiers for use in C#.Net and VB.Net programs.
Posted:27 Aug 2006
Updated:18 Sep 2006

Introduction

I have always appreciated the String.Split function (and the Split function provided with VB.Net). The Split function divides a text value into separate parts based on a specified character, or delimiter, and returns the parsed text value as a string array.

Unfortunately, the Split function does not support text qualifiers. A text qualifier is a character used to mark the bounds of a block of text. Usually a quotation mark or apostrophe is used as the text qualifier, although any character would work.

Because the Split function doesn't support text qualifiers, a delimiter found within a text block splits the block. It would be nice if we could pass in a text qualifier so that the text block would be treated as a single element.
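To make the problem concrete, here is a small C# sketch of what the built-in String.Split does with a comma-delimited line in which one field is wrapped in quotation marks. The class and method names are illustrative only.

```csharp
using System;

// Illustrative only: shows the built-in behavior that a qualifier-aware Split avoids.
public static class SplitProblemDemo
{
    public static string[] SplitOnComma(string expression)
    {
        // String.Split has no notion of text qualifiers, so the comma inside
        // the quoted block "Smith, John" splits it like any other comma.
        return expression.Split(',');
    }
}
```

Calling `SplitProblemDemo.SplitOnComma("42,\"Smith, John\",Omaha")` returns four elements (`42`, `"Smith`, ` John"` and `Omaha`) instead of the three fields the line was meant to hold.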

In this article we will create a Split function for both VB.Net and C#.Net that will support text qualifiers.

In the next article, Parsing Command Line Arguments, we will add support for assignment operators.

Approach

To solve this problem, we will need to look at each character in the text expression passed into the routine. We will need to identify when we are in a text block so that delimiters inside it are ignored. Outside a text block, the delimiter marks the start of a new element in our array.

We begin by creating our routine and loop.

C#
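public string[] Split(string expression, string delimiter, string qualifier, bool ignoreCase)
{
    // Visit every character: "< expression.Length" (not "Length - 1") includes the last one.
    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
        // Parsing logic is added in the following steps.
    }

    // Placeholder so this skeleton compiles; replaced in the final version.
    return new string[0];
}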
VB.Net
Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
As String()
    For _CharIndex As Integer = 0 To expression.Length - 1
    Next
End Function

Managing Text Qualifiers

Next we will add a boolean variable to track when we are in a text block. When a text qualifier is found, we set the boolean value to true. When another text qualifier is found, we set it back to false. This is easily done by setting the boolean value to the opposite of its current state. We just have to remember to initialize the boolean value to false, indicating that we aren't in a block.

It is important to use the length of the text qualifier to define how many characters to compare. That way we can use more than one character to define the bounds of the text block. This can be desirable when a single-character qualifier might itself appear inside the text block. For example, we may want to use a quotation mark followed by an apostrophe ("') as the text qualifier. Then a lone quotation mark or apostrophe inside the block won't terminate it.
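Comparing a multi-character token needs one extra precaution: near the end of the text there may be fewer characters left than the token is long, and `Substring(index, token.Length)` throws an ArgumentOutOfRangeException in that case. The sketch below is a hypothetical helper (not part of the original code) that checks the remaining length first and then uses the `String.Compare(strA, indexA, strB, indexB, length, ignoreCase)` overload to compare in place.

```csharp
using System;

// Hypothetical helper, not part of the original article.
public static class TokenMatcher
{
    // Returns true when `token` (a qualifier or delimiter of any length)
    // occurs in `expression` starting at `index`.
    public static bool MatchesAt(string expression, int index, string token, bool ignoreCase)
    {
        // A null or empty token never matches (this also disables the feature).
        if (string.IsNullOrEmpty(token))
            return false;

        // Not enough characters left for the whole token: treat as no match
        // instead of letting a Substring call run past the end of the text.
        if (index + token.Length > expression.Length)
            return false;

        // Compare token.Length characters of expression, starting at index, with the token.
        return string.Compare(expression, index, token, 0, token.Length, ignoreCase) == 0;
    }
}
```

With the two-character qualifier `"'`, a lone quotation mark at the end of the text simply fails to match, which is the behavior the paragraph above relies on.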

There is another way to manage text blocks that contain text qualifiers, and you already use it when writing string literals in VB.Net (or verbatim @"..." strings in C#): simply double the text qualifier inside the text block. Fortunately, this approach is already supported by the way we are tracking text qualifiers.

Take the following example that uses the quotation mark for the text qualifier:

"Example With ""Text Qualifier"" Inside Text Block"

The first quotation mark turns the boolean value on. The second one turns it off; however, the next character is also a text qualifier, which turns it back on, so the text block does not appear to close until the final quotation mark. Simple and effective!
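The toggling can be watched directly. The sketch below is a hypothetical helper, limited to a single-character qualifier for brevity, that records the qualifier state after each character is read: 1 while inside a block, 0 outside.

```csharp
using System;
using System.Text;

// Hypothetical tracing helper; single-character qualifier only.
public static class QualifierTrace
{
    // Returns one '0' or '1' per input character: the qualifier state after
    // that character has been read. A qualifier flips the state before it is recorded.
    public static string Trace(string expression, char qualifier)
    {
        bool inBlock = false;
        StringBuilder states = new StringBuilder(expression.Length);
        foreach (char c in expression)
        {
            if (c == qualifier)
                inBlock = !inBlock;
            states.Append(inBlock ? '1' : '0');
        }
        return states.ToString();
    }
}
```

For the example above, the state is off only at the first quotation mark of each doubled pair and at the closing quotation mark. Each of those positions is itself a qualifier, and the Split routine only looks for a delimiter when the current character is not a qualifier, so no delimiter can end the block early.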

C#
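public string[] Split(string expression, string delimiter, string qualifier, bool ignoreCase)
{
    // False until the first text qualifier is found.
    bool _QualifierState = false;

    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
        // "&&" short-circuits, so qualifier.Length is never read when qualifier is null;
        // the length check keeps Substring from running past the end of the text.
        if (!string.IsNullOrEmpty(qualifier)
         && _CharIndex + qualifier.Length <= expression.Length
         && string.Compare(expression.Substring(_CharIndex, qualifier.Length), qualifier, ignoreCase) == 0)
        {
            // Entering a block turns the state on, leaving it turns the state off.
            _QualifierState = !_QualifierState;
            // Skip the rest of a multi-character qualifier.
            _CharIndex += qualifier.Length - 1;
        }
    }

    // Placeholder so this intermediate step compiles.
    return new string[0];
}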
VB.Net
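Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
As String()
    ' False until the first text qualifier is found.
    Dim _QualifierState As Boolean = False

    For _CharIndex As Integer = 0 To expression.Length - 1
        ' AndAlso short-circuits, so qualifier.Length is only read when a qualifier
        ' was supplied; the length check keeps Substring inside the text.
        If Not String.IsNullOrEmpty(qualifier) _
        AndAlso _CharIndex + qualifier.Length <= expression.Length _
        AndAlso String.Compare(expression.Substring(_CharIndex, qualifier.Length), qualifier, ignoreCase) = 0 Then
            _QualifierState = Not _QualifierState
            ' Skip the rest of a multi-character qualifier.
            _CharIndex += qualifier.Length - 1
        End If
    Next
End Function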

Another benefit of this approach is that the text qualifier is optional. If the text qualifier is Nothing (VB.Net) or null (C#.Net), the text qualifier logic is disabled!

Splitting the Text Expression

Now we are ready to search for the delimiter. As with the qualifier, we use the length of the delimiter to decide how many characters to compare, in case the delimiter is more than one character long. A start index variable tracks the first character of the current element, and the current character index marks its end: when a delimiter is found outside a text block, the characters from the start index up to the delimiter become the next element, and the start index moves past the delimiter.

Additionally, we will store the values in a System.Collections.ArrayList object. When we are finished loading the ArrayList, we will copy the list into a String array and return it.
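The start index / current index bookkeeping is easier to see without the qualifier logic. The following sketch uses hypothetical names, handles only a single-character delimiter, and collects values in a generic `System.Collections.Generic.List<string>` instead of an ArrayList; that is an alternative container, not the article's choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of the start/current index scheme (no qualifiers).
public static class IndexBookkeeping
{
    public static string[] SplitOnChar(string expression, char delimiter)
    {
        List<string> values = new List<string>();
        int startIndex = 0;   // first character of the current element

        for (int charIndex = 0; charIndex < expression.Length; charIndex++)
        {
            if (expression[charIndex] == delimiter)
            {
                // The element runs from startIndex up to (not including) the delimiter.
                values.Add(expression.Substring(startIndex, charIndex - startIndex));
                // The next element begins just after the delimiter.
                startIndex = charIndex + 1;
            }
        }

        // Whatever follows the last delimiter is the final element.
        if (startIndex < expression.Length)
            values.Add(expression.Substring(startIndex));

        return values.ToArray();
    }
}
```

Because an element is only added when a delimiter is found, the text after the last delimiter has to be added once the loop ends. When the text ends with a delimiter nothing is added, so `SplitOnChar("a.", '.')` returns one element, whereas `"a.".Split('.')` returns two (the second one empty).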

C#
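public string[] Split(string expression, string delimiter, string qualifier, bool ignoreCase)
{
    bool _QualifierState = false;
    int _StartIndex = 0;
    System.Collections.ArrayList _Values = new System.Collections.ArrayList();

    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
        if (!string.IsNullOrEmpty(qualifier)
         && _CharIndex + qualifier.Length <= expression.Length
         && string.Compare(expression.Substring(_CharIndex, qualifier.Length), qualifier, ignoreCase) == 0)
        {
            // Entering or leaving a text block; skip the rest of a multi-character qualifier.
            _QualifierState = !_QualifierState;
            _CharIndex += qualifier.Length - 1;
        }
        else if (!_QualifierState
              && !string.IsNullOrEmpty(delimiter)
              && _CharIndex + delimiter.Length <= expression.Length
              && string.Compare(expression.Substring(_CharIndex, delimiter.Length), delimiter, ignoreCase) == 0)
        {
            // Everything from _StartIndex up to the delimiter is one element.
            _Values.Add(expression.Substring(_StartIndex, _CharIndex - _StartIndex));
            // The next element starts after the whole delimiter, not just its first character.
            _StartIndex = _CharIndex + delimiter.Length;
            _CharIndex += delimiter.Length - 1;
        }
    }

    // Whatever follows the last delimiter is the final element.
    if (_StartIndex < expression.Length)
        _Values.Add(expression.Substring(_StartIndex, expression.Length - _StartIndex));

    string[] _returnValues = new string[_Values.Count];
    _Values.CopyTo(_returnValues);
    return _returnValues;
}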
VB.Net
Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
As String()
    Dim _QualifierState As Boolean = False
    Dim _StartIndex As Integer = 0
    Dim _Values As New System.Collections.ArrayList

    For _CharIndex As Integer = 0 To expression.Length - 1
        If Not String.IsNullOrEmpty(qualifier) _
        AndAlso _CharIndex + qualifier.Length <= expression.Length _
        AndAlso String.Compare(expression.Substring(_CharIndex, qualifier.Length), qualifier, ignoreCase) = 0 Then
            ' Entering or leaving a text block; skip the rest of a multi-character qualifier.
            _QualifierState = Not _QualifierState
            _CharIndex += qualifier.Length - 1
        ElseIf Not _QualifierState _
        AndAlso Not String.IsNullOrEmpty(delimiter) _
        AndAlso _CharIndex + delimiter.Length <= expression.Length _
        AndAlso String.Compare(expression.Substring(_CharIndex, delimiter.Length), delimiter, ignoreCase) = 0 Then
            ' Everything from _StartIndex up to the delimiter is one element.
            _Values.Add(expression.Substring(_StartIndex, _CharIndex - _StartIndex))
            ' The next element starts after the whole delimiter.
            _StartIndex = _CharIndex + delimiter.Length
            _CharIndex += delimiter.Length - 1
        End If
    Next

    ' Whatever follows the last delimiter is the final element.
    If _StartIndex < expression.Length Then
        _Values.Add(expression.Substring(_StartIndex, expression.Length - _StartIndex))
    End If

    Dim _returnValues(_Values.Count - 1) As String
    _Values.CopyTo(_returnValues)
    Return _returnValues
End Function

Using the code

That's it! We are done. Now we can play with our new toy!

The code sample below parses the text This is an "example.".Cool! using a period for the delimiter and a quotation mark for the text qualifier. The period inside the quoted block shows that delimiters within a text block are ignored. The Split function returns a string array with two elements in it (the text qualifiers are kept in the results):

  • This is an "example."
  • Cool!

C#
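using System.Windows.Forms;

// Assumes this method belongs to a Form ("this" is the owner window)
// that also defines the Split function shown above.
public void Example()
{
    // C# escapes a quotation mark as \" (VB.Net doubles it instead).
    foreach (string _Part in Split("This is an \"example.\".Cool!", ".", "\"", true))
        MessageBox.Show(this, _Part, "Split Example", MessageBoxButtons.OK);
}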
VB.Net
Public Sub Example()
    ' In VB.Net a quotation mark inside a string literal is written as "".
    For Each _Part As String In Split("This is an ""example."".Cool!", ".", """", True)
        MsgBox(_Part, MsgBoxStyle.OkOnly, "Split Example")
    Next
End Sub
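As the two results listed above show, the elements keep their text qualifiers. If the surrounding qualifiers should be removed and doubled qualifiers collapsed, a post-processing step along these lines could be applied to each element. This is a hypothetical helper, not part of the article's code, and it only unwraps an element that both starts and ends with the qualifier.

```csharp
using System;

// Hypothetical post-processing helper; not part of the original article.
public static class QualifierCleanup
{
    // If `element` starts and ends with `qualifier`, strip those two and turn
    // every doubled qualifier inside into a single one. Otherwise return the
    // element unchanged (e.g. This is an "example." is only partly qualified).
    public static string Unqualify(string element, string qualifier)
    {
        if (string.IsNullOrEmpty(element) || string.IsNullOrEmpty(qualifier))
            return element;

        bool isWrapped = element.Length >= 2 * qualifier.Length
            && element.StartsWith(qualifier, StringComparison.Ordinal)
            && element.EndsWith(qualifier, StringComparison.Ordinal);
        if (!isWrapped)
            return element;

        // Drop the opening and closing qualifier, then undo the doubling.
        string inner = element.Substring(qualifier.Length, element.Length - 2 * qualifier.Length);
        return inner.Replace(qualifier + qualifier, qualifier);
    }
}
```

Leaving partly qualified elements untouched is a deliberate choice in this sketch: in This is an "example." the quotation marks are part of the value, so removing them would change the data.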

Regular Expression Alternative

As an alternative, you can parse the text as described above using a Regular Expression. Regular Expressions are designed specifically for text parsing. While Regular Expressions are more elegant, they increase the cost of maintaining the application, because few people understand Regular Expressions and fewer still can write them (even when using tools).

In light of the benefits and costs associated with Regular Expressions it is worth taking time to demonstrate how a Regular Expression can solve our text parsing problem.

Abishek Bellamkonda was kind enough to provide a Regular Expression that could parse the text as documented above. Since I am no expert with Regular Expressions I won't dive into how the Regular Expression works.

Please don't bombard me with questions on this topic as I only understand abstract Regular Expression concepts. I am providing this as an example for those who are interested. I am not providing this as an explanation of how to make Regular Expressions.

C#
using System;
using System.Text.RegularExpressions;

public string[] Split(string expression, string delimiter, string qualifier, bool ignoreCase)
{
    // Assumes a non-null, single-character qualifier: Regex.Escape(null) throws,
    // and the escaped qualifier ({1}) is placed inside character classes such as [^{1}].
    string _Statement = String.Format("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
                        Regex.Escape(delimiter), Regex.Escape(qualifier));

    RegexOptions _Options = RegexOptions.Compiled | RegexOptions.Multiline;
    if (ignoreCase) _Options = _Options | RegexOptions.IgnoreCase;

    Regex _Expression = new Regex(_Statement, _Options);
    return _Expression.Split(expression);
}
VB.Net
Imports System.Text.RegularExpressions

Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
As String()
    Dim _Statement As String = String.Format("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))", _
                               Regex.Escape(delimiter), Regex.Escape(qualifier))

    Dim _Options As RegexOptions = RegexOptions.Compiled Or RegexOptions.Multiline
    If ignoreCase Then _Options = _Options Or RegexOptions.IgnoreCase

    Dim _Expression As Regex = New Regex(_Statement, _Options)
    Return _Expression.Split(expression)
End Function
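Without going into the syntax, the lookahead in the pattern amounts to a counting rule: a delimiter is accepted only when it is followed by an even number of qualifiers (zero or more complete pairs, then no further qualifier). The sketch below restates that rule in plain C# for a single-character qualifier. The names are hypothetical, and it illustrates the rule rather than replacing either Split function.

```csharp
using System;

// Hypothetical illustration of the rule the regular expression's lookahead checks.
public static class EvenQualifierRule
{
    // True when the character at `index` is followed by an even number of
    // `qualifier` characters, i.e. it lies outside any (balanced) text block.
    public static bool IsOutsideBlock(string expression, int index, char qualifier)
    {
        int following = 0;
        for (int i = index + 1; i < expression.Length; i++)
        {
            if (expression[i] == qualifier)
                following++;
        }
        return following % 2 == 0;
    }
}
```

In a,"b,c",d the commas at positions 1 and 7 are followed by two and zero quotation marks and are accepted, while the comma at position 4 is followed by one and is skipped. This also explains why the pattern suits a single-character qualifier: inside a character class such as [^{1}], a multi-character qualifier would be read as a set of individual characters.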

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author.


About the Author

LSteinle

Larry Steinle is a systems analyst for HDR, Inc, a nationally recognized architecture, engineering, and consulting firm. He graduated with a certificate in Biblical Studies, an Associate degree in Computer Programming, and a Bachelor's degree in Management Information Systems.
Occupation: Web Developer
Location: United States

Copyright 2006 by LSteinle