Split Function that Supports Text Qualifiers

18 Sep 2006 | CPOL | 5 min read
Create a Split function that supports text qualifiers for use in C#.NET and VB.NET programs.

Introduction

I have always appreciated the String.Split function (and the Split function provided with VB.NET). The Split function divides a text value into separate parts based on a specified character, or delimiter. The Split function returns the parsed text value as a string array.

Unfortunately, the Split function does not support text qualifiers. A text qualifier is a character used to mark the bounds of a block of text. Usually a quotation mark or apostrophe is used for the text qualifier, although any character would work.

Since the Split function doesn't support text qualifiers, when a delimiter is found within a text block, the block is split. It would be nice if we could pass in a text qualifier so that the text block would be treated as a single element.
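To make the problem concrete, here is a minimal C# sketch of the behavior described above. The `SplitProblemDemo` class and `NaiveSplit` method are illustrative names only, not part of the code developed in this article.

```csharp
using System;

public static class SplitProblemDemo
{
    // Illustrative helper: splits on a comma with the built-in String.Split,
    // which knows nothing about text qualifiers.
    public static string[] NaiveSplit(string text)
    {
        return text.Split(',');
    }
}
```

Calling `SplitProblemDemo.NaiveSplit("red,\"green,blue\",yellow")` returns four elements (`red`, `"green`, `blue"` and `yellow`) because the comma inside the quoted block is treated like any other delimiter. The result we would like is three elements, with `"green,blue"` kept whole.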

In this article, we will create a Split function for both VB.NET and C#.NET that will support text qualifiers.

In the next article, Parsing Command Line Arguments, we will add support for assignment operators.

Approach

To solve this problem, we will need to look at each character in the text expression passed into the routine. We will need to identify when we are in a text block so that the delimiters will be ignored. When outside a text block, the delimiter will identify a new element in our array.

We begin by creating our routine and loop.

C#

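public string[] Split(string expression, string delimiter, 
			string qualifier, bool ignoreCase)
{
    // Visit every character, including the last one.
    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
        // The qualifier and delimiter checks are added in the next sections.
    }

    return new string[0]; // Placeholder until the results are collected.
}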

VB.NET

Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
As String()
    For _CharIndex As Integer = 0 To expression.Length - 1
    Next
End Function

Managing Text Qualifiers

Next we will add a boolean variable to track when we are in a text block. When a text qualifier is found, we will set the boolean value to true. When another text qualifier is found, we will set the boolean value back to false. This can be done very easily by setting the boolean value equal to the opposite of its current state. We just have to remember to initialize the boolean value to false indicating that we aren't in the block.

It is important to use the length of the text qualifier to define how many characters to compare. This way, if we want to use more than one character to define the bounds of the text block, we can. This can be desirable when a single qualifier character is likely to show up inside the text block. For example, we may want to use a quotation mark followed by an apostrophe ("') as a two-character text qualifier. This way, a lone quotation mark or a lone apostrophe inside the block won't terminate the block.

There is another way to manage text blocks that contain text qualifiers. VB.NET already uses this alternative approach in string literals: simply duplicate the text qualifier inside the text block. Fortunately this approach is already supported by the way we are tracking text qualifiers.

Take the following example that uses the quotation mark for the text qualifier:

VB.NET
"Example With ""Text Qualifier"" Inside Text Block"

The first quotation mark turns the boolean flag on. The second one turns it off, but the very next character is another text qualifier that turns it back on, so the text block never appears to close. Simple and effective!

C#

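public string[] Split(string expression, string delimiter, 
			string qualifier, bool ignoreCase)
{
    // Start outside any text block.
    bool _QualifierState = false;
    
    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
        // && short-circuits, so a null or empty qualifier skips the comparison.
        // This Compare overload stops at the end of the string instead of throwing.
        if (!string.IsNullOrEmpty(qualifier)
         && string.Compare(expression, _CharIndex, qualifier, 0,
                qualifier.Length, ignoreCase) == 0)
        {
            _QualifierState = !_QualifierState;
        }
    }

    return new string[0]; // Placeholder; splitting is added in the next section.
}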

VB.NET

Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
As String()
    ' Start outside any text block.
    Dim _QualifierState As Boolean = False
    
    For _CharIndex As Integer = 0 To expression.Length - 1
        ' AndAlso short-circuits, so a null or empty qualifier skips the comparison.
        If Not String.IsNullOrEmpty(qualifier) _
        AndAlso String.Compare(expression, _CharIndex, qualifier, 0, _
            qualifier.Length, ignoreCase) = 0 Then
            _QualifierState = Not _QualifierState
        End If
    Next
End Function

Another benefit to the approach taken above is that if we don't want to use a text qualifier, we don't have to. If the text qualifier is Nothing (VB.NET) or null (C#.NET), then the text qualifier logic is disabled!
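The toggling behavior can be observed in isolation with a small sketch. `QualifierTrace.States` is a hypothetical helper written for this illustration (it is not part of the Split function itself); it records whether the scan is inside a text block after each character has been processed.

```csharp
using System;

public static class QualifierTrace
{
    // Hypothetical helper: for each character position, reports whether the
    // scan is inside a text block AFTER that character has been processed.
    public static bool[] States(string expression, string qualifier, bool ignoreCase)
    {
        bool[] states = new bool[expression.Length];
        bool inBlock = false;

        for (int i = 0; i < expression.Length; i++)
        {
            // A null or empty qualifier disables the toggle entirely.
            if (!string.IsNullOrEmpty(qualifier)
                && string.Compare(expression, i, qualifier, 0,
                                  qualifier.Length, ignoreCase) == 0)
            {
                inBlock = !inBlock;
            }
            states[i] = inBlock;
        }
        return states;
    }
}
```

For the input `a"b""c"d` with a quotation mark as the qualifier, the state switches off at the first of the doubled quotation marks (index 3) and straight back on at the second (index 4), so no delimiter can ever be seen between them. With a null qualifier, every entry stays false and the input is treated as having no text blocks at all.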

Splitting the Text Expression

Now we are ready to search for the delimiter. We will use the length of the delimiter to define how many characters to compare, just in case we need to support more than one character for our delimiter. A start index variable tracks the first character of the current element. When a delimiter is found outside a text block, the current character index marks the end of that element.

Additionally, we will store the values in a System.Collections.ArrayList object. When we are finished loading the ArrayList, we will convert the list to a String array and return the values.

C#

public string[] Split(string expression, string delimiter, 
			string qualifier, bool ignoreCase)
{
    bool _QualifierState = false;
    int _StartIndex = 0;
    System.Collections.ArrayList _Values = new System.Collections.ArrayList();
    
    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
        // A qualifier toggles the "inside a text block" state.
        if (!string.IsNullOrEmpty(qualifier)
         && string.Compare(expression, _CharIndex, qualifier, 0,
                qualifier.Length, ignoreCase) == 0)
        {
            _QualifierState = !_QualifierState;
        }
        // A delimiter outside a text block ends the current element.
        else if (!_QualifierState && !string.IsNullOrEmpty(delimiter)
              && string.Compare(expression, _CharIndex, delimiter, 0,
                delimiter.Length, ignoreCase) == 0)
        {
            _Values.Add(expression.Substring
                (_StartIndex, _CharIndex - _StartIndex));
            // Skip the whole delimiter, which may be longer than one character.
            _StartIndex = _CharIndex + delimiter.Length;
            _CharIndex += delimiter.Length - 1;
        }
    }

    // Add whatever follows the last delimiter.
    if (_StartIndex < expression.Length)
        _Values.Add(expression.Substring
            (_StartIndex, expression.Length - _StartIndex));
    
    string[] _returnValues = new string[_Values.Count];
    _Values.CopyTo(_returnValues);
    return _returnValues;
}

VB.NET

Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
As String()
    Dim _QualifierState As Boolean = False
    Dim _StartIndex As Integer = 0
    Dim _Values As New System.Collections.ArrayList
    
    For _CharIndex As Integer = 0 To expression.Length - 1
        ' A qualifier toggles the "inside a text block" state.
        If Not String.IsNullOrEmpty(qualifier) _
        AndAlso String.Compare(expression, _CharIndex, qualifier, 0, _
            qualifier.Length, ignoreCase) = 0 Then
            _QualifierState = Not _QualifierState
        ' A delimiter outside a text block ends the current element.
        ElseIf Not _QualifierState _
        AndAlso Not String.IsNullOrEmpty(delimiter) _
        AndAlso String.Compare(expression, _CharIndex, delimiter, 0, _
            delimiter.Length, ignoreCase) = 0 Then
            _Values.Add(expression.Substring( _
                _StartIndex, _CharIndex - _StartIndex))
            ' Skip the whole delimiter, which may be longer than one character.
            _StartIndex = _CharIndex + delimiter.Length
            _CharIndex += delimiter.Length - 1
        End If
    Next
    
    ' Add whatever follows the last delimiter.
    If _StartIndex < expression.Length Then
        _Values.Add(expression.Substring(_StartIndex, _
            expression.Length - _StartIndex))
    End If
    
    Dim _returnValues(_Values.Count - 1) As String
    _Values.CopyTo(_returnValues)
    Return _returnValues
End Function

Using the Code

That's it! We are done. Now we can play with our new toy!

The code sample below parses a text value using a period for the delimiter and a quotation mark for the text qualifier. The quoted text block contains a period to demonstrate that a delimiter inside a text block is ignored. Note that the quotation marks are kept in the output. The Split function returns a string array with two elements in it:

  • This is an "example."
  • Cool!

C#

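using System.Windows.Forms;

// Call this from a Form so that 'this' is a valid owner window.
public void Example()
{
    // In C# a quotation mark inside a string literal is written as \".
    foreach (string _Part in Split
        ("This is an \"example.\".Cool!", ".", "\"", true))
        MessageBox.Show(this, _Part, "Split Example", MessageBoxButtons.OK);
}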

VB.NET

Public Sub Example()
    ' In VB.NET a quotation mark inside a string literal is written as "".
    For Each _Part As String In Split( _
        "This is an ""example."".Cool!", ".", """", True)
        MsgBox(_Part, MsgBoxStyle.OkOnly, "Split Example")
    Next
End Sub
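For readers who prefer to try the routine outside a Windows Forms project, the following is a self-contained sketch of the same algorithm. It is an adaptation, not the article's original listing: the `QualifiedSplitter` class name and the private `Matches` helper are assumptions made for this example, and a generic `List<string>` stands in for the ArrayList.

```csharp
using System;
using System.Collections.Generic;

public static class QualifiedSplitter
{
    // Sketch of the article's algorithm using List<string> instead of ArrayList.
    public static string[] Split(string expression, string delimiter,
                                 string qualifier, bool ignoreCase)
    {
        List<string> values = new List<string>();
        bool inBlock = false;   // true while inside a qualified text block
        int start = 0;          // first character of the current element

        for (int i = 0; i < expression.Length; i++)
        {
            if (Matches(expression, i, qualifier, ignoreCase))
            {
                inBlock = !inBlock;
            }
            else if (!inBlock && Matches(expression, i, delimiter, ignoreCase))
            {
                values.Add(expression.Substring(start, i - start));
                start = i + delimiter.Length;   // skip the whole delimiter
                i += delimiter.Length - 1;
            }
        }

        // Add whatever follows the last delimiter.
        if (start < expression.Length)
            values.Add(expression.Substring(start));

        return values.ToArray();
    }

    // True when 'token' occurs in 'text' at 'index'; a null or empty token never matches.
    private static bool Matches(string text, int index, string token, bool ignoreCase)
    {
        return !string.IsNullOrEmpty(token)
            && string.Compare(text, index, token, 0, token.Length, ignoreCase) == 0;
    }
}
```

With this sketch, `QualifiedSplitter.Split("This is an \"example.\".Cool!", ".", "\"", true)` returns the two elements listed earlier, `This is an "example."` and `Cool!`. A two-character delimiter also works: splitting `a::b::"c::d"` on `::` with a quotation mark qualifier gives `a`, `b` and `"c::d"`. Passing null for the qualifier falls back to a plain split.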

Regular Expression Alternative

As an alternative, you can parse the text as documented above using a Regular Expression. Regular Expressions are designed specifically for text parsing. While Regular Expressions are more elegant, they increase the cost of maintaining the application because few people understand Regular Expressions and fewer yet can create the expressions (even when using tools).

In light of the benefits and costs associated with Regular Expressions, it is worth taking time to demonstrate how a Regular Expression can solve our text parsing problem.

Abishek Bellamkonda was kind enough to provide a Regular Expression that could parse the text as documented above. Since I am no expert with Regular Expressions, I won't dive into how the Regular Expression works.

Please don't bombard me with questions on this topic as I only understand abstract Regular Expression concepts. I am providing this as an example for those who are interested. I am not providing this as an explanation of how to make Regular Expressions.

C#

using System;
using System.Text.RegularExpressions;

public string[] Split(string expression, string delimiter, 
			string qualifier, bool ignoreCase)
{
    // The qualifier is placed inside a character class ([^...]),
    // so this version expects a single-character, non-null qualifier.
    string _Statement = String.Format
        ("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))", 
         Regex.Escape(delimiter), Regex.Escape(qualifier));

    RegexOptions _Options = RegexOptions.Compiled | RegexOptions.Multiline;
    if (ignoreCase) _Options = _Options | RegexOptions.IgnoreCase;

    Regex _Expression = new Regex(_Statement, _Options);
    return _Expression.Split(expression);
}

VB.NET

Imports System.Text.RegularExpressions

Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
As String()
    ' The qualifier is placed inside a character class ([^...]),
    ' so this version expects a single-character, non-null qualifier.
    Dim _Statement As String = String.Format( _
        "{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))", _
        Regex.Escape(delimiter), Regex.Escape(qualifier))

    Dim _Options As RegexOptions = RegexOptions.Compiled Or RegexOptions.Multiline
    If ignoreCase Then _Options = _Options Or RegexOptions.IgnoreCase

    Dim _Expression As Regex = New Regex(_Statement, _Options)
    Return _Expression.Split(expression)
End Function
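As a rough illustration of what the pattern does, the sketch below builds the same expression and applies it to the sample text. The `RegexSplitDemo` class and its method names are assumptions made for this example, and the case-handling options are left out to keep it short.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Builds the same pattern as the regular-expression version above.
    // Assumes a single-character qualifier, because the qualifier is placed
    // inside a character class ([^...]).
    public static string BuildPattern(string delimiter, string qualifier)
    {
        return string.Format("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
                             Regex.Escape(delimiter), Regex.Escape(qualifier));
    }

    // Splits 'expression' on every delimiter that the pattern accepts.
    public static string[] Split(string expression, string delimiter, string qualifier)
    {
        return Regex.Split(expression, BuildPattern(delimiter, qualifier));
    }
}
```

For a period delimiter and a quotation mark qualifier, `BuildPattern` produces `\.(?=(?:[^"]*"[^"]*")*(?![^"]*"))`. The lookahead accepts a period only when the rest of the text contains an even number of quotation marks, which means the period sits outside any quoted block. Applied to `This is an "example.".Cool!`, the period after `example` is rejected because one quotation mark follows it, and the split returns `This is an "example."` and `Cool!`.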

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer
United States
Larry Steinle is a systems analyst for HDR, Inc, a nationally recognized architecture, engineering, and consulting firm. He graduated with a certificate in Biblical Studies, an Associate in Computer Programming, and a Bachelor Degree in Management Information Systems.

Comments and Discussions

 
Re: Bugs in C# version (LSteinle, 18-Sep-06 16:15)
Why not use a Collection? [modified] (alnicol, 6-Sep-06 13:12)
Re: Why not use a Collection? (LSteinle, 18-Sep-06 16:45)
Good job but some bugs (Karel Kral, 28-Aug-06 22:44)
Re: Good job but some bugs (LSteinle, 29-Aug-06 2:00)
Why not use Regular Expressions? (Abi Bellamkonda, 28-Aug-06 21:01)
Re: Why not use Regular Expressions? (LSteinle, 29-Aug-06 1:58)
Good job (Mike_V, 28-Aug-06 19:02)
Simple yet effective. Well-explained. Good job!
