Introduction
I have always appreciated the String.Split function (and the Split function provided with VB.NET). The Split function divides a text value into separate parts based on a specified character, or delimiter, and returns the parsed text value as a string array.
Unfortunately, the Split function does not support text qualifiers. A text qualifier is a character used to mark the bounds of a block of text. Usually a quotation mark or apostrophe is used as the text qualifier, although any character would work.
Since the Split function doesn't support text qualifiers, when a delimiter is found within a text block, the block is split. It would be nice if we could pass in a text qualifier so that the text block would be treated as a single element.
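To make the problem concrete, here is a minimal sketch of what the built-in String.Split does with a quoted value. The class name, method name, and sample line are made up for illustration.

```csharp
using System;

public static class SplitProblemDemo
{
    // Splits a comma-separated line with the built-in String.Split.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }

    public static void Main()
    {
        // The quoted block "Smith, John" is meant to be a single value.
        string line = "42,\"Smith, John\",Sales";
        foreach (string part in NaiveSplit(line))
        {
            Console.WriteLine(part);
        }
        // String.Split returns four elements instead of three, because the
        // comma inside the quotation marks is treated as a delimiter.
    }
}
```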
In this article, we will create a Split function for both VB.NET and C# that will support text qualifiers.
In the next article, Parsing Command Line Arguments, we will add support for assignment operators.
Approach
To solve this problem, we will need to look at each character in the text expression passed into the routine. We will need to identify when we are in a text block so that the delimiters will be ignored. When outside a text block, the delimiter will identify a new element in our array.
We begin by creating our routine and loop.
C#
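public string[] Split(string expression, string delimiter,
    string qualifier, bool ignoreCase)
{
    // Visit every character, including the last one.
    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
    }
    return null; // placeholder; the result array is built in a later step
}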
VB.NET
Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
    As String()
    For _CharIndex As Integer = 0 To expression.Length - 1
    Next
End Function
Managing Text Qualifiers
Next we will add a boolean variable to track when we are in a text block. When a text qualifier is found, we will set the boolean value to true. When another text qualifier is found, we will set the boolean value back to false. This can be done very easily by setting the boolean value equal to the opposite of its current state. We just have to remember to initialize the boolean value to false, indicating that we aren't in the block.
It is important to use the length of the text qualifier to define how many characters to compare. This way, if we want to use more than one character to define the bounds of the text block, we can. This can be desirable when there is potential for a single character to show up in the text block. For example, we may want to use a quotation mark followed by an apostrophe as the text qualifier. This way, if either symbol appears on its own in the block, it won't terminate the block.
There is another way to manage text blocks that contain text qualifiers. You already use this alternative approach when assigning values to a string variable. Simply duplicate the text qualifier inside the text block. Fortunately, this approach is already supported by the way we are tracking text qualifiers.
Take the following example that uses the quotation mark for the text qualifier:
"Example With ""Text Qualifier"" Inside Text Block"
The first quotation mark turns on the boolean bit. The second one turns it off; however, the next character is a text qualifier that turns it back on, making it appear that we never closed the text block. Simple and effective!
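The toggling described above can be traced with a small sketch. It is only an illustration: it uses a single-character qualifier and a plain character comparison instead of the string comparison used in the article's routine, and the QualifierTrace class and Trace method are hypothetical names.

```csharp
using System;
using System.Text;

public static class QualifierTrace
{
    // Returns one letter per input character: 'I' if the flag says we are
    // inside a text block after processing that character, 'O' otherwise.
    public static string Trace(string expression, char qualifier)
    {
        bool insideBlock = false;
        StringBuilder states = new StringBuilder();
        foreach (char c in expression)
        {
            if (c == qualifier)
            {
                insideBlock = !insideBlock; // flip on every qualifier
            }
            states.Append(insideBlock ? 'I' : 'O');
        }
        return states.ToString();
    }

    public static void Main()
    {
        // Six characters: " a " " b "
        Console.WriteLine(Trace("\"a\"\"b\"", '"')); // prints IIOIIO
    }
}
```

The flag is only off for the instant between the two adjacent qualifiers, so a delimiter never gets a chance to split the block there.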
C#
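public string[] Split(string expression, string delimiter,
    string qualifier, bool ignoreCase)
{
    bool _QualifierState = false; // true while inside a text block
    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
        // && short-circuits, so a null qualifier is never dereferenced.
        // Comparing in place avoids Substring running past the end of the text.
        if ((qualifier != null) && (qualifier.Length > 0)
            && (string.Compare(expression, _CharIndex,
                qualifier, 0, qualifier.Length, ignoreCase) == 0))
        {
            _QualifierState = !_QualifierState;
            _CharIndex += qualifier.Length - 1; // skip the rest of the qualifier
        }
    }
    return null; // placeholder; the result array is built in the next step
}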
VB.NET
Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
    As String()
    Dim _QualifierState As Boolean = False ' True while inside a text block
    For _CharIndex As Integer = 0 To expression.Length - 1
        ' Comparing in place avoids Substring running past the end of the text.
        If Not qualifier Is Nothing _
            AndAlso qualifier.Length > 0 _
            AndAlso String.Compare(expression, _CharIndex, _
                qualifier, 0, qualifier.Length, ignoreCase) = 0 Then
            _QualifierState = Not _QualifierState
            _CharIndex += qualifier.Length - 1 ' Skip the rest of the qualifier
        End If
    Next
End Function
Another benefit of the approach taken above is that if we don't want to use a text qualifier, we don't have to. If the text qualifier is Nothing (VB.NET) or null (C#), then the text qualifier logic is disabled!
Splitting the Text Expression
Now we are ready to search for the delimiter. We will use the length of the delimiter to define how many characters to compare, just in case we need to support more than one character for our delimiter. We will create a start index variable to track the first character of the current element. The current character index identifies the end of that element.
Additionally, we will store the values in a System.Collections.ArrayList object. Then, when we are finished loading the ArrayList, we will convert the list to a String array and return the values.
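The conversion from the ArrayList to a string array can be sketched on its own as follows. The class and method names are hypothetical; CopyTo and ToArray are standard ArrayList members, and either one should work here as long as every item in the list is a string.

```csharp
using System;
using System.Collections;

public static class ArrayListConversion
{
    // Copies the items into a pre-sized string array.
    public static string[] WithCopyTo(ArrayList values)
    {
        string[] result = new string[values.Count];
        values.CopyTo(result);
        return result;
    }

    // Lets the list build the typed array, then casts it.
    public static string[] WithToArray(ArrayList values)
    {
        return (string[])values.ToArray(typeof(string));
    }

    public static void Main()
    {
        ArrayList values = new ArrayList();
        values.Add("first");
        values.Add("second");
        Console.WriteLine(WithCopyTo(values).Length);  // prints 2
        Console.WriteLine(WithToArray(values)[1]);     // prints second
    }
}
```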
C#
public string[] Split(string expression, string delimiter,
    string qualifier, bool ignoreCase)
{
    bool _QualifierState = false; // true while inside a text block
    int _StartIndex = 0;          // first character of the current element
    System.Collections.ArrayList _Values = new System.Collections.ArrayList();
    for (int _CharIndex = 0; _CharIndex < expression.Length; _CharIndex++)
    {
        // && short-circuits, so a null qualifier or delimiter is never dereferenced.
        if ((qualifier != null) && (qualifier.Length > 0)
            && (string.Compare(expression, _CharIndex,
                qualifier, 0, qualifier.Length, ignoreCase) == 0))
        {
            _QualifierState = !_QualifierState;
            _CharIndex += qualifier.Length - 1; // skip the rest of the qualifier
        }
        else if (!_QualifierState && (delimiter != null) && (delimiter.Length > 0)
            && (string.Compare(expression, _CharIndex,
                delimiter, 0, delimiter.Length, ignoreCase) == 0))
        {
            _Values.Add(expression.Substring(_StartIndex, _CharIndex - _StartIndex));
            // Resume after the whole delimiter, which may be longer than one character.
            _StartIndex = _CharIndex + delimiter.Length;
            _CharIndex += delimiter.Length - 1;
        }
    }
    // Add whatever follows the last delimiter.
    if (_StartIndex < expression.Length)
        _Values.Add(expression.Substring(_StartIndex, expression.Length - _StartIndex));
    string[] _returnValues = new string[_Values.Count];
    _Values.CopyTo(_returnValues);
    return _returnValues;
}
VB.NET
Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
    As String()
    Dim _QualifierState As Boolean = False ' True while inside a text block
    Dim _StartIndex As Integer = 0 ' First character of the current element
    Dim _Values As New System.Collections.ArrayList
    For _CharIndex As Integer = 0 To expression.Length - 1
        If Not qualifier Is Nothing _
            AndAlso qualifier.Length > 0 _
            AndAlso String.Compare(expression, _CharIndex, _
                qualifier, 0, qualifier.Length, ignoreCase) = 0 Then
            _QualifierState = Not _QualifierState
            _CharIndex += qualifier.Length - 1 ' Skip the rest of the qualifier
        ElseIf Not _QualifierState _
            AndAlso Not delimiter Is Nothing _
            AndAlso delimiter.Length > 0 _
            AndAlso String.Compare(expression, _CharIndex, _
                delimiter, 0, delimiter.Length, ignoreCase) = 0 Then
            _Values.Add(expression.Substring( _
                _StartIndex, _CharIndex - _StartIndex))
            ' Resume after the whole delimiter, which may be longer than one character.
            _StartIndex = _CharIndex + delimiter.Length
            _CharIndex += delimiter.Length - 1
        End If
    Next
    ' Add whatever follows the last delimiter.
    If _StartIndex < expression.Length Then
        _Values.Add(expression.Substring(_StartIndex, _
            expression.Length - _StartIndex))
    End If
    Dim _returnValues(_Values.Count - 1) As String
    _Values.CopyTo(_returnValues)
    Return _returnValues
End Function
Using the Code
That's it! We are done. Now we can play with our new toy!
The code sample below parses a text value using a period for the delimiter and a quotation mark for the text qualifier. The text block contains a period to demonstrate that a delimiter inside the block is ignored. The text qualifiers are left in place, so the Split function returns a string array with two elements in it:
- This is an "example."
- Cool!
C#
using System.Windows.Forms;
// Assumes this method lives in a Form, so "this" can own the message box.
public void Example()
{
    foreach (string _Part in Split(
        "This is an \"example.\".Cool!", ".", "\"", true))
        MessageBox.Show(this, _Part, "Split Example", MessageBoxButtons.OK);
}
VB.NET
Public Sub Example()
    ' In VB.NET a quotation mark inside a string literal is written as "".
    For Each _Part As String In Split( _
        "This is an ""example."".Cool!", ".", """", True)
        MsgBox(_Part, MsgBoxStyle.OkOnly, "Split Example")
    Next
End Sub
Regular Expression Alternative
As an alternative, you can parse the text as documented above using a Regular Expression. Regular Expressions are designed specifically for text parsing. While Regular Expressions are more elegant, they increase the cost of maintaining the application because few people understand Regular Expressions and fewer yet can create the expressions (even when using tools).
In light of the benefits and costs associated with Regular Expressions, it is worth taking time to demonstrate how a Regular Expression can solve our text parsing problem.
Abishek Bellamkonda was kind enough to provide a Regular Expression that could parse the text as documented above. Since I am no expert with Regular Expressions, I won't dive into how the Regular Expression works.
Please don't bombard me with questions on this topic as I only understand abstract Regular Expression concepts. I am providing this as an example for those who are interested. I am not providing this as an explanation of how to make Regular Expressions.
C#
using System.Text.RegularExpressions;
public string[] Split(string expression, string delimiter,
    string qualifier, bool ignoreCase)
{
    // Unlike the loop version, delimiter and qualifier must not be null here
    // (Regex.Escape throws on null), and the qualifier is placed in a
    // character class, so it should be a single character.
    string _Statement = String.Format(
        "{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
        Regex.Escape(delimiter), Regex.Escape(qualifier));
    RegexOptions _Options = RegexOptions.Compiled | RegexOptions.Multiline;
    if (ignoreCase) _Options = _Options | RegexOptions.IgnoreCase;
    Regex _Expression = new Regex(_Statement, _Options);
    return _Expression.Split(expression);
}
VB.NET
Imports System.Text.RegularExpressions
Public Function Split( _
    ByVal expression As String, _
    ByVal delimiter As String, _
    ByVal qualifier As String, _
    ByVal ignoreCase As Boolean) _
    As String()
    ' Unlike the loop version, delimiter and qualifier must not be Nothing here
    ' (Regex.Escape throws on Nothing), and the qualifier is placed in a
    ' character class, so it should be a single character.
    Dim _Statement As String = String.Format( _
        "{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))", _
        Regex.Escape(delimiter), Regex.Escape(qualifier))
    Dim _Options As RegexOptions = RegexOptions.Compiled Or RegexOptions.Multiline
    If ignoreCase Then _Options = _Options Or RegexOptions.IgnoreCase
    Dim _Expression As Regex = New Regex(_Statement, _Options)
    Return _Expression.Split(expression)
End Function
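For readers who want to experiment, the sketch below builds the same pattern for a period delimiter and a quotation mark qualifier and runs it against the sample text used earlier. As far as I can tell, the lookahead only accepts a delimiter when the qualifiers that follow it come in complete pairs, which means the delimiter sits outside any text block. The RegexSplitDemo class and its method names are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Builds the pattern from the article for a given delimiter and
    // single-character qualifier.
    public static string BuildPattern(string delimiter, string qualifier)
    {
        return String.Format(
            "{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
            Regex.Escape(delimiter), Regex.Escape(qualifier));
    }

    public static string[] SplitSample(string text)
    {
        return Regex.Split(text, BuildPattern(".", "\""));
    }

    public static void Main()
    {
        // Show the generated pattern, then each element of the result.
        Console.WriteLine(BuildPattern(".", "\""));
        foreach (string part in SplitSample("This is an \"example.\".Cool!"))
        {
            Console.WriteLine(part);
        }
    }
}
```

The period inside the quotation marks is followed by a single unpaired qualifier, so it is skipped; the period after the closing quotation mark is followed by none, so the text is split there.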