Persistent String Parser

21 Dec 2008 | Public Domain | 4 min read
A CLS-compliant class that parses a string with quoted elements and lets you insert, add, or delete elements
Screenshot - qstringparser.png

Introduction

Well, I'm finally dragging myself kicking and screaming into the world of C# and .Net. I've decided that in order to stay employable, I have to move away from C++. Toward that end, I recently accepted a C# programming position, and since I didn't want to start a new job cold, I figured I'd get some additional C# experience by porting one or more of my C++/MFC articles to the .Net platform.

Update 21 Dec 2008: A recent comment about this code caused me to revisit it and resulted in a substantial rewrite of the class in question. This article and the downloadable code have been updated to reflect these changes.

What the StringParser Class Is

Essentially, the StringParser class accepts a delimited string, breaks it down into its elements according to the specified delimiter string (or character), and stores them in a string list. You can also add to, insert into, or delete from the list of parsed elements. Further, it supports quoted strings. You might be inclined to say that the string.Split() method does the same thing, and I agree - to a point. It does indeed allow you to split a string using a character or a string as a delimiter, but it doesn't support quoted elements, nor does it (readily) provide a method for changing the list of elements.
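To make the limitation concrete, here is a minimal sketch of what a plain string.Split() call does with a quoted element. The SplitDemo class and NaiveSplit method are hypothetical names used only for this illustration; they are not part of the StringParser class.

```csharp
using System;

// Hypothetical helper, not part of the article's download.
public static class SplitDemo
{
    // Splits on commas with no awareness of quote characters.
    public static string[] NaiveSplit(string text)
    {
        return text.Split(new[] { ',' }, StringSplitOptions.None);
    }
}
```

Calling SplitDemo.NaiveSplit("one,\"two,three\",four") returns four pieces (one, "two, three", four) because the comma inside the quotes is treated like any other delimiter. Keeping the quoted text together as a single element is the job the StringParser class takes on.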

Personally, I don't see much use for a delimiter string, but someone requested such a feature in my original MFC/VC6 article, and the string.Split() method provides the functionality, so I figured I'd give it a shot in this version of the class. Another new feature (over and above what was provided in the C++ version of the article) is that when deleting a field, the programmer has the choice of deleting all instances of a given string, the first instance, or the last instance.
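The three delete choices can be sketched against a plain List<string>. This is an illustration under assumptions rather than the class's actual code: the enum mirrors the one in the class reference, while the DeleteSketch class, its Delete method, and the case-sensitive comparison are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the enum listed in the class reference.
public enum DeleteAction { DeleteAll, DeleteFirst, DeleteLast }

// Hypothetical helper, not the article's implementation.
public static class DeleteSketch
{
    // Returns true if at least one matching element was removed.
    // Matching here is case-sensitive (an assumption for this sketch).
    public static bool Delete(List<string> items, string text, DeleteAction action)
    {
        switch (action)
        {
            case DeleteAction.DeleteFirst:
                // List<T>.Remove drops the first occurrence only.
                return items.Remove(text);

            case DeleteAction.DeleteLast:
                int last = items.LastIndexOf(text);
                if (last < 0)
                {
                    return false;
                }
                items.RemoveAt(last);
                return true;

            default:
                // DeleteAll: remove every occurrence.
                return items.RemoveAll(s => s == text) > 0;
        }
    }
}
```

With a list holding a, b, a, c, a, deleting "a" with DeleteLast leaves a, b, a, c; DeleteFirst then leaves b, a, c; and DeleteAll leaves b, c.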

In the original version of this article, I put the class into a class library project because it seemed to be the right thing to do, but for the most recent version, I put it into the executing assembly just to keep things simple.

One item of note - this class is CLS compliant, so it can be put into its own assembly and used by any language supported by dotNet.
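For readers who haven't run into CLS compliance before, the usual way to have the compiler check it is an assembly-level attribute. The sketch below is a generic illustration; the ClsSketch namespace and Sample class are hypothetical and do not come from the download.

```csharp
using System;

// Ask the compiler to check the public surface of this assembly
// for CLS compliance.
[assembly: CLSCompliant(true)]

namespace ClsSketch
{
    public class Sample
    {
        // int is a CLS-compliant type, so this member passes the check.
        public int Count { get; set; }

        // uint is not CLS-compliant. Marking the member explicitly tells
        // the compiler (and callers) that it falls outside the CLS subset.
        [CLSCompliant(false)]
        public uint RawCount { get; set; }
    }
}
```

A class like StringParser, whose public members use only types such as string, char, int, bool, and List<string>, stays inside the CLS subset.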

The StringParser Class

The class itself isn't really that fancy, and can essentially be considered a wrapper around the string.Split() method. The real action happens in the ParseString() method:

protected int ParseString(string text)
{
    // split the string on the delimiter string as a whole (passing
    // m_delimiter.ToCharArray() would split on each of its characters instead)
    string[] parts = text.Split(new string[] { m_delimiter }, StringSplitOptions.None);
    string combinedParts = "";
    bool quoted = false;

    // add each item to the list
    foreach (string part in parts)
    {
        // if the current part contains a single quote character, we're either
        // starting or finishing a multi-part list item, so toggle our tracker
        if (CharacterCount(m_quoteChar, part) == 1)
        {
            quoted = !quoted;
        }

        // add the part to the combined word
        combinedParts += part;

        if (quoted)
        {
            // we're still inside a quoted string, so restore the delimiter
            // that Split() removed
            combinedParts += m_delimiter;
        }
        else
        {
            // we're no longer building a combined part, so add it to the list
            // of elements and reset the combinedParts variable
            m_elementList.Add(combinedParts);
            combinedParts = "";
        }
    }

    return m_elementList.Count;
}

This version of the method is MUCH shorter than the original one because dave.dolan recently informed me of the fact that the string.Split() method supports delimiter strings. This in turn caused me to re-evaluate the code, and I came up with a much lower-impact version and fixed a quote string bug in the process. In a nutshell, the method splits the string according to the specified delimiter, and then adds the individual parts to the internally maintained list. If the string is a quoted string (and if the StringParser object was told to handle quoted strings), the previously split parts are re-combined to form the quoted string.
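The split-and-recombine approach described above can be tried outside the class with a small self-contained sketch. The QuotedSplitSketch class, its Parse method, and the CountChar helper are hypothetical stand-ins; the real class keeps the delimiter, quote character, and element list in member fields instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-alone version of the technique, for experimentation.
public static class QuotedSplitSketch
{
    public static List<string> Parse(string text, string delimiter, char quote)
    {
        var result = new List<string>();

        // Split on the delimiter string as a whole.
        string[] parts = text.Split(new[] { delimiter }, StringSplitOptions.None);
        string combined = "";
        bool quoted = false;

        foreach (string part in parts)
        {
            // A part holding exactly one quote character opens or closes
            // a quoted element.
            if (CountChar(part, quote) == 1)
            {
                quoted = !quoted;
            }

            combined += part;

            if (quoted)
            {
                // Still inside the quotes: put back the delimiter that
                // Split() removed.
                combined += delimiter;
            }
            else
            {
                result.Add(combined);
                combined = "";
            }
        }

        return result;
    }

    // Counts how many times c occurs in s.
    private static int CountChar(string s, char c)
    {
        int count = 0;
        foreach (char ch in s)
        {
            if (ch == c)
            {
                count++;
            }
        }
        return count;
    }
}
```

Parsing one,"two,three",four with a comma delimiter and a double-quote character yields three elements, with "two,three" (quotes included) kept together as the second one. Note that in this sketch an opening quote with no matching closing quote causes the trailing text to be dropped, so input validation would be needed for untrusted data.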

Note: In the interest of maintaining an accurate historical record, I left the original download in this article, and added a new download with the much-improved code. The original code is an embarrassment, but I think it's important for new programmers to see how important it is to become as familiar as possible with the dotNet framework.

The Sample Application

The sample application took three times longer to write than the actual class (again, due entirely to my lack of exposure to the .Net framework). There's no real validation being done, so if you experience any quirks, let me know and I'll try to post updated code. Better yet, be a programmer, fix them yourself, and then post your findings here.

Class Reference

namespace StringParserLib
{
    public enum DeleteAction { DeleteAll, DeleteFirst, DeleteLast };

    /// <summary>
    ///  Persistent string parser by Paddedwall Software. 
    /// </summary>
    public class StringParser
    {
        // Properties
        public List<string> Elements
        public string Delimiter
        public char QuoteCharacter


        // Constructors
        //--------------------------------------------------------------------------------
        /// <summary>
        /// Constructor - use this one when you don't yet know the string to be parsed. 
        /// It is assumed that the delimiter will be a comma, and the string is not 
        /// quoted.
        /// </summary>
        public StringParser()

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Constructor - use this one when you DO know the string to be parsed, and 
        /// you want to use the default delimiter of ",".
        /// </summary>
        /// <param name="value">The string to parse</param>
        public StringParser(string value)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Constructor - use this one when you DO know the string to be parsed, and 
        /// you only need to specify the delimiter for a non-quoted string.
        /// </summary>
        /// <param name="value">The string to parse</param>
        /// <param name="delimiter">The delimiter string</param>
        public StringParser(string value, string delimiter)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Constructor - use this one when you DO know the string to be parsed, and 
        /// want to specify the delimiter AND the quote character.
        /// </summary>
        /// <param name="value">The string to parse</param>
        /// <param name="delimiter">The delimiter to use</param>
        /// <param name="quote">The quote character to look for (only needed if the 
        /// string contains quote characters that you want to honor)</param>
        public StringParser(string value, string delimiter, char quote)


        // Protected Methods

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Initializes the data members and allocates the string array
        /// </summary>
        protected void ProtectedInit()

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Removes all of the items from the string array
        /// </summary>
        protected void Clear()

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Returns true if the specified index is a valid index into the array.
        /// </summary>
        /// <param name="index">The index to check</param>
        /// <returns>True if the specified index is within the valid range</returns>
        protected bool IndexInRange(int index)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Does the actual parsing work
        /// </summary>
        /// <param name="text">The string to parse</param>
        /// <returns>The number of fields parsed</returns>
        protected int ParseString(string text)


        // Public Methods

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Clears the list, resets the delimiter and quote character, and then 
        /// re-parses the string
        /// </summary>
        /// <param name="text">The string to parse</param>
        /// <param name="delimiter">The delimiter to use</param>
        /// <param name="quote">The quote character to look for (only needed if the 
        /// string contains quote characters that you want to honor)</param>
        /// <returns>The number of fields parsed</returns>
        public int ResetOriginalString(string text, string delimiter, char quote)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Rebuilds the original string by cycling through the array list.
        /// </summary>
        /// <returns>The original string used to create the element list</returns>
        public string ReassembleOriginalString()

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Retrieves and returns the string at the specified index
        /// </summary>
        /// <param name="index">The index of the desired field</param>
        /// <returns>The field at the specified index</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public string GetField(int index)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Retrieves the string at the specified index, and strips the specified string
        /// before returning it to the calling function.  This method is useful for 
        /// removing quotes.
        /// </summary>
        /// <param name="index">The index of the desired field</param>
        /// <param name="textToRemove">The text to remove from the returned value</param>
        /// <returns>The field at the specified index</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public string GetField(int index, string textToRemove)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the index of the desired string regardless of case.
        /// </summary>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        public int Find(string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the index of the desired string regardless of case.
        /// </summary>
        /// <param name="startingIndex">The index at which to start the search</param>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public int Find(int startingIndex, string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the index of the desired string - case sensitive.
        /// </summary>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        public int FindExact(string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the index of the desired string - case sensitive.
        /// </summary>
        /// <param name="startingIndex">The index at which to start the search</param>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public int FindExact(int startingIndex, string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the string regardless of case, starting at the last element.
        /// </summary>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        public int FindReverse(string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the exact string, but starting at the last element - case-sensitive.
        /// </summary>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        public int FindReverseExact(string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Adds a new string to the array and resets the original string to include the
        /// new field. It's okay if the string is empty.
        /// </summary>
        /// <param name="textToAdd">The text we want to add</param>
        public void AddField(string textToAdd)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Inserts the specified string at the specified index in the array list. Since
        /// exceptions are pointless here, we'll normalize the specified index so that this
        /// function doesn't insert the string at element 0, or if outside the current
        /// range, simply adds it to the end of the array.  The return value indicates
        /// where in the array the item was inserted or added.  If the return value isn't
        /// the same as the specified index, the calling function passed in an invalid
        /// value.
        /// </summary>
        /// <param name="index">The index at which to insert the new string</param>
        /// <param name="textToInsert">The text to insert</param>
        /// <returns>The index at which the string was inserted</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public int InsertField(int index, string textToInsert)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Deletes the specified array list element, and returns true if the deletion was 
        /// successful.
        /// </summary>
        /// <param name="index">The index to delete</param>
        /// <returns>True if the item was deleted</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public bool DeleteField(int index)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Deletes the specified array list element, and returns true if the deletion 
        /// was successful. 
        /// </summary>
        /// <param name="textToDelete">The string to delete</param>
        /// <param name="exactMatch">True if the match must be exact (case-sensitive)</param>
        /// <param name="deleteAction">How to delete the item (first occurrence, last 
        /// occurrence, or all occurrences)</param>
        /// <returns>True if the item was deleted</returns>
        public bool DeleteField(string textToDelete, bool exactMatch, DeleteAction deleteAction)

    } 
}

As you can see, the class is fairly straightforward, and using it is a simple matter for even the most novice of .Net programmers (and don't forget - it's CLS compliant).
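The difference between the Find and FindExact families in the reference above comes down to the string comparison used. The sketch below shows one way to express that with List<string>.FindIndex; the FindSketch class is hypothetical, and returning -1 for "not found" is an assumption, since the reference doesn't state what the real methods return in that case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the two comparison modes.
public static class FindSketch
{
    // Case-insensitive search beginning at startingIndex.
    // Returns -1 when nothing matches (assumed convention).
    public static int Find(List<string> items, int startingIndex, string textToFind)
    {
        return items.FindIndex(startingIndex,
            s => string.Equals(s, textToFind, StringComparison.OrdinalIgnoreCase));
    }

    // Case-sensitive search beginning at startingIndex.
    public static int FindExact(List<string> items, int startingIndex, string textToFind)
    {
        return items.FindIndex(startingIndex,
            s => string.Equals(s, textToFind, StringComparison.Ordinal));
    }
}
```

For a list holding Alpha, beta, ALPHA, a Find for "alpha" from index 0 returns 0, the same Find from index 1 returns 2, and FindExact for "ALPHA" from index 0 returns 2. FindIndex itself throws ArgumentOutOfRangeException when the starting index is outside the list, so a wrapper that wants IndexOutOfRangeException, as the reference documents, would have to check the index first.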

Change History

21 Dec 2008 (update inspired by comment from CP user dave.dolan - Thanks Dave!)

  • Significant change to the ParseString method. It's much shorter, and much more reliable.
  • Proper XML comments added.
  • The original string is no longer stored in the object, so all GetField calls are now zero-based (the old version was one-based).
  • Added new constructor overloads.
  • Added new Find and FindExact overloads that let you start the search at a specified index.
  • Added proper exception handling.

31 Jul 2007 - (based on comments submitted after initial article upload)

  • Changed class to use .Net naming convention
  • Renamed namespace and class to comply with first change described above
  • Eliminated Hungarian notation (I feel somehow dirty as a result)
  • Changed from using ArrayList to generic List
  • Changed function comments to Intellisense-compatible summary tags
  • Moved enum declaration outside the class
  • Changed demo app to use .Net naming convention, including control names.
  • Changed demo app to provide basic validation.

30 Jul 2007

  • Original article.

License

This article, along with any associated source code and files, is licensed under a Public Domain dedication.


Written By
Software Developer (Senior) Paddedwall Software
United States
I've been paid as a programmer since 1982 with experience in Pascal, and C++ (both self-taught), and began writing Windows programs in 1991 using Visual C++ and MFC. In the 2nd half of 2007, I started writing C# Windows Forms and ASP.Net applications, and have since done WPF, Silverlight, WCF, web services, and Windows services.

My weakest point is that my moments of clarity are too brief to hold a meaningful conversation that requires more than 30 seconds to complete. Thankfully, grunts of agreement are all that is required to conduct most discussions without committing to any particular belief system.

Comments and Discussions

 
Hungarian Notation Discussion 1000th
johannesnestler, 21-Dec-08 22:39
Always the same discussion about this topic... But I can't resist sharing my opinion.

I'm a C++ and C# developer and I use different prefixing for each language. What I use for C# is not really HN but it leads to the same benefit:

* Code is not only seen in an IDE - it's text, so sometimes it is printed or seen in a text editor which doesn't "know" about "Code".
* The m_ or _ or m (or whatever you use) prefix is very common. When I look at the Code of my junior developers I think it helps them a lot. The newbies are often confused by variable scope.
* I do a lot of GUI programming (sometimes several hundred Controls on a Form, because of the Framework we use). So I use a prefixing schema for the Controls too. Benefit: All Controls of the same type are together in the Intellisense - same with all Variables of the same type if you use a prefix!

So the .NET style-guide is a good thing most of the time, but on HN I TOTALLY DISAGREE! I think what is really important in programming is a very strict style you use for yourself. This helps a lot to decide what variable names to use. I have no problem thinking about new "names" after a type conversion or something similar. And it's easier to spot wrong conversions (iVariable = fVariable)
Think about it ...

string strError;
int iError;
Exception exError;

string strLine;
string[] astrLine = strLine.Split(",".ToCharArray());

;-P