
Persistent String Parser

By John Simmons / outlaw programmer

Parses a string with quoted elements, lets you insert/add/delete elements, and is CLS compliant
C# 2.0, Windows, .NET 2.0, VS2005, Dev
Posted:30 Jul 2007
Updated:21 Dec 2008
Screenshot - qstringparser.png

Introduction

Well, I'm finally dragging myself kicking and screaming into the world of C# and .Net. I've decided that in order to stay employable, I have to move away from C++. Toward that end, I recently accepted a C# programming position, and since I didn't want to start a new job cold, I figured I'd get some additional C# experience by porting one or more of my C++/MFC articles to the .Net platform.

Update 21 Dec 2008: A recent comment about this code caused me to revisit it and resulted in a substantial rewrite of the class in question. This article and the downloadable code have been updated to reflect these changes.

What the StringParser Class Is

Essentially, the StringParser class accepts a delimited string, breaks it into its elements according to the specified delimiter string (or character), and stores those elements in a string list. You can also add to, insert into, or delete from the list of parsed elements. Further, it supports quoted strings. You might be inclined to say that the string.Split() method does the same thing, and I agree, to a point. It does indeed allow you to split a string using a character or a string as a delimiter, but it doesn't support quoted elements, nor does it (readily) provide a way to change the list of elements.
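To make the quoted-element limitation concrete, here is a minimal sketch (SplitDemo and NaiveSplit are hypothetical names) showing that string.Split() treats quote characters as ordinary text, so a comma inside a quoted element still splits it.

```csharp
using System;

// Hypothetical demo: string.Split() has no notion of quoting.
public static class SplitDemo
{
    public static string[] NaiveSplit(string text)
    {
        // Split on commas only; the quote characters are treated as ordinary characters.
        return text.Split(new char[] { ',' }, StringSplitOptions.None);
    }
}
```

Calling SplitDemo.NaiveSplit("one,\"two,three\",four") yields four parts (one, "two, three", and four) rather than the three elements the quoting implies.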

Personally, I don't see much use for a delimiter string, but someone requested such a feature in my original MFC/VC6 article, and the string.Split() method provides the functionality, so I figured I'd give it a shot in this version of the class. Another new feature (over and above what was provided in the C++ version of the article) is that when deleting a field, the programmer can choose to delete all instances of a given string, the first instance, or the last instance.
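The three deletion modes can be sketched on a plain List<string>. DeleteMatching is a hypothetical helper, not the class's actual DeleteField implementation, and treating exactMatch as "compare case-sensitively" is an assumption based on the Find/FindExact naming in the class reference. The anonymous method keeps the sketch within C# 2.0 syntax.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the enum in the class reference.
public enum DeleteAction { DeleteAll, DeleteFirst, DeleteLast }

public static class DeleteSketch
{
    // Hypothetical helper: removes matching elements according to deleteAction and
    // returns true if anything was removed. exactMatch = true compares case-sensitively
    // (an assumption); false ignores case.
    public static bool DeleteMatching(List<string> elements, string textToDelete,
                                      bool exactMatch, DeleteAction deleteAction)
    {
        StringComparison comparison = exactMatch
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        switch (deleteAction)
        {
            case DeleteAction.DeleteFirst:
                // scan forward and remove the first match only
                for (int i = 0; i < elements.Count; i++)
                {
                    if (string.Equals(elements[i], textToDelete, comparison))
                    {
                        elements.RemoveAt(i);
                        return true;
                    }
                }
                return false;

            case DeleteAction.DeleteLast:
                // scan backward and remove the last match only
                for (int i = elements.Count - 1; i >= 0; i--)
                {
                    if (string.Equals(elements[i], textToDelete, comparison))
                    {
                        elements.RemoveAt(i);
                        return true;
                    }
                }
                return false;

            default: // DeleteAction.DeleteAll
                return elements.RemoveAll(delegate(string e)
                {
                    return string.Equals(e, textToDelete, comparison);
                }) > 0;
        }
    }
}
```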

In the original version of this article, I put the class into a class library project because it seemed to be the right thing to do, but for the most recent version, I put it into the executing assembly just to keep things simple.

One item of note: this class is CLS compliant, so it can be put into its own assembly and used from any CLS-compliant .NET language.
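The article doesn't show how compliance is declared. One common way (an assumption, not taken from the download) is the assembly-level CLSCompliant attribute, which asks the C# compiler to warn about public members that use non-CLS types such as uint. ClsDemo and Example are placeholder names.

```csharp
using System;

// Assembly-level attributes go after the using directives and before any namespace.
[assembly: CLSCompliant(true)]

namespace ClsDemo
{
    public class Example
    {
        // Fine: int is a CLS-compliant type.
        public int Count { get { return 0; } }

        // Uncommenting this would produce a compiler warning, because uint is not CLS compliant:
        // public uint UnsignedCount { get { return 0; } }
    }
}
```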

The StringParser Class

The class itself isn't really that fancy, and can essentially be considered a wrapper around the string.Split() method. The real action happens in the ParseString() method:

protected int ParseString(string text)
{
    // split the string on the delimiter string as a whole; passing
    // m_delimiter.ToCharArray() would instead split on EACH character
    // of a multi-character delimiter
    string[] parts = text.Split(new string[] { m_delimiter }, StringSplitOptions.None);
    string combinedParts = "";
    bool quoted = false;

    // add each item to the list
    foreach (string part in parts)
    {
        // a part containing exactly one quote character opens or closes a
        // quoted element, so toggle the tracker
        // (CharacterCount is a helper defined elsewhere in the class, not shown here)
        if (CharacterCount(m_quoteChar, part) == 1)
        {
            quoted = !quoted;
        }

        // add the part to the combined element
        combinedParts += part;

        if (quoted)
        {
            // still inside a quoted element: put back the delimiter that
            // Split() consumed
            combinedParts += m_delimiter;
        }
        else
        {
            // no longer building a combined element, so add it to the list
            // and reset the accumulator
            m_elementList.Add(combinedParts);
            combinedParts = "";
        }
    }

    // if the string ended inside an unterminated quoted element, keep what was
    // collected (minus the trailing delimiter) instead of silently dropping it
    if (quoted)
    {
        m_elementList.Add(combinedParts.Substring(0, combinedParts.Length - m_delimiter.Length));
    }

    return m_elementList.Count;
}

This version of the method is MUCH shorter than the original one because dave.dolan recently pointed out that the string.Split() method supports delimiter strings (via the Split(string[], StringSplitOptions) overload). That caused me to re-evaluate the code, and I came up with a much lower-impact version and fixed a quoted-string bug in the process. In a nutshell, the method splits the string on the specified delimiter and then adds the individual parts to the internally maintained list. If an element is quoted (and the StringParser object was told to handle quoted strings), the previously split parts are recombined to form the quoted element.
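For experimenting without the download, here is a self-contained sketch of the same split-then-recombine idea. QuoteAwareSplitter, Split, and CountChar are hypothetical names; CountChar stands in for the CharacterCount helper, which isn't shown in the article. The sketch calls the Split(string[], StringSplitOptions) overload so a multi-character delimiter is matched as a whole, and it keeps the text of an unterminated quoted element rather than dropping it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, self-contained version of the split-then-recombine approach.
public static class QuoteAwareSplitter
{
    // Counts how many times ch appears in text.
    public static int CountChar(char ch, string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == ch) count++;
        }
        return count;
    }

    public static List<string> Split(string text, string delimiter, char quote)
    {
        List<string> elements = new List<string>();

        // Split on the whole delimiter string, not on each of its characters.
        string[] parts = text.Split(new string[] { delimiter }, StringSplitOptions.None);

        string combined = "";
        bool quoted = false;

        foreach (string part in parts)
        {
            // A part with exactly one quote character opens or closes a quoted element.
            if (CountChar(quote, part) == 1)
            {
                quoted = !quoted;
            }

            combined += part;

            if (quoted)
            {
                // Still inside quotes: put back the delimiter that Split() consumed.
                combined += delimiter;
            }
            else
            {
                elements.Add(combined);
                combined = "";
            }
        }

        // Unterminated quote: keep the collected text, minus the trailing delimiter.
        if (quoted)
        {
            elements.Add(combined.Substring(0, combined.Length - delimiter.Length));
        }

        return elements;
    }
}
```

For example, Split("one,\"two,three\",four", ",", '"') returns three elements, with the middle one still wrapped in its quotes; stripping them is left to the caller, much like GetField(index, textToRemove) in the class reference.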

Note: In the interest of maintaining an accurate historical record, I left the original download in this article and added a new download with the much-improved code. The original code is an embarrassment, but I think it's important for new programmers to see how important it is to become as familiar as possible with the dotNet framework.

The Sample Application

The sample application took three times longer to write than the actual class (again, due entirely to my lack of exposure to the .Net framework). Beyond some basic validation, there's no real error checking being done, so if you experience any quirks, let me know and I'll try to post updated code. Better yet, be a programmer, fix them yourself, and then post your findings here.

Class Reference

namespace StringParserLib
{
    public enum DeleteAction { DeleteAll, DeleteFirst, DeleteLast }

    /// <summary>
    ///  Persistent string parser by Paddedwall Software. 
    /// </summary>
    public class StringParser
    {
        // Properties
        public List<string> Elements
        public string Delimiter
        public char QuoteCharacter


        // Constructors
        //--------------------------------------------------------------------------------
        /// <summary>
        /// Constructor - use this one when you don't yet know the string to be parsed. 
        /// It is assumed that the delimiter will be a comma, and the string is not 
        /// quoted.
        /// </summary>
        public StringParser()

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Constructor - use this one when you DO know the string to be parsed, and 
        /// you want to use the default delimiter of ",".
        /// </summary>
        /// <param name="value">The string to parse</param>
        public StringParser(string value)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Constructor - use this one when you DO know the string to be parsed, and 
        /// you only need to specify the delimiter for a non-quoted string.
        /// </summary>
        /// <param name="value">The string to parse</param>
        /// <param name="delimiter">The delimiter string</param>
        public StringParser(string value, string delimiter)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Constructor - use this one when you DO know the string to be parsed, and 
        /// want to specify the delimiter AND the quote character.
        /// </summary>
        /// <param name="value">The string to parse</param>
        /// <param name="delimiter">The delimiter to use</param>
        /// <param name="quote">The quote character to look for (only needed if the 
        /// string contains quote characters that you want to honor)</param>
        public StringParser(string value, string delimiter, char quote)


        // Protected Methods

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Initializes the data members and allocates the string array
        /// </summary>
        protected void ProtectedInit()

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Removes all of the items from the string array
        /// </summary>
        protected void Clear()

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Returns true if the specified index is a valid index into the array.
        /// </summary>
        /// <param name="index">The index to check</param>
        /// <returns>True if the specified index is within the valid range</returns>
        protected bool IndexInRange(int index)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Does the actual parsing work
        /// </summary>
        /// <param name="text">The string to parse</param>
        /// <returns>The number of fields parsed</returns>
        protected int ParseString(string text)


        // Public Methods

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Clears the list, resets the delimiter and quote character, and then
        /// re-parses the string
        /// </summary>
        /// <param name="text">The string to parse</param>
        /// <param name="delimiter">The delimiter to use</param>
        /// <param name="quote">The quote character to look for (only needed if the 
        /// string contains quote characters that you want to honor)</param>
        /// <returns>The number of fields parsed</returns>
        public int ResetOriginalString(string text, string delimiter, char quote)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Rebuilds the original string by cycling through the array list.
        /// </summary>
        /// <returns>The original string used to create the element list</returns>
        public string ReassembleOriginalString()

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Retrieves and returns the string at the specified index
        /// </summary>
        /// <param name="index">The index of the desired field</param>
        /// <returns>The field at the specified index</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public string GetField(int index)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Retrieves the string at the specified index, and strips the specified string
        /// before returning it to the calling function.  This method is useful for 
        /// removing quotes.
        /// </summary>
        /// <param name="index">The index of the desired field</param>
        /// <param name="textToRemove">The text to remove from the returned value</param>
        /// <returns>The field at the specified index</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public string GetField(int index, string textToRemove)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the index of the desired string regardless of case.
        /// </summary>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        public int Find(string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the index of the desired string regardless of case.
        /// </summary>
        /// <param name="startingIndex">The index at which to start the search</param>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public int Find(int startingIndex, string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the index of the desired string - case sensitive.
        /// </summary>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        public int FindExact(string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the index of the desired string - case sensitive.
        /// </summary>
        /// <param name="startingIndex">The index at which to start the search</param>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public int FindExact(int startingIndex, string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the string regardless of case, starting at the last element.
        /// </summary>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        public int FindReverse(string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Finds the exact string, but starting at the last element - case-sensitive.
        /// </summary>
        /// <param name="textToFind">The text we're looking for</param>
        /// <returns>The index of the desired field</returns>
        public int FindReverseExact(string textToFind)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Adds a new string to the end of the list, so it also appears in the
        /// reassembled string. It's okay if the string is empty.
        /// </summary>
        /// <param name="textToAdd">The text we want to add</param>
        public void AddField(string textToAdd)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Inserts the specified string at the specified index in the list. Since
        /// exceptions are pointless here, we normalize the specified index so that this
        /// function doesn't insert the string at element 0 or, if the index is outside
        /// the current range, simply adds it to the end of the list. The return value
        /// indicates where in the list the item was inserted or added. If the return
        /// value isn't the same as the specified index, the calling function passed in
        /// an invalid value.
        /// </summary>
        /// <param name="index">The index at which to insert the new string</param>
        /// <param name="textToInsert">The text to insert</param>
        /// <returns>The index at which the string was inserted</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public int InsertField(int index, string textToInsert)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Deletes the specified array list element, and returns true if the deletion was 
        /// successful.
        /// </summary>
        /// <param name="index">The index to delete</param>
        /// <returns>True if the item was deleted</returns>
        /// <exception cref="IndexOutOfRangeException"></exception>
        public bool DeleteField(int index)

        //--------------------------------------------------------------------------------
        /// <summary>
        /// Deletes the element(s) matching the specified text, and returns true if the
        /// deletion was successful.
        /// </summary>
        /// <param name="textToDelete">The text to delete</param>
        /// <param name="exactMatch">True for an exact (case-sensitive) match</param>
        /// <param name="deleteAction">How to delete the item (first occurrence, last
        /// occurrence, or all occurrences)</param>
        /// <returns>True if the item was deleted</returns>
        public bool DeleteField(string textToDelete, bool exactMatch, DeleteAction deleteAction)

    } 
}

As you can see, the class is fairly straightforward, and using it is a simple matter for even the most novice of .Net programmers (and don't forget - it's CLS compliant).
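The reference lists signatures only. As a rough illustration of what a few of those members might do over a List<string>, here is a hypothetical sketch; ElementListSketch and its parameters are made up. Find folds Find and FindExact into one method via an exact flag, returning -1 when nothing matches is an assumption borrowed from List<T>.IndexOf, and InsertField appends whenever the index is out of range (the element-0 rule from the InsertField comment is not modelled).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketches of a few StringParser members over a plain List<string>.
// These are illustrations, not the article's implementations.
public static class ElementListSketch
{
    // Forward search from startingIndex. exact = true compares case-sensitively
    // (like FindExact); false ignores case (like Find). Returns -1 if not found.
    public static int Find(List<string> elements, int startingIndex, string textToFind, bool exact)
    {
        // mirrors the IndexOutOfRangeException documented in the class reference
        if (startingIndex < 0 || startingIndex >= elements.Count)
        {
            throw new IndexOutOfRangeException("startingIndex is outside the element list");
        }

        StringComparison comparison = exact ? StringComparison.Ordinal : StringComparison.OrdinalIgnoreCase;
        for (int i = startingIndex; i < elements.Count; i++)
        {
            if (string.Equals(elements[i], textToFind, comparison))
            {
                return i;
            }
        }
        return -1;
    }

    // Reverse search starting at the last element (like FindReverse / FindReverseExact).
    public static int FindReverse(List<string> elements, string textToFind, bool exact)
    {
        StringComparison comparison = exact ? StringComparison.Ordinal : StringComparison.OrdinalIgnoreCase;
        for (int i = elements.Count - 1; i >= 0; i--)
        {
            if (string.Equals(elements[i], textToFind, comparison))
            {
                return i;
            }
        }
        return -1;
    }

    // Inserts at index when it is a valid position; otherwise appends to the end.
    // Returns the index actually used, so callers can detect a bad index.
    public static int InsertField(List<string> elements, int index, string textToInsert)
    {
        int actual = (index < 0 || index > elements.Count) ? elements.Count : index;
        elements.Insert(actual, textToInsert);
        return actual;
    }

    // Returns the element with every occurrence of textToRemove stripped (e.g. quotes).
    public static string GetField(List<string> elements, int index, string textToRemove)
    {
        if (index < 0 || index >= elements.Count)
        {
            throw new IndexOutOfRangeException("index is outside the element list");
        }
        return elements[index].Replace(textToRemove, "");
    }

    // Rebuilds the delimited string from the current elements.
    public static string Reassemble(List<string> elements, string delimiter)
    {
        return string.Join(delimiter, elements.ToArray());
    }
}
```

Returning the index actually used from InsertField lets the caller compare it with the requested index, which matches the intent described in the InsertField comment.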

Change History

21 Dec 2008 (update inspired by comment from CP user dave.dolan - Thanks Dave!)

  • Significant change to the ParseString method. It's much shorter, and much more reliable.
  • Proper XML comments added.
  • The original string is no longer stored in the object, so all GetField calls are now zero-based (the old version was one-based).
  • Added new constructor overloads.
  • Added new Find and FindExact overloads that let you start the search at a specified index.
  • Added proper exception handling.

31 Jul 2007 - (based on comments submitted after initial article upload)

  • Changed class to use .Net naming convention
  • Renamed namespace and class to comply with first change described above
  • Eliminated Hungarian notation (I feel somehow dirty as a result)
  • Changed from using ArrayList to generic List
  • Changed function comments to Intellisense-compatible summary tags
  • Moved enum declaration outside the class
  • Changed demo app to use .Net naming convention, including control names.
  • Changed demo app to provide basic validation.

30 Jul 2007

  • Original article.

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication

About the Author

John Simmons / outlaw programmer


I've been paid as a programmer since 1982 with experience in Pascal and C++ (both self-taught). I've been writing Windows programs since 1991, almost exclusively with Visual C++ and MFC. In the 2nd half of 2007, I started writing C# desktop and web applications.

My weakest point is that my moments of clarity are too brief to hold a meaningful conversation that requires more than 30 seconds to complete. Thankfully, grunts of agreement are all that is required to conduct most discussions without committing to any particular belief system.

I really don't care if you vote 1's on my forum posts, but at least act like a professional when it comes to my articles. When you play stupid little voting games because you don't agree with someone's politics or sense of humor, you're cheating all the other members of the site.
Occupation: Software Developer (Senior)
Location: United States

Last Updated: 21 Dec 2008
Editor: John Simmons / outlaw programmer
Copyright 2007 by John Simmons / outlaw programmer
Everything else Copyright © CodeProject, 1999-2009