StringTokenizer Library

QmQ, 23 Apr 2006, CPOL
Yet another C# implementation of Java's StringTokenizer (in a ready-to-use library).

Introduction

I've been using CodeProject for some time now, and I thought that I might write something instead of just reading. Since I needed a string tokenizer class (while importing a big Java project), I decided to write one myself, and here it is - my first article. Do be gentle :)

The StringToke... what?

This class was designed not just for people who know Java, so a quick introduction is in order. If you already know what this is, feel free to skip to the next section.

Let's say you have a string, for example "one, two, three, four", and you want to easily extract and use each of those numbers separately. This class offers a simple interface for doing that. You can either specify the characters used to 'cut' the string (called delimiters) yourself, or use the default set, which is " \t\n\r\f":

  • the space character
  • the tab character
  • the newline character
  • the carriage-return character
  • the form-feed character
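As a rough illustration of the default delimiter set, the sketch below approximates Java's default behaviour with String.Split. Empty entries are dropped because java.util.StringTokenizer never returns empty tokens. DefaultDelimiterDemo and its Tokenize method are hypothetical names used only here, not part of the StringTools library, and the sketch uses modern C# syntax rather than the .NET 2.0 toolchain the article targets.

```csharp
using System;

// Hypothetical helper, not part of StringTools: approximates Java's default
// StringTokenizer behaviour with String.Split.
public static class DefaultDelimiterDemo
{
    // The default set listed above: space, tab, newline, carriage return, form feed.
    public const string DefaultDelimiters = " \t\n\r\f";

    public static string[] Tokenize(string str, string delims = DefaultDelimiters)
    {
        // Runs of delimiters would produce empty strings; Java never returns
        // empty tokens, so they are removed here.
        return str.Split(delims.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With the default set, "one, two, three, four" yields "one,", "two,", "three,", "four": the commas stay attached because a comma is not a default delimiter. Passing ", " as the delimiters yields the four bare words.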

Once the string has been tokenized, you simply use the NextToken property to obtain the next token. Each access to this property increments a private index of the current token position, so the following access returns the following token. Before you try to extract a token, always check that one exists using the HasMoreTokens property. You can also ask for the delimiters to be returned as tokens, and for empty tokens to be returned; for empty tokens you can even specify your own replacement string, such as "MISSING" or simply null. Please see the example for details.
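String.Split throws away the delimiters it splits on, so returning them as tokens needs a different approach. The sketch below shows one hypothetical way to support both options (returning delimiters, returning empty tokens with a replacement string) in a single left-to-right scan. ScanTokenizerSketch and its Tokenize signature are illustrative names, not the library's API, and the library itself may combine the two options differently.

```csharp
using System.Collections.Generic;

// Hypothetical sketch, not the StringTools source: one character scan that can
// emit delimiters as tokens and keep (or replace) empty tokens.
public static class ScanTokenizerSketch
{
    public static List<string> Tokenize(string str, string delims,
                                        bool returnDelims, bool returnEmpty, string empty)
    {
        var result = new List<string>();
        int start = 0; // start of the token currently being read

        for (int i = 0; i < str.Length; i++)
        {
            if (delims.IndexOf(str[i]) < 0)
                continue; // ordinary character: keep extending the current token

            AddToken(result, str.Substring(start, i - start), returnEmpty, empty);
            if (returnDelims)
                result.Add(str[i].ToString()); // the delimiter itself becomes a token
            start = i + 1;
        }

        // Whatever follows the last delimiter (possibly nothing) is the final token.
        AddToken(result, str.Substring(start), returnEmpty, empty);
        return result;
    }

    private static void AddToken(List<string> result, string token, bool returnEmpty, string empty)
    {
        if (token.Length > 0)
            result.Add(token);
        else if (returnEmpty)
            result.Add(empty); // e.g. "MISSING DATA" or null
        // otherwise the empty token is skipped, as Java does
    }
}
```

Called with the example's " ," delimiters and delimiter-returning enabled, this produces the same seven tokens shown in the example output; called on the database line with returnEmpty set and "MISSING DATA" as the replacement, it produces eight tokens, three of them "MISSING DATA".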

Documentation

I've included HTML documentation in the ./doc/ folder of the Zip file. It was generated with NDoc, hence the case-sensitivity problems: NDoc redirects all property links to the equivalent methods, so, for example, clicking on 'NextToken' takes you to the 'nextToken()' page. I don't know how to fix this, so I'm waiting for a new NDoc release; if you know how, please tell me and I'll regenerate the documentation. The code itself is also XML-commented, so Visual Studio will provide IntelliSense support while you use the class.

Nevertheless, this is a very self-explanatory class, so if you don't need the details, just start using it :)

About the implementation

This is just another C# version of the java.util.StringTokenizer class. Essentially, it's a wrapper class around the String.Split method. It implements all the methods of its Java counterpart except those needed only by the Enumeration interface (hasMoreElements() and nextElement()). Every implemented Java-compliant method has a C# equivalent property, as the example will show. In short, this implementation includes:

  • Java's methods 'as is' (preserving the exact names for compatibility), which are just aliases for
  • C# properties named exactly like the Java-compatible methods, but with the first letter capitalized (Pascal case)

Do please remember this subtle difference: each Java methodName() method has (and delegates to) an equivalent property MethodName.
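The property-plus-alias scheme described above can be sketched as follows, assuming a class that splits the string once with String.Split and then walks the resulting array with a cursor. SplitTokenizerSketch is a hypothetical name and this is not the StringTools source; only the member names (HasMoreTokens/hasMoreTokens(), NextToken/nextToken(), CountTokens/countTokens()) come from the article. It uses C# 6+ syntax.

```csharp
using System;

// Hypothetical sketch, NOT the StringTools source: a String.Split wrapper whose
// C# properties do the work and whose lower-case Java-named methods are aliases.
public class SplitTokenizerSketch
{
    private readonly string[] tokens;
    private int position; // index of the next token to hand out

    public SplitTokenizerSketch(string str, string delims, bool returnEmpty = false, string empty = "")
    {
        // Keep empty entries only when the caller asked for them.
        var options = returnEmpty ? StringSplitOptions.None : StringSplitOptions.RemoveEmptyEntries;
        tokens = str.Split(delims.ToCharArray(), options);

        // Replace each empty token with the caller's marker (e.g. "MISSING DATA" or null).
        for (int i = 0; i < tokens.Length; i++)
            if (tokens[i].Length == 0)
                tokens[i] = empty;
    }

    // C# properties: the actual implementation.
    public bool HasMoreTokens => position < tokens.Length;
    public int CountTokens => tokens.Length - position;

    public string NextToken
    {
        get
        {
            // Java's nextToken() throws NoSuchElementException here; this sketch
            // uses the closest .NET equivalent.
            if (!HasMoreTokens)
                throw new InvalidOperationException("No more tokens.");
            return tokens[position++];
        }
    }

    // Java-compatible aliases that keep java.util.StringTokenizer's exact names.
    public bool hasMoreTokens() => HasMoreTokens;
    public int countTokens() => CountTokens;
    public string nextToken() => NextToken;
}
```

A property with a side effect (NextToken advancing the cursor) is unusual in C#, where such behaviour is normally a method; the sketch simply mirrors the design the article describes.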

The IEnumerable interface has been implemented, so you can iterate through an instance of the class with a foreach loop. Doing this increments the internal current-position index (just like using NextToken or nextToken()), so remember to call the Reset() method before re-reading the tokens. Alternatively, use the indexer: it doesn't increment the index and can be used at any time.
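The interplay between foreach, the shared cursor, the indexer and Reset() can be sketched with a C# iterator, assuming the enumerator simply calls NextToken until HasMoreTokens is false. EnumerableTokenizerSketch is a hypothetical name; the sketch implements the generic IEnumerable&lt;string&gt; (which also provides the non-generic IEnumerable), while the article only states that IEnumerable is implemented.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch, not the StringTools source: foreach consumes tokens through
// the same cursor as NextToken, while the indexer reads without moving it.
public class EnumerableTokenizerSketch : IEnumerable<string>
{
    private readonly string[] tokens;
    private int position; // index of the next token to hand out

    public EnumerableTokenizerSketch(string str, string delims)
    {
        tokens = str.Split(delims.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    }

    public bool HasMoreTokens => position < tokens.Length;
    public int Count => tokens.Length;                  // total, independent of the cursor
    public int CountTokens => tokens.Length - position; // tokens not yet consumed

    public string NextToken
    {
        get
        {
            if (!HasMoreTokens)
                throw new InvalidOperationException("No more tokens.");
            return tokens[position++];
        }
    }

    // Random access that leaves the cursor untouched.
    public string this[int index] => tokens[index];

    // Rewind so the tokens can be consumed again.
    public void Reset() => position = 0;

    // The iterator runs lazily: each MoveNext advances the shared cursor by one token.
    public IEnumerator<string> GetEnumerator()
    {
        while (HasMoreTokens)
            yield return NextToken;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

After a complete foreach, HasMoreTokens is false and a second foreach yields nothing until Reset() is called, while st[i] keeps working throughout.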

This StringTokenizer class is a member of the StringTools namespace.

Public methods, properties, and a constant

All the methods do what their Java counterparts do (or so I hope* :)), so I'll only describe the new things:

  • StringTokenizer(string, params char[]) constructor - lets you specify the delimiters one by one instead of concatenating them into a string.
  • StringTokenizer(string, string, bool, bool returnEmpty) constructor - lets you ask the class to return empty tokens, represented by the default String.Empty string.
  • StringTokenizer(string, string, bool, bool, string empty) constructor - lets you specify the string to be returned for empty tokens instead of the default String.Empty.
  • void Reset() method - resets the current position so that the tokens can be extracted again.
  • string DefaultDelimiters constant - holds the default set of delimiters.
  • int Count property - returns the total number of tokens extracted from the tokenized string.
  • string this[int] indexer - returns the token at the specified index.
  • string EmptyString property - returns the string used for empty tokens.
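One plausible way for the constructor overloads listed above to fit together is constructor chaining, with every overload funnelling into the five-argument one. The sketch below assumes that design; TokenizerCtorSketch is a hypothetical name, it only stores the configuration (no tokenizing), and the real library may organise its constructors differently.

```csharp
using System;

// Hypothetical sketch of a constructor chain; the class name and properties
// are illustrative, not the library's actual source. Tokenizing is omitted.
public class TokenizerCtorSketch
{
    public const string DefaultDelimiters = " \t\n\r\f";

    public string Source { get; }
    public string Delimiters { get; }
    public bool ReturnDelims { get; }
    public bool ReturnEmpty { get; }
    public string EmptyString { get; }

    public TokenizerCtorSketch(string str)
        : this(str, DefaultDelimiters) { }

    // Delimiters passed one by one are simply joined into a delimiter string.
    public TokenizerCtorSketch(string str, params char[] delims)
        : this(str, new string(delims)) { }

    public TokenizerCtorSketch(string str, string delims)
        : this(str, delims, false) { }

    public TokenizerCtorSketch(string str, string delims, bool returnDelims)
        : this(str, delims, returnDelims, false) { }

    // Empty tokens default to String.Empty...
    public TokenizerCtorSketch(string str, string delims, bool returnDelims, bool returnEmpty)
        : this(str, delims, returnDelims, returnEmpty, String.Empty) { }

    // ...unless a replacement string (possibly null) is supplied.
    public TokenizerCtorSketch(string str, string delims, bool returnDelims, bool returnEmpty, string empty)
    {
        Source = str;
        Delimiters = delims;
        ReturnDelims = returnDelims;
        ReturnEmpty = returnEmpty;
        EmptyString = empty;
    }
}
```

Because C# prefers an overload applicable in its normal form over one that only matches through params expansion, new TokenizerCtorSketch("text") picks the single-string constructor rather than the params char[] one.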

*I've tested it in a variety of ways, and it seems to work as it should, but if you find a bug, please let me know.

The long awaited example

Example usage of the class:

Don't forget to add the using System; and using StringTools; directives.

string str = "One, two, three";
Console.WriteLine("The string to be tokenized: [{0}]", str);

StringTokenizer st = new StringTokenizer(str, ",");
Console.WriteLine("\nThe Java way + comma tokenization:");
while (st.hasMoreTokens()) // == st.HasMoreTokens
    Console.WriteLine("[{0}]", st.nextToken()); // == st.NextToken

Console.WriteLine("\nThe C# way + comma tokenization");
st.Reset(); // Not available in Java - after this we can read the tokens again
foreach(string token in st)
    Console.WriteLine("[{0}]", token);

Console.WriteLine("\nThe other C# way + tokenize using \", \" + return tokens");
Console.WriteLine("Uses the indexer to get tokens - doesn't " + 
                  "increment the 'current position'");
st = new StringTokenizer(str, " ,", true);
for (int i = 0; i < st.Count; i++)
    Console.WriteLine("Tokens left:{2}, token number {0} is [{1}]",
                      i, st[i], st.CountTokens);

string database = "John|Smith|46|5550000|||john@internet.com|";
Console.WriteLine("\nSample database tokenization for database line:\n[{0}]\n", database);
st = new StringTokenizer(database, "|", false, true, "MISSING DATA");
foreach (string token in st)
    Console.WriteLine("[{0}]", token);

This outputs:

The string to be tokenized: [One, two, three]

The Java way + comma tokenization:
[One]
[ two]
[ three]

The C# way + comma tokenization
[One]
[ two]
[ three]

The other C# way + tokenize using ", " + return tokens
Uses the indexer to get tokens - doesn't increment the 'current position'
Tokens left:7, token number 0 is [One]
Tokens left:7, token number 1 is [,]
Tokens left:7, token number 2 is [ ]
Tokens left:7, token number 3 is [two]
Tokens left:7, token number 4 is [,]
Tokens left:7, token number 5 is [ ]
Tokens left:7, token number 6 is [three]

Sample database tokenization for database line:
[John|Smith|46|5550000|||john@internet.com|]

[John]
[Smith]
[46]
[5550000]
[MISSING DATA]
[MISSING DATA]
[john@internet.com]
[MISSING DATA]

Requirements

.NET Framework 2.0.

Usage

Just add a reference to the StringTokenizer.dll library in your project and you're done! Alternatively, you can add the 'raw' StringTokenizer.cs source file to your project.

And don't forget the using StringTools; directive.

Credits

I'd like to thank:

  • M.Lansdaal for proving that empty tokens are useful and that using Microsoft namespaces isn't a good practice
  • paillave for pointing out that I forgot to implement the IEnumerable interface

History

  • 24.04.2006 - Publication
  • 24.04.2006 - Minor changes
  • 24.04.2006 - Second release: added returning empty tokens, implemented IEnumerable, changed the namespace, included the projects in the .zip file (not just the source code), and switched the documentation to HTML format instead of .CHM.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

QmQ

Poland

Last Updated 23 Apr 2006
Article Copyright 2006 by QmQ
Everything else Copyright © CodeProject, 1999-2015