CmdTailCmd - Yet Another Tail Parser

22 Apr 2011 | Ms-PL | 15 min read
A simple library and technique to turn command tails into (Design Patterns) Command objects.

Introduction

CmdTailCmd is a .NET library to parse a program's command tail and fill in public fields on an object. It is intended to initialize objects implementing some variant of the Command pattern from the Gang of Four book Design Patterns, although that isn't necessary. I expect most programs will want to incorporate the files directly rather than use it as a standalone assembly.

Programmers have only been handling command line parameters for 40 years or so. As with anything that is new, there are no well-known techniques or libraries that work for most applications. Or rather, as with anything that old, there are no well-known techniques or libraries that work for most applications: the venerable libraries don't always match modern tools and needs, and the new libraries haven't shaken out to a standard.

CmdTailCmd is a small and (very) simple-to-use library.

  1. Define a command object that describes an operation your program does.
    1. This object should contain a public field for each command line option you want to detect for this operation.
    2. You may also add methods for validation and execution, making this object fairly standalone.
  2. Initialize a library object with your command tail (the args parameter to your Main function).
  3. Ask the command tail object to fill out your operation object:
    1. The library will examine command line parameters, coerce types, call custom converters if you have any, read enumerations, handle case (in)sensitivity, and support (or prohibit) using substrings and alternate names.
    2. The library will return your object along with any errors, any unidentified parameters, and a copy of the command line for your examination.
  4. Look at your filled-in object. If it's valid, execute. If not, try a different command or show help.

As for help, the library object will also recognize common idioms for requesting help and can recognize which category of help was requested from your application-specific list.

Background

I write command line tools a lot, often as quick tests when building components. Command line handling is a nightmare of inconsistent traditions, odd special cases, and competing custom libraries with their own opinions and restrictions. Scripting environments set up much clearer rules than an arbitrary shell app enjoys, which can make scripting more appealing for some tasks, but sometimes you need to write an app.

I made do with the quite reasonable NConsoler library for a regex tool I needed last year. When I went to modify my tool, I realized that the assumptions in that library don't match my thinking; I had adapted how I wrote my code and how I used the app to match NConsoler's design. The design is reasonable, but I'd rather restrict my code as little as I can get away with.

When choosing a library, designing the command line parameters an app will accept, or writing code to parse the command tail, there are a lot more questions than are immediately obvious. For example:

  1. Whether parameter order is significant.
  2. What data types are allowed for a parameter.

    Common types include string, integer, boolean, and date. You may also see string arrays or selections from a hard-coded list of strings (enumerations).

  3. What switch characters to allow.

    Fortunately, '/' and '-' are all you usually need to worry about on Windows platforms, and only '-' on Unix-derived systems.

  4. What parameters can be included without any switch character or name ("dir just_a_name.txt").

    Often the first or last undecorated token has some special meaning, such as input filename.

  5. How parameter values are demarked.

    Common choices include "/file filename", "/file=filename", and "/file:filename".

  6. How boolean values are demarked.

    Common choices include "/b" (implicit true), "/b+", and "/b true" for true, or "/b-" and "/-b" for false.

  7. How (and if) boolean values may be combined.

    For example, "app /sb" for "app /s /b" and how that relates to "false" values ("app /sb-", "app /s-b", etc.).

  8. What special cases to support.

    "-" is often used to represent stdin. This needs special handling since it can run afoul of switchchar detection.

  9. What special idioms to support.

    Commonly, no parameters, "app /?", "app -h", "app --help", or "app help" are used for help. Also, "app help some_specific_help" is often used for detailed help. Commonly, "--" as a special switchchar means some parameter applying to the context or environment of the operation rather than the operation itself.

  10. Whether to allow unambiguous substrings as parameters ("/file" for "/file-to-encode").
  11. Whether to allow aliases for parameters ("/rtl" for "/right-to-left").
  12. And, specifically for libraries:
    1. When and how to show what errors (especially, should the library do that for you, should it format the error text, ...).
    2. Should/can the library generate help text.
    3. How you tell the library what parameters to look for, their types, and their semantics.

    Many libraries ask for detailed structures or for attributed objects. Others answer requests for specific values from the tail rather than returning everything at once.

Every answer to each of these questions has merits in some given situation. As a community, our needs and expectations have changed through the decades. It isn't surprising that 40 years hasn't been enough to standardize this in the most general case; instead, it's been long enough to outdate standards.

The answers to these questions drive some interesting design issues in the parsing. For example, if implicit "true" is allowed for booleans using the form "/b" and if detached values are allowed for strings using the form "/detached value"--both very common decisions--it is impossible to determine if "/param text" is a boolean "param" being set to an implicit "true" or a string "param" being set to "text", without knowing the expected types.
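
To make that ambiguity concrete, here is a minimal sketch (not the library's code) in which the expected field type decides what "/param text" means. TypedTail and Interpret are hypothetical names, and the policy that a boolean never consumes the following token is an assumption chosen to keep the example short.

```csharp
using System;

public static class TypedTail
{
    // Hypothetical helper, not part of CmdTailCmd: given the field type expected
    // for "/name", decide what the token pair "/name next" means.
    // Returns the value to assign; consumedNext reports whether "next" was used up.
    public static object Interpret(Type fieldType, string next, out bool consumedNext)
    {
        if (fieldType == typeof(bool))
        {
            // Assumed policy: a bare boolean switch is an implicit true and
            // leaves the following token alone.
            consumedNext = false;
            return true;
        }

        // Any other type needs a value, so the following token belongs to it.
        consumedNext = true;
        return Convert.ChangeType(next, fieldType);
    }
}
```

The same two tokens produce a different value and a different "next token" state depending only on the type passed in, which is why a parser has to know the target types before it can walk the tail.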

Using the code

No details here, just examples for common cases to consume the library. All are for some made-up logging tool; please don't pick at the example app, it leaves a nasty scab.

Case 1: Nothing fancy

C#
// This is here to show enum handling
public enum LogOutputFormat {Raw, CSV, TabDelimited, XML, compressed, csv2}
 
// The command object to fill in. In this case, it's just a data bag
// On the command object, the library fills in any public fields from the command line,
// matching the name of the field and trying to coerce the type of any value
public class MyActionParams
{
    public string           FileName;
    public bool             Append;
    public LogOutputFormat  OutputFormat;
    public DateTime         Touch;
    public int              Delay;
}

public static void Main(string[] args)
{
    // Get a library object ready to handle these parameters and call it
    // The library takes an instance of the command object so you have
    // a chance to set default values for fields not found on the 
    // command line
    CommandTail tail = new CommandTail(args);
    CmdSettings<MyActionParams> Action = tail.Parse(new MyActionParams());

    // The returned object, Action, has these members:
    // Action.Settings                     a MyActionParams with values filled in
    // Action.Context.ParsingExceptions    any exception found
    // Action.Context.AllTokens            the original command line tokens (args)
    // Action.Context.UndecoratedTokens    any bare tokens that weren't
    //                                     attached to a field in MyActionParams
}

We can feed the program these command lines:

  • app /FileName=somefile.log /Append /OutputFormat:TabDelimited -Touch 2011.04.01

    Delay is still default(int) (zero).

  • app /Append /Append- /Append true /Append:false /Append+--+-+

    Each syntax works for assigning to Append. The last one on the command line (true from the final + in the last parameter) wins.

  • app /Out:Tab -File somefile.log

    By default, unambiguous substrings can be used for field names and for enumeration values, so "/Out" can be used instead of "/OutputFormat".

  • app /filename somefile.log /outputformat raw

    By default, names of settings ("filename" for "FileName") and enumeration values ("raw" for "Raw") are case insensitive--unless it introduces ambiguity.

  • app /OutputFormat=c

    Generates an error. "c" is ambiguous. It could mean "CSV", "compressed", or "csv2".

  • app /out CSV

    This isn't ambiguous since it matches one of the enumeration values exactly.

  • app unknown_string /Append+ unknown_string_2

    "unknown_string" isn't part of a recognized tag, so it appears in the "UndecoratedTokens" collection on the CmdSettings object's Context property. "unknown_string_2" also appears in UndecoratedTokens, since the token before it is complete.

  • app /Append unknown_string_3

    The undecorated tokens collection is empty. Since /Append is ambiguous (it could be an implicit true or it could be expecting a value in the next token), the next token is examined for a switchchar. Since it doesn't have one, the literal "unknown_string_3" is coerced to a boolean and Append is set to false.

Case 2: Some finer control

Let's take some more control using attributes. Individual settings support these attributed options:

  • These apply to any field on your object:
    • NameCaseSensitive - The name of the parameter is case sensitive (default: false)
    • NameAllowSubstring - The name of the parameter can be shortened to an unambiguous initial substring (default: true)
    • AlternateNames - An array of alternate names the user may supply for a parameter (such as "/rtl" for "/right-to-left")
  • These apply to enumerations:
    • ValueCaseSensitive - The value of the parameter is case sensitive
    • ValueAllowSubstring - The value of the parameter can be shortened to an unambiguous initial substring

Example:

C#
public class MyParams
{
    // FileName can also be set as /Name or /LogFile
    [CmdTailSetting(AlternateNames = new string[]{"Name", "Logfile"})]
    public string           FileName;

    // You cannot set this using /A or /App or any other substring, but you
    // can set it with /append or /ApPeNd or whatever casing you desire
    [CmdTailSetting(NameAllowSubstring = false)]
    public bool             Append;
    
    // You cannot set the value unless you get the case right. You can't use
    // /OutputFormat:TABDELIMITED or /OutputFormat=CSV2 
    // You *can* use properly cased substrings, such as /OutputFormat=Tab 
    // The name of the field (OutputFormat) is *not* case sensitive
    // Substrings are allowed for name and value: /out=TabD
    [CmdTailSetting(ValueCaseSensitive = true)]
    public LogOutputFormat  OutputFormat;
    
    // These are simple, unattributed fields. They use the "normal" rules
    public DateTime         Touch;
    public int              Delay;
}

Calling it:

  • app /name logfile.log

    You can use "/Name" or "/Logfile" (or any substring of those) instead of "/FileName".

  • app -Appe false

    Error. Append does not allow substrings. "Appe" is an unknown parameter, so an exception is put in the ParsingExceptions collection and optionally thrown.

    Since the type of "Appe" is unknown, "false" is considered an undecorated token.

  • app /out xml

    Error. Case is significant even when unambiguous for the value of OutputFormat (but not for the name OutputFormat itself).

Case 3: Let's get smarter

Now that we know how to parse the parameters into a structure, let's see how to handle several different commands:

Let's say we want to allow these command lines:

  • app /Dump /FileName filename [/append+-]
  • app /Roll /FileName filename /ArchiveDir archive_dir [/OlderThan cutoff_date]

C#
// First, a list of the operations we want to do
public class LogAppCommands : CmdTailCommand
{
    // This is a trick. It allows "/dump" to indicate to use the dump action, etc.
    // We put them in a base class so command classes can test if other commands were 
    // specified, should they want to
    public bool Dump = false;
    public bool Roll = false;
}

// The interface ICmdTailCommand has some standardized methods to create a 
// simple command object for each operation.
// With this type of construction, you get one object that has its data and 
// whatever execution you want. Inside that object, you don't think about
// how the values got there.
public class DumpCommand : LogAppCommands, ICmdTailCommand
{
    [CmdTailSetting(AlternateNames = new string[]{"Log", "Output"})]
    public string FileName;
    public bool   Append = true;

    public bool IsValid(CmdTailContext ctx)
    {
        // We need a filename and they need to have specified /dump but not /roll
        return (FileName != null && FileName != string.Empty && Dump && !Roll);
    }

    public bool Execute(object o)
    {
        // Do whatever you need...
        return true;
    }
}
public class RollCommand : LogAppCommands, ICmdTailCommand
{
    [CmdTailSetting(AlternateNames = new string[]{"Log"})]
    public string           FileName;
    public DateTime         OlderThan = DateTime.Now;
    public DirectoryInfo    ArchiveDirectory;

    // Implement IsValid and Execute....
}

// Inside Main...
    CommandTail tail = new CommandTail(args);
    
    // Now to get really, really fancy.
    // We have a field of type DirectoryInfo, but that isn't handled
    // explicitly. It might work (the library
    // tries to find a static Parse(string) method, which works
    // for many types), but let's make sure we handle it the way we want to. 
    //  To do this, we add a type coercer, which is just a Func<string, object> 
    // Once the CommandTail has this,
    // any field of that type will try to convert with the Func
    tail.AddCoercer(typeof(DirectoryInfo), (s) => new DirectoryInfo(s));

    CmdSettings<DumpCommand> DumpParams = tail.Parse<DumpCommand>();
    CmdSettings<RollCommand> RollParams = tail.Parse<RollCommand>();
    
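    // IsValid takes the parsing context and Execute takes an object, as declared above
    if      (DumpParams.Settings.IsValid(DumpParams.Context)) {DumpParams.Settings.Execute(null);}
    else if (RollParams.Settings.IsValid(RollParams.Context)) {RollParams.Settings.Execute(null);}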
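
The conversion fallback mentioned in the comments (custom coercer first, then a static Parse(string) found by reflection) can be sketched roughly like this. It is an illustration only: CoerceSketch and Coerce are hypothetical names, the dictionary stands in for whatever AddCoercer stores internally, and the exact order of the steps is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class CoerceSketch
{
    // Hypothetical stand-in for the library's conversion step.
    public static object Coerce(string text, Type target,
                                IDictionary<Type, Func<string, object>> coercers)
    {
        // 1. A converter registered for this exact type takes priority.
        Func<string, object> custom;
        if (coercers.TryGetValue(target, out custom)) return custom(text);

        // 2. Enumerations are matched by name (case-insensitively here).
        if (target.IsEnum) return Enum.Parse(target, text, true);

        // 3. Many types expose a public static Parse(string); find it by reflection.
        MethodInfo parse = target.GetMethod("Parse",
            BindingFlags.Public | BindingFlags.Static,
            null, new[] { typeof(string) }, null);
        if (parse != null) return parse.Invoke(null, new object[] { text });

        // 4. Last resort: let the framework's general conversion try.
        return Convert.ChangeType(text, target);
    }
}
```

DirectoryInfo has no static Parse(string) and the general conversion cannot build one from a string, so in this sketch a field of that type only works once a converter is registered for typeof(DirectoryInfo).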

Case 4: Help handling

Getting help from a command line app is important and should be universal. The library has a couple of functions to encourage writing help:

  • CommandTail.IsHelpRequest() tries to determine if any common idiom for help was passed.
  • CommandTail.HelpRequestCategory<E>() tries to match the help request to any named value (from an enumeration).

For example:

C#
// Make the default item your most general. It's returned if nothing else matches.
public enum LogToolHelpCategories {General, Version, DumpingLogs, RollingLogs, Formats}

// In your main...
    CommandTail tail = new CommandTail(args);
    
    // We have a Dictionary<LogToolHelpCategories, string> with our help text

    if (tail.IsHelpRequest())
    {
        LogToolHelpCategories cat = tail.HelpRequestCategory<LogToolHelpCategories>();
        Console.WriteLine(LogToolHelpText[cat]);
        return 0;
    }

Now the user can request help:

c:\>app
c:\>app /?
c:\>app help
c:\>app --help
c:\>app help rolling
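
For a feel of what such detection involves, here is a minimal standalone sketch. It is not the library's implementation: HelpSketch, LooksLikeHelp, Category, HelpTopic, and the list of recognized tokens are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical help topics; the first value is the general fallback.
public enum HelpTopic { General, Version, DumpingLogs, RollingLogs, Formats }

public static class HelpSketch
{
    // Assumed list of help idioms; a real tool may accept more or fewer.
    static readonly string[] HelpTokens =
        { "/?", "-?", "/h", "-h", "/help", "-help", "--help", "help" };

    // No arguments at all, or a first token that is a known idiom, counts as help.
    public static bool LooksLikeHelp(string[] args)
    {
        return args.Length == 0
            || HelpTokens.Contains(args[0], StringComparer.OrdinalIgnoreCase);
    }

    // Match the second token against the enum names as a case-insensitive
    // initial substring; anything missing, unknown, or ambiguous falls back
    // to the enum's default (zero) value.
    public static TEnum Category<TEnum>(string[] args) where TEnum : struct
    {
        if (args.Length >= 2)
        {
            string[] hits = Enum.GetNames(typeof(TEnum))
                .Where(n => n.StartsWith(args[1], StringComparison.OrdinalIgnoreCase))
                .ToArray();
            if (hits.Length == 1)
                return (TEnum)Enum.Parse(typeof(TEnum), hits[0]);
        }
        return default(TEnum);
    }
}
```

With this sketch, "app help rolling" resolves to HelpTopic.RollingLogs, while "app help" and "app help nonsense" both fall back to HelpTopic.General.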

Some idioms when planning your command line handling

When using the library, I find some idioms clean up my code and my design.

Idiom 1: Put shared parameters in a base class

Programs often have multiple commands but have some parameters common to all (or most) commands. These may be metaparameters not specific to any command (e.g., a computer to connect to or whether to use UTF8 in the output), or they may be parameters common to any operation the program could support (e.g., input filename).

Create a base class which contains fields for these parameters, and inherit your command objects from it. This keeps the name, attributes, and type the same across every command object, helping keep the user from being confused.

C#
public class CommonParameters
{
    public string InputFile;
    public bool   UTF8;
}
public class ValidateFile : CommonParameters
{
    public bool ExitOnError = false;
    // ...
}
public class ImportParameters : CommonParameters
{
    public string DatabaseName;
    // ...
}

Idiom 1a: Treat disallowed parameters like shared parameters

Sometimes you want to check that a parameter was not passed, usually because it would indicate that the user is confused. Treat these like shared parameters so you can test that they have not been set. Be certain not to initialize the field in the base class, so you can test the value against null.

This only works with nullable types, most commonly strings, and with enumerations with a default value that is not valid for the user to set.
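
A minimal sketch of the idea, using plain classes with no library dependency. The class names and the parameterless IsValid are hypothetical simplifications; a real command object would take the parsing context as shown earlier.

```csharp
public class SharedParameters
{
    public string InputFile;

    // Only meaningful for some commands. Deliberately left uninitialized
    // so that "never set on the command line" is observable as null.
    public string DatabaseName;
}

public class ValidateFileCommand : SharedParameters
{
    // Validation needs an input file and must not be given a database name;
    // a database name here suggests the user meant a different command.
    public bool IsValid()
    {
        return !string.IsNullOrEmpty(InputFile) && DatabaseName == null;
    }
}
```

If DatabaseName were initialized to an empty string in the base class, the null test would no longer distinguish "not passed" from "passed with an empty value".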

Idiom 2: Identify command mode with bools

Many programs have several modes. Imagine a media app that can validate, analyze, stream, and play a file. If you create boolean parameters for each mode and place them in a base class, you can easily identify which mode was requested, test for confused users mixing modes, and give the user a simple command syntax.

C#
public class Modes
{
    public bool Validate    = false;
    public bool Analyze     = false;
    public bool Stream      = false;
    public bool Play        = false;
}
public class Validate : Modes, ICmdTailCommand
{
    // ...
    public bool IsValid(CmdTailContext ctx)
    {
        // They need to have specified Validate for this command
        if (!Validate) return false;
        // Confused user mixes modes. Not allowed in this app
        if (Analyze || Stream || Play) return false;
        // ...
    }
}

Now the user can use the implicit true syntax to select a mode:

app /validate ...
app /stream ...
app /p ...

Small note: default(bool) is false, so the explicit assignment in the Modes field initializers is not necessary (and FxCop will yell at you for double assignment, since the compiler will stupidly construct, assign the default, and then re-assign it). It's definitely worth the explicit assignment, though. The double assignment should optimize away, but even if it doesn't, you don't know who will be maintaining this code next year; if they aren't thinking about the default (or don't know the default), they should see the value instead of risking making an incorrect assumption. Don't let bad implementation in the compiler lead you to bugs in your application.

Idiom 3: Use Partial Classes to group parameters

C# borrows Java's "put each class in one file and don't group or structure the class layout" philosophy. In general, programmers like it and it's seen as a good thing; it's a reaction against C++'s separation of class structure from class implementation, and relies on clever IDEs to create the structured class metadata when programmers need it since IDEs didn't create combined views of C++ header and implementation files.

This can make identifying the set of parameters a command uses difficult, and it can make comparing parameters between commands very difficult. Since it is important to have consistency in command elements (e.g., casing, tense, name choice), we want some clear way to visualize the parameters.

C#'s partial classes give us a good way to do that. If you make one file for your parameter layouts and declare your command objects partial, you can put all of your parameter information together and still separate your command implementation into a file with all your properties and methods.

CommandLineParameters.cs
C#
public partial class Mode1Command
{
    public string SomeParameter;
    public bool   SomeSwitch;
    // ...
}
public partial class Mode2Command
{
    public int Count = 1;
    // ...
}

Mode1.cs
C#
public partial class Mode1Command
{
    public bool IsValid(CmdTailContext ctx)
    {
        // ...
    }
}

Idiom 4: Disable substring matching to enable substring matching

If you have two parameters with a shared initial substring, you may want one to be easily abbreviated more than the other. Disable initial substring matching on one and the other can be abbreviated. Similarly, if one parameter is a complete initial substring of another, disabling substring matching on the shorter will allow the longer to be abbreviated but still allow setting the shorter by a complete name since exact matches take precedence over substrings.

If two parameters have a shared initial substring, you can still use AlternateNames to allow unambiguous substrings while defaulting ambiguous substrings to one of them.

C#
public class Parameters
{
    [CmdTailSetting(NameAllowSubstring = false,
                    AlternateNames     = new string[] {"CountD",
                                                       "CountDi",
                                                       "CountDis",
                                                       "CountDisp",
                                                       "CountDispl",
                                                       "CountDispla"})]
    public bool   CountDisplay = true;

    [CmdTailSetting(NameAllowSubstring = false)]
    public int    Count = 1;

    public string CounterName;
}

Now you can call the app like:

  • app /C cname: Substrings default to CounterName
  • app /Count n: Exact matches are highest precedence
  • app /CountDisp-: AlternateNames allows this
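
The precedence at work here (exact names and alternates first, then initial substrings only where allowed) can be sketched as a small resolver. This is an illustration under assumptions, not the library's matching code: FieldSpec and NameResolver are hypothetical, and returning null for both "unknown" and "ambiguous" is a simplification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of one settable field.
public class FieldSpec
{
    public string Name;
    public bool AllowSubstring = true;
    public string[] AlternateNames = new string[0];
}

public static class NameResolver
{
    // Returns the matching field name, or null when nothing (or more than
    // one thing) matches. Comparison is case-insensitive throughout.
    public static string Resolve(IEnumerable<FieldSpec> fields, string token)
    {
        StringComparison cmp = StringComparison.OrdinalIgnoreCase;
        List<FieldSpec> all = fields.ToList();

        // 1. Exact matches on the name or an alternate name win outright.
        List<FieldSpec> exact = all.Where(f =>
            f.Name.Equals(token, cmp) ||
            f.AlternateNames.Any(a => a.Equals(token, cmp))).ToList();
        if (exact.Count == 1) return exact[0].Name;
        if (exact.Count > 1) return null;

        // 2. Otherwise try initial substrings, only on fields that allow them.
        List<FieldSpec> prefix = all.Where(f => f.AllowSubstring &&
            (f.Name.StartsWith(token, cmp) ||
             f.AlternateNames.Any(a => a.StartsWith(token, cmp)))).ToList();
        return prefix.Count == 1 ? prefix[0].Name : null;
    }
}
```

Fed the three fields from the Parameters class above, "C" resolves to CounterName (the only field that still accepts substrings), "Count" resolves to Count by exact match, and "CountDisp" resolves to CountDisplay through its alternate names.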

Design

Major design goals:

  • No required command line structure (although tokens may need requirements). For example, NConsoler is a great library, but the need to have an undecorated first parameter that indicates mode wasn't working for me.
  • Access to "undecorated" tokens on the line. I really wanted one incredibly common token to be passed without a name, like the filename argument to dir.
  • Easy to change the parameters allowed. I do a lot of trial and error while coding quick tools, and I do many iterations when coding production projects. I'm likely to change the command line pretty often.
  • Easy help. Even on quick tools, I like good command line help, mostly because I never remember what to do for something I may run several times in one day but only once a month.
  • Enumerations. I had hacked this into the NConsoler library for my regex tool and I can't really live without it now.

Things I didn't care about for this:

  • Speed. You parse the command line once at the start of a run. If it takes a few extra milliseconds, so be it. If you have a tool that is called extremely often, or if it is running in a restricted environment, it may not be the best choice. For general tools on a basic Windows box, optimization for speed would be ridiculous.
  • Unused generalization. It's probably got more than I need in it, but I tried hard to cut back on what I wasn't using.

Some current limitations/NYI/ideas for extension:

  • It can't support "-" as a value, which is annoying at times.
  • You can't specify negative numbers as detached tokens ("/n -5").
  • Parameter names that start with "--" can be supported with the AlternateNames attribute, but that is far from ideal.
  • Doesn't support /-W for false.
  • You can't isolate parameters and reuse them across different command objects, which some programs do ("app /command1 /file f1 /command2 /file f2", etc.).
  • Some type coercions may succeed when you'd rather they failed.
  • You're on your own for mapping where undecorated tokens fall in the token sequence, if that's important ("/files f1 f2 f3").
  • Supplying a value to the same field several times is quietly hidden, with each assignment happening as the command line is parsed. You can't detect it if it's an error for you.
  • No arbitrary numbered parameters ("/p1 f1 /p2 f2 ..." where p1 and p2 are not fields, but represent some array; this is sometimes used to allow the user to specify an arbitrary number of values).
  • It doesn't generate usage text.

Implementation

There are two obvious ways to structure the parsing:

  1. Examine the output and search the command line for matching tokens, or
  2. Walk the tokens building name/value pairs and try to match those to the output.

When examining the tokens, there are two general issues:

  1. Recognizing whether a token is a new parameter or a continuation of the last token, and
  2. Getting a token to the right data type.

The library walks the tokens and makes one pass through them, matching to public fields found by Reflection. It's a simple two-deep state tree unrolled into a function rather than state objects. The tokens are put into a queue.

  1. If the queue is empty, we're done.
  2. Pop the first token off the queue.
    1. If it doesn't start with a switchchar, add the token to the Undecorated Tokens collection and go back to the start.
    2. Split the token into a name, any plus/minus symbols, and any attached value. Do some basic validation.
    3. Find the field matching the name. If there isn't one, or if there is more than one, generate the appropriate exception.
    4. Switch on the type of the field:
      • Bool
        • If it has plus/minus symbols, use the last one to set the field.
        • Otherwise, if it has an attached string, coerce that to bool and set the field.
        • Otherwise, examine the head of the queue:
          • Queue empty, implicit true.
          • Starts with a switchchar, implicit true.
          • Otherwise, pop the head of the queue and coerce to bool.
        • Assign.
      • Other
        • If we don't have an attached value, examine the head of the queue.
          • Queue empty, generate missing value exception.
          • Starts with switchchar, generate missing value exception.
          • Otherwise, pop the head of the queue and use it as the value.
        • Examine the coercers collection for the field type. If we have one, call it.
        • Otherwise, if it's an enumeration, match the value string against the names in the enumeration.
        • Otherwise, look for a static method Parse(string) on the type and use that.
        • Failing that, pass the string to the assignment and see if .NET can do it for us.
  3. Loop on back.
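
The walk above can be sketched in a few dozen lines. This is a simplified illustration, not the library's source: TokenWalker and WalkResult are hypothetical, fields come from a name-to-type dictionary instead of Reflection, names must match exactly (ignoring case), conversion is reduced to enums plus Convert.ChangeType, and treating a lone "-" or "/" as undecorated is an assumption.

```csharp
using System;
using System.Collections.Generic;

public class WalkResult
{
    public Dictionary<string, object> Values =
        new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);
    public List<string> Undecorated = new List<string>();
    public List<string> Errors = new List<string>();
}

public static class TokenWalker
{
    static bool HasSwitch(string t)
    {
        return t.Length > 1 && (t[0] == '/' || t[0] == '-');
    }

    public static WalkResult Walk(string[] args, IDictionary<string, Type> fieldTypes)
    {
        var fields = new Dictionary<string, Type>(fieldTypes, StringComparer.OrdinalIgnoreCase);
        var result = new WalkResult();
        var queue = new Queue<string>(args);

        while (queue.Count > 0)
        {
            string token = queue.Dequeue();
            if (!HasSwitch(token)) { result.Undecorated.Add(token); continue; }

            // Split "/name[+-...][=value|:value]" into name, signs, and value.
            string body = token.Substring(1);
            string value = null;
            int sep = body.IndexOfAny(new[] { '=', ':' });
            if (sep >= 0) { value = body.Substring(sep + 1); body = body.Substring(0, sep); }
            string name = body.TrimEnd('+', '-');
            string signs = body.Substring(name.Length);

            Type type;
            if (!fields.TryGetValue(name, out type))
            {
                result.Errors.Add("Unknown parameter: " + name);
                continue;
            }

            if (type == typeof(bool))
            {
                if (signs.Length > 0)
                {
                    // The last plus/minus symbol wins.
                    result.Values[name] = signs[signs.Length - 1] == '+';
                    continue;
                }
                if (value == null && queue.Count > 0 && !HasSwitch(queue.Peek()))
                    value = queue.Dequeue();
                bool b;
                // No value at all is an implicit true; an unparseable value becomes false.
                result.Values[name] = value == null ? true : (bool.TryParse(value, out b) && b);
                continue;
            }

            if (value == null)
            {
                if (queue.Count == 0 || HasSwitch(queue.Peek()))
                {
                    result.Errors.Add("Missing value for: " + name);
                    continue;
                }
                value = queue.Dequeue();
            }
            try
            {
                result.Values[name] = type.IsEnum
                    ? Enum.Parse(type, value, true)
                    : Convert.ChangeType(value, type);
            }
            catch (Exception ex) { result.Errors.Add(name + ": " + ex.Message); }
        }
        return result;
    }
}
```

Even this reduced version shows one of the limitations listed earlier: "/n -5" reports a missing value for n, because "-5" looks like the start of a new parameter.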

Idioms in the implementation

Nothing really special in the code. Here are a few random syntactical points I find interesting or entertaining:

  • The code as-is uses some LINQ syntax and lambdas, but nothing that couldn't be back-ported to older .NET versions if you need to. It would get harder to read, though.
  • In some places, you'll see potentially-interesting nullable constructs like:

    C#
    void Func(SomeType st) { SomeType s = st ?? new SomeType(); }

    or the much more common:

    C#
    bool? nb = null;
    bool b = nb ?? false;
  • Another well-known--but interesting if you haven't seen it--construct is chained ternary operators:

    C#
    BoolFormat bf = (Bools != string.Empty) ? BoolFormat.Explicit
                  : (Value != string.Empty) ? BoolFormat.AttachedString
                  :                           BoolFormat.Other;

    This tests the condition on the left of each question mark in order, line by line, and returns the value to the right of the first condition that is true; the final line is the fallback. Much easier to read than a string of if/else.

History

  • 2011/04/18: Version 1.0.
  • 2011/04/22: Article text updated.

License

This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)


