Writing a custom UriParser for .NET

August Sodora III, 17 Apr 2006
This article explains how to extend .NET's UriParser to parse URIs that the framework doesn't currently support. This article does not deal specifically with any scheme, and the information can be applied to a myriad of URI formats.

Introduction

Currently, I am working on an open source SIP stack and TAPI interface based on RFC 3261. Although the .NET Framework's network classes are surprisingly complete when it comes to familiar formats like HTTP or even Gopher, formats like SIP and MailTo are not supported, forcing me to implement them on my own. In good object-oriented style, Microsoft allows you to modify existing formats, or even create your own, so adding support for a given scheme should be trivial, right? Well, Microsoft hasn't gotten around to documenting much of the UriParser classes. After hours of experimentation and some help from Jason Kemp, I have figured out how to extend and register a UriParser, allowing your program to understand URIs from any scheme. This article shows you how to write your own UriParser, in an attempt to fill the void in the documentation.

Extending UriParser

UriParser is an abstract class that provides some methods for parsing a URI. It also includes some callbacks: whenever a Uri is created, for example, the parser registered for its scheme is notified. If the URI that you need to parse closely resembles a scheme that is already supported, it may benefit you to extend that scheme's UriParser. For most purposes, extending GenericUriParser is the best choice, because its constructor lets you choose certain options regarding how things are parsed. Here is a skeleton class that explains the most important methods you may need to override:

public class SipStyleUriParser : GenericUriParser
{
    //You may want to have your constructor do more, but it usually
    //isn't necessary. See the MSDN documentation for
    //GenericUriParserOptions for a full explanation of what it does.
    //Basically, it lets you define escaping rules and the presence of
    //certain URI fields.
    public SipStyleUriParser(GenericUriParserOptions options)
        : base(options) { }

    protected override void InitializeAndValidate(Uri uri,
        out UriFormatException parsingError)
    {
        //This method is called whenever a new Uri is created
        //whose scheme matches the one registered to this parser
        //(more on that later). If the Uri doesn't meet
        //certain specifications, set parsingError to an appropriate
        //UriFormatException; otherwise leave it null.
        parsingError = null;
    }

    protected override void OnRegister(string schemeName,
        int defaultPort)
    {
        //This is fired whenever you register a UriParser with a
        //scheme (more on that later). The only use I can think of
        //for this is checking that the scheme is one that belongs to
        //your parser and storing the defaultPort in case a URI
        //doesn't specify one.
    }

    public static new bool Equals(Object objA, Object objB)
    {
        //Use this method to test for equality between two Uris.
        //Unfortunately, it will not change Uri.Equals(), so whenever
        //you need to test for equality, use this. RFC 3261 defines
        //special rules for equality of SIP URIs, for example,
        //so a simple String.Equals() is not enough.
        //Placeholder: replace with your scheme's comparison rules.
        return Object.Equals(objA, objB);
    }

    protected override bool IsWellFormedOriginalString(Uri uri)
    {
        //This method is similar to InitializeAndValidate. The
        //difference is that a valid URI is not necessarily
        //well-formed. You can use this to enforce certain
        //formatting rules if you wish.
        return base.IsWellFormedOriginalString(uri);
    }

    protected override UriParser OnNewUri()
    {
        //This is fired when a new Uri is instantiated. Note the
        //return type: it returns the UriParser that will handle the
        //new Uri, not the Uri itself. I still haven't found a use
        //for this method.
        return base.OnNewUri();
    }

    protected override string GetComponents(Uri uri,
        UriComponents components, UriFormat format)
    {
        //This method parses all the parts of a Uri. Uri exposes the
        //results of this method in a series of properties. You are
        //passed an enum telling you what parts to retrieve, and you
        //must parse them from the Uri given.
        return base.GetComponents(uri, components, format);
    }
}
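
To make the OnRegister callback more concrete, here is a minimal sketch of a parser that does nothing except remember what it was registered with. The class name PortRememberingParser, its two properties, and the choice of GenericUriParserOptions.Default are assumptions made for illustration; they are not taken from the attached source.

```csharp
using System;

// Hypothetical parser that only records the scheme and port it was
// registered with. Everything else is inherited from GenericUriParser.
public class PortRememberingParser : GenericUriParser
{
    public string RegisteredScheme { get; private set; }
    public int RegisteredPort { get; private set; }

    public PortRememberingParser()
        : base(GenericUriParserOptions.Default) // placeholder options
    {
        RegisteredPort = -1; // -1 is used here to mean "no default port"
    }

    protected override void OnRegister(string schemeName, int defaultPort)
    {
        // Called as part of UriParser.Register for this instance.
        RegisteredScheme = schemeName;
        RegisteredPort = defaultPort;
        base.OnRegister(schemeName, defaultPort);
    }
}
```

After a call such as UriParser.Register(parser, "sketchport", 1234), where "sketchport" is a made-up scheme name, the two properties hold the values that were passed in, so the parser can fall back on them when a URI omits the port.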

Parsing a URI with GetComponents

The first thing you might want to do is set up some regular expressions (Regex) to parse out the different parts of your URI. If you use code snippets to generate a switch statement on the components parameter, you get a case for every member of UriComponents.

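protected override string GetComponents(Uri uri, 
          UriComponents components, UriFormat format)
{
     switch (components)
     {
          case UriComponents.UserInfo:
               //Parse out and return user info
               break;
          case UriComponents.Port:
               //Parse out and return port
               break;
          //etc...
     }

     //Anything not handled above falls back to the default parsing
     return base.GetComponents(uri, components, format);
}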

All you need to do is apply the correct Regex in each case and return the value. The generated switch leaves out a few possibilities, though, because UriComponents is a flags enumeration and its members can be combined. The first two combinations are UriComponents.Path | UriComponents.KeepDelimiter and UriComponents.Query | UriComponents.KeepDelimiter. They return the path or query, respectively, with the leading delimiter intact (surprise). You can get rid of the case for UriComponents.KeepDelimiter on its own; it's just an option switch and shouldn't return anything.

In SIP, you don't have queries or paths, so I made the Path component return the SIP parameters and the Query component return the headers, because the syntax for SIP headers is identical to HTTP queries. Adjustments like this may need to be made for your URI scheme.

If you have any doubts, instantiate a new Uri with a Google query. Run your program in debug mode, and step through the code to see which components are requested when you access each property of Uri. Knowing which flags make up each components case will help you reuse parsing code through nested GetComponents calls. It also gives you a good idea of what you should be returning in each case.
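
As a rough illustration of the Regex approach and of the KeepDelimiter handling described above, here is a standalone sketch that works on a plain string rather than inside a UriParser. The class SipUriSketch, its GetPart method, and the pattern are assumptions for this example; the pattern is a deliberately simplified shape (scheme, optional user info, host, optional port, parameters, headers) and not the full RFC 3261 grammar.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SipUriSketch
{
    // Simplified shape, assumed for illustration only:
    // scheme:[userinfo@]host[:port][;parameters][?headers]
    private static readonly Regex Pattern = new Regex(
        @"^(?<scheme>sips?):(?:(?<user>[^@;?]+)@)?(?<host>[^:;?]+)" +
        @"(?::(?<port>\d+))?(?<params>(?:;[^?]*)?)(?<headers>(?:\?.*)?)$",
        RegexOptions.IgnoreCase);

    // Returns the requested part. Path maps to the SIP parameters and
    // Query to the SIP headers, as described in the text.
    public static string GetPart(string uri, UriComponents components)
    {
        Match m = Pattern.Match(uri);
        if (!m.Success) return string.Empty;

        // KeepDelimiter is an option flag, so strip it before switching.
        bool keep = (components & UriComponents.KeepDelimiter) != 0;
        switch (components & ~UriComponents.KeepDelimiter)
        {
            case UriComponents.UserInfo: return m.Groups["user"].Value;
            case UriComponents.Host: return m.Groups["host"].Value;
            case UriComponents.Port: return m.Groups["port"].Value;
            case UriComponents.Path: return Strip(m.Groups["params"].Value, keep);
            case UriComponents.Query: return Strip(m.Groups["headers"].Value, keep);
            default: return string.Empty;
        }
    }

    // Drops the leading ';' or '?' unless the caller asked to keep it.
    private static string Strip(string value, bool keepDelimiter)
    {
        if (keepDelimiter || value.Length == 0) return value;
        return value.Substring(1);
    }
}
```

For the input "sip:alice@example.com:5060;transport=tcp?subject=hi", GetPart with UriComponents.Path returns "transport=tcp", and with UriComponents.Path | UriComponents.KeepDelimiter it returns ";transport=tcp". Masking off the option flag first keeps the switch down to one case per component instead of one case per combination.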

Registering your UriParser

I mentioned earlier that you need to register your UriParser before you can start instantiating Uris that require it. Registering associates a scheme string (e.g., "sip", "sips", "http") with a parser instance and a default port. Keep in mind that the scheme string must be present and more than one character long, and the port must be either -1 (no default port) or an integer exclusively between 0 and 65535. Here is some code to show you the right way to do it, and some ways that will fail:

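//This registers "sip" to port 5060 using a SipStyleUriParser.
//The skeleton's constructor takes a GenericUriParserOptions value;
//Default is used here only as a placeholder.
UriParser.Register(
    new SipStyleUriParser(GenericUriParserOptions.Default), "sip", 5060);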

//This registers "pres" with no default port using PresStyleUriParser
UriParser.Register(new PresStyleUriParser(), "pres", -1);

//InvalidOperationException!
//The scheme http is already registered. This prevents the
//possibility of having a scheme registered with two conflicting
//parsers or default ports.
UriParser.Register(new CustomHttpStyleUriParser(), "http", 80);

//InvalidOperationException!
//Even though the schemes are different, you can't register the same
//parser instance for more than one scheme. This makes sense if you
//are working in multithreaded environments.
SipStyleUriParser s =
    new SipStyleUriParser(GenericUriParserOptions.Default);
UriParser.Register(s, "sip", 5060);
UriParser.Register(s, "sips", 5061);
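
One way to avoid the first failure above is to ask whether a scheme is already known before registering it. The sketch below uses the static UriParser.IsKnownScheme method for that; the helper class ParserRegistration and its TryRegister method are hypothetical names introduced for this example.

```csharp
using System;

public static class ParserRegistration
{
    // Registers the parser for schemeName unless that scheme is already
    // known. Returns true when the registration was actually performed.
    // Note: the check and the registration are two separate steps, so
    // another thread could still register the scheme in between.
    public static bool TryRegister(UriParser parser, string schemeName,
        int defaultPort)
    {
        if (UriParser.IsKnownScheme(schemeName))
            return false;

        UriParser.Register(parser, schemeName, defaultPort);
        return true;
    }
}
```

A call such as ParserRegistration.TryRegister(new GenericUriParser(GenericUriParserOptions.Default), "http", 80) returns false instead of throwing, because "http" is a built-in scheme. The helper does not protect against the second failure: a parser instance that has already been registered still cannot be reused for another scheme.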

Examples

I have included the source for my SipStyleUriParser with this article. It is fully RFC 3261 compliant, and even follows the rules for URI comparison. I have also included an easy way to parse headers and parameters into a Dictionary so that they may easily be checked against each other regardless of order, and so that the values can be retrieved by the parameter name. It successfully completes all the test cases given by the specifications. You are welcome to use it in your own applications, and please let me know if you have any suggestions.
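
The attached source is not reproduced here, but the idea of parsing parameters or headers into a Dictionary and comparing them regardless of order can be sketched roughly as follows. The class SipParameterSketch and its methods are hypothetical, and the comparison shown is a plain order-insensitive match of names and values; it is stricter and simpler than the actual URI comparison rules in RFC 3261, which treat some parameters specially.

```csharp
using System;
using System.Collections.Generic;

public static class SipParameterSketch
{
    // Splits ";a=1;b=2" (separator ';') or "?a=1&b=2" (separator '&')
    // into name/value pairs. Names are matched case-insensitively.
    public static Dictionary<string, string> Parse(string text, char separator)
    {
        var result = new Dictionary<string, string>(
            StringComparer.OrdinalIgnoreCase);
        string[] pairs = text.TrimStart(';', '?').Split(
            new[] { separator }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? string.Empty : pair.Substring(eq + 1);
            result[name] = value; // a repeated name keeps the last value
        }
        return result;
    }

    // True when both dictionaries hold the same names with the same
    // values (compared case-insensitively), in any order.
    public static bool SameEntries(Dictionary<string, string> a,
        Dictionary<string, string> b)
    {
        if (a.Count != b.Count) return false;
        foreach (KeyValuePair<string, string> entry in a)
        {
            string other;
            if (!b.TryGetValue(entry.Key, out other)) return false;
            if (!string.Equals(entry.Value, other,
                StringComparison.OrdinalIgnoreCase)) return false;
        }
        return true;
    }
}
```

With this sketch, Parse(";transport=tcp;lr", ';') and Parse(";lr;Transport=TCP", ';') compare as equal under SameEntries, and a value can be looked up by name, for example parameters["transport"].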

Conclusion

Despite MSDN's lack of documentation on the subject, writing your own UriParser is not very difficult. As long as you have a complete specification to work with, the implementation becomes fairly straightforward. Using this in combination with extensions of WebRequest and WebResponse will enable you to write a complete network stack! If you have any questions, comments, or suggestions, feel free to email me at augsod@gmail.com.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

August Sodora III
CEO Gibphone Enterprises, Fizzbo Online
United States
August is a programming enthusiast at Johns Hopkins. He started with QuickBASIC back in 1999 and flirted with many other languages before settling with the .NET framework. August spends most of his time programming for GibPhone.

Last Updated 17 Apr 2006
Article Copyright 2006 by August Sodora III