
Symbols as extensible enums

By Qwertie, 24 Feb 2010
 

Introduction

In C#, sometimes you would like to define a base class, or a library, that uses an enumeration, and you would like to allow derived classes or users of your library to define new values for it. Trouble is, enums are not extensible: a derived class or user code cannot define new values.

For instance, one time, I wrote a library for serializing and deserializing "shapes" with geographic coordinates to a text file. In this library, several different shapes were supported: circles, rectangles, lines, polygons, and so on. Suppose there is a ShapeType enumeration for this:

public enum ShapeType {
    Circle, 
    Rect,
    Line,
    Polygon
}

Enumerations are great for storing in a text file, since you can write t.ToString() to convert a ShapeType t to a string, and (ShapeType)Enum.Parse(typeof(ShapeType), s) to convert the string s back to a ShapeType (Enum.Parse returns object, so the cast is required).
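As a small illustration of that round trip, here is a minimal sketch. The EnumRoundTrip class and its Encode and Decode helpers are hypothetical names introduced only for this example; ToString and Enum.Parse are the standard .NET calls mentioned above.

```csharp
using System;

public enum ShapeType { Circle, Rect, Line, Polygon }

// Hypothetical helper, for illustration only.
public static class EnumRoundTrip
{
    // Converts an enum value to the text that would be written to the file.
    public static string Encode(ShapeType t)
    {
        return t.ToString();
    }

    // Converts text back to an enum value. Enum.Parse returns object, so a
    // cast is needed; it throws ArgumentException for an unknown name.
    public static ShapeType Decode(string s)
    {
        return (ShapeType)Enum.Parse(typeof(ShapeType), s);
    }

    public static void Main()
    {
        string text = Encode(ShapeType.Polygon);
        ShapeType back = Decode(text);
        Console.WriteLine(text + " -> " + back);   // Polygon -> Polygon
    }
}
```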

But suppose you would like to allow other developers to define their own shapes. Other developers cannot add new values to ShapeType, and even if they could, there is a risk that two developers would assign the same integer value to different kinds of shapes. How can we solve these problems?

Ruby to the Rescue

Sometimes, when an extensible enum is needed, people use strings or integer constants (const int or readonly int) instead of enums. These solutions have at least the following problems:

  • Strings and integers are not normally used for enumerations. Therefore, when other developers see a "string" or "int" property or parameter, they don't immediately realize that it is used for an enumeration.
  • Some of the benefits of static typing are lost, since you can mistype a string or accidentally put a string/integer in a location that was intended to hold an enum value (or vice versa).
  • Strings can't be renamed with a refactoring tool (like the "Rename" feature of Visual Studio).
  • Strings are much slower than enumerations when comparing for equality, and they are slower than integers when used as dictionary keys (though in the .NET Framework versions available when this was written, enums also perform poorly as dictionary keys, because the default equality comparer boxes them).
  • When using integers, it's hard to guarantee that two different developers each use unique values when extending the enumeration.

In the dynamic language Ruby, we commonly use symbols instead of enumerations. Symbols are like string literals, but instead of a string like "Circle", you use the symbol syntax :Circle.

For the most part, symbols solve the above problems. They can be compared as fast as integers, and they cannot be confused with strings. Since anyone can define a new symbol at any time, symbols are like an enumeration of unlimited size. And, if you use them as I prescribe below, it is possible to rename them with a refactoring tool.

Symbols in .NET

I have written a Symbol implementation for .NET that you can use as an extensible enum. I will now demonstrate how we can rewrite our ShapeType enum to use Symbols instead. First, change enum ShapeType into a class. Then, replace each enum value with a Symbol:

public static class ShapeType
{
    public static readonly Symbol Circle  = GSymbol.Get("Circle");
    public static readonly Symbol Rect    = GSymbol.Get("Rect");
    public static readonly Symbol Line    = GSymbol.Get("Line");
    public static readonly Symbol Polygon = GSymbol.Get("Polygon");
}

If a third party wants to extend this list of symbols, they should write another static class with the additional possibilities. For example, Xyz corporation might write this extension:

public static class FractalShape
{
    public static readonly Symbol Mandelbrot = 
                  GSymbol.Get("XyzCorp.Mandelbrot");
    public static readonly Symbol Julia = GSymbol.Get("XyzCorp.Julia");
    public static readonly Symbol Fern = GSymbol.Get("XyzCorp.Fern");
}

To ensure two independent parties don't accidentally define the same symbol for two different shapes (for example, if two different parties both made a shape called Fern), it is advisable to try to use a unique name when calling GSymbol.Get. That's because GSymbol.Get always returns the same Symbol when given the same input string. Therefore, in this example, I used the prefix "XyzCorp." to ensure that names defined by Xyz corporation are unique.
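To make the collision risk concrete, here is a minimal sketch of name interning. MiniSymbol and MiniGSymbol are hypothetical stand-ins written for this example, not the library's actual classes; the only behavior assumed from the text above is that Get returns the same object whenever it is given the same string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Symbol: just an interned name.
public sealed class MiniSymbol
{
    public readonly string Name;
    internal MiniSymbol(string name) { Name = name; }
}

// Hypothetical stand-in for GSymbol: one global table keyed by name.
public static class MiniGSymbol
{
    private static readonly Dictionary<string, MiniSymbol> _table =
        new Dictionary<string, MiniSymbol>();

    public static MiniSymbol Get(string name)
    {
        lock (_table)
        {
            MiniSymbol sym;
            if (!_table.TryGetValue(name, out sym))
            {
                sym = new MiniSymbol(name);
                _table.Add(name, sym);
            }
            return sym;
        }
    }
}

public static class InterningDemo
{
    public static void Main()
    {
        // Two parties that pick the same plain name receive the same object...
        MiniSymbol a = MiniGSymbol.Get("Fern");
        MiniSymbol b = MiniGSymbol.Get("Fern");
        Console.WriteLine(object.ReferenceEquals(a, b));   // True

        // ...while a vendor prefix keeps the two names distinct.
        MiniSymbol c = MiniGSymbol.Get("XyzCorp.Fern");
        Console.WriteLine(object.ReferenceEquals(a, c));   // False
    }
}
```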

Typesafe Symbols

When I first wrote this article, people complained that Symbols are not type-safe: you could accidentally mix up two unrelated enumerations, since they both have type Symbol. And by votes of 3, my article was cast into the pit of zero readership. Besides, ShapeType as defined above is not a drop-in replacement for its enum equivalent, because ShapeType variable declarations have to be changed.

ShapeType rect = ShapeType.Rect;

would have to be changed to this:

Symbol rect = ShapeType.Rect;

Now you can overcome these limitations using a type-safe "symbol pool". A SymbolPool is a "namespace" for symbols. There is one permanent, global pool (used by GSymbol.Get), and you can create an unlimited number of private pools. I will say more about how they work below, but for now, let me just show you how to make a type-safe extensible enum using SymbolPool<ShapeType>:

public class ShapeType : Symbol
{
    private ShapeType(Symbol prototype) : base(prototype) { }
    public static new readonly SymbolPool<ShapeType> Pool 
                         = new SymbolPool<ShapeType>(p => new ShapeType(p));

    public static readonly ShapeType Circle  = Pool.Get("Circle");
    public static readonly ShapeType Rect    = Pool.Get("Rect");
    public static readonly ShapeType Line    = Pool.Get("Line");
    public static readonly ShapeType Polygon = Pool.Get("Polygon");
}

Since ShapeType's constructor is private, the only way to make a new ShapeType is by calling ShapeType.Pool.Get().
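To see how a private constructor and a factory function can fit together, here is a minimal sketch of a typed pool. SketchSymbol, SketchPool<T>, ShapeKind and LineStyle are hypothetical names invented for this example, and the real SymbolPool<T> may well be implemented differently; the sketch only mirrors the pattern described above, in which the pool hands a "prototype" to a factory that wraps it in the derived type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Symbol; not the library's actual class.
public class SketchSymbol
{
    public readonly string Name;
    public SketchSymbol(string name) { Name = name; }
    // "Prototype" constructor: a derived type copies the identity of a plain symbol.
    protected SketchSymbol(SketchSymbol prototype) { Name = prototype.Name; }
}

// Hypothetical stand-in for SymbolPool<T>.
public class SketchPool<T> where T : SketchSymbol
{
    private readonly Dictionary<string, T> _byName = new Dictionary<string, T>();
    private readonly Func<SketchSymbol, T> _factory;

    public SketchPool(Func<SketchSymbol, T> factory) { _factory = factory; }

    public T Get(string name)
    {
        lock (_byName)
        {
            T sym;
            if (!_byName.TryGetValue(name, out sym))
            {
                // The pool builds a plain prototype and lets the factory wrap it in a T.
                sym = _factory(new SketchSymbol(name));
                _byName.Add(name, sym);
            }
            return sym;
        }
    }
}

public class ShapeKind : SketchSymbol
{
    private ShapeKind(SketchSymbol prototype) : base(prototype) { }
    public static readonly SketchPool<ShapeKind> Pool =
        new SketchPool<ShapeKind>(p => new ShapeKind(p));
    public static readonly ShapeKind Circle = Pool.Get("Circle");
}

public class LineStyle : SketchSymbol
{
    private LineStyle(SketchSymbol prototype) : base(prototype) { }
    public static readonly SketchPool<LineStyle> Pool =
        new SketchPool<LineStyle>(p => new LineStyle(p));
    public static readonly LineStyle Dashed = Pool.Get("Dashed");
}

public static class TypedPoolDemo
{
    public static void Main()
    {
        ShapeKind a = ShapeKind.Pool.Get("Circle");
        Console.WriteLine(object.ReferenceEquals(a, ShapeKind.Circle)); // True

        // ShapeKind wrong = LineStyle.Dashed;   // would not compile: unrelated types
    }
}
```

Because the factory lambda is written inside the class, it can reach the private constructor, while code outside the class can only obtain instances through Pool.Get.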

Now, a third party "XyzCorp" can define new ShapeTypes as follows:

public static class FractalShape
{
    public static readonly ShapeType Mandelbrot =
                  ShapeType.Pool.Get("XyzCorp.Mandelbrot");
    public static readonly ShapeType Julia = ShapeType.Pool.Get("XyzCorp.Julia");
    public static readonly ShapeType Fern = ShapeType.Pool.Get("XyzCorp.Fern");
}

Note that the members of FractalShape still have the type ShapeType. FractalShape is a static class rather than a subclass of ShapeType: because ShapeType's constructor is private, a class declared outside ShapeType cannot derive from it (the code would not compile). The class exists only to group the additional values under one name.

Using Symbols

  • To convert a Symbol s to a string, call s.Name. You can also call s.ToString(), but this prefixes the name with a colon (:) as in Ruby.
  • Instead of Enum.Parse, when you want to convert a string back to a Symbol, you can call GSymbol.Get(string) to create a global symbol, or Pool.Get(string) where Pool is a private symbol pool.
  • To get all symbols in a pool, just enumerate the pool:

    foreach (ShapeType s in ShapeType.Pool)
        ...

    The symbols are returned in the same order as they were created.

Note that every Symbol you create consumes a small amount of memory that cannot be garbage-collected if the Symbol's pool is stored in a global variable. Therefore, if the strings come from a large file, you may wish to call Pool.GetIfExists(string) instead (where Pool is either a private pool or GSymbol) to avoid a memory leak. GetIfExists does not create new symbols; it only returns symbols that already exist. Therefore, if you get a nonsense name like "fdjlas", GetIfExists returns null instead of a valid Symbol.

There is a catch: if you use GetIfExists, you need to make sure that all desired symbols already exist. Therefore, before calling ShapeType.Pool.GetIfExists to decode a shape type name, you must make sure that extension classes such as FractalShape are initialized, because their static fields are only created when the class is first used. Accessing any ShapeType from FractalShape will do the trick:

// Returns null if FractalShape has never been used
ShapeType s = ShapeType.Pool.GetIfExists("XyzCorp.Fern");

s = FractalShape.Julia;
s = ShapeType.Pool.GetIfExists("XyzCorp.Fern"); // guaranteed to work
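When decoding many names from a file, the same idea can be wrapped in a small helper that never creates symbols and reports the names it could not resolve. The sketch below is only an illustration: SafeDecoder and DecodeAll are hypothetical names, and the lookup delegate stands in for a call such as ShapeType.Pool.GetIfExists. To keep the example self-contained, the demo uses a plain dictionary of strings in place of a real pool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class SafeDecoder
{
    // Resolves each name with a non-creating lookup (standing in for
    // Pool.GetIfExists). Names that resolve to null are collected in
    // 'unknown' instead of being turned into new symbols.
    public static List<T> DecodeAll<T>(IEnumerable<string> names,
                                       Func<string, T> getIfExists,
                                       List<string> unknown) where T : class
    {
        var known = new List<T>();
        foreach (string name in names)
        {
            T sym = getIfExists(name);
            if (sym != null)
                known.Add(sym);
            else
                unknown.Add(name);
        }
        return known;
    }
}

public static class SafeDecoderDemo
{
    public static void Main()
    {
        // Stand-in "pool": a fixed table of names that already exist.
        var table = new Dictionary<string, string>();
        table.Add("Circle", ":Circle");
        table.Add("Rect", ":Rect");

        Func<string, string> lookup = delegate(string name)
        {
            string found;
            return table.TryGetValue(name, out found) ? found : null;
        };

        var unknown = new List<string>();
        List<string> known = SafeDecoder.DecodeAll<string>(
            new string[] { "Circle", "fdjlas", "Rect" }, lookup, unknown);

        Console.WriteLine(known.Count);   // 2
        Console.WriteLine(unknown[0]);    // fdjlas
    }
}
```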

How Symbols Work, Briefly

This library has four classes.

  1. A Symbol is simply a small class with a read-only Name, integer Id, and a reference to its Pool. Every Symbol is cataloged in a SymbolPool.
  2. A SymbolPool contains a set of Symbols. SymbolPool contains a List<Symbol> and a Dictionary<string, Symbol> which are used to look up symbols by ID and by name, respectively. SymbolPool is thread-safe; you can safely create Symbols in the same pool from different threads.
  3. SymbolPool<T> is a derived class of SymbolPool that creates Ts, where T is a derived class of Symbol. You pass a factory function to its constructor, and when someone calls Get() to create a T, the SymbolPool calls your factory function, passing a "prototype" as a parameter (the "prototype" is a Symbol that you can use to construct a T).
  4. GSymbol contains the "global" SymbolPool. Call GSymbol.Get to create a "global" Symbol.

Each Symbol has an ID number; this is nothing more than the value of a counter that is incremented each time a symbol is created. IDs are unique within a given pool, but may be duplicated across pools. Private pools have positive IDs by default, starting at 1; the global pool has negative IDs starting at -1, except for GSymbol.Empty, which is the Symbol that represents the empty string (Name == "").
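The bookkeeping described above can be sketched roughly as follows. PoolSymbol and PoolSketch are hypothetical, simplified stand-ins rather than the library's source: the sketch assumes a private pool whose IDs start at 1, uses a single lock for thread safety, and leaves out the global pool's negative IDs and the typed SymbolPool<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Symbol: read-only Name, Id and owning pool.
public sealed class PoolSymbol
{
    public readonly int Id;
    public readonly string Name;
    public readonly PoolSketch Pool;

    internal PoolSymbol(int id, string name, PoolSketch pool)
    {
        Id = id;
        Name = name;
        Pool = pool;
    }
}

// Hypothetical stand-in for SymbolPool.
public sealed class PoolSketch
{
    // _byId[i] holds the symbol whose Id is i + 1.
    private readonly List<PoolSymbol> _byId = new List<PoolSymbol>();
    private readonly Dictionary<string, PoolSymbol> _byName =
        new Dictionary<string, PoolSymbol>();

    // Returns the existing symbol for 'name', creating it if necessary.
    public PoolSymbol Get(string name)
    {
        lock (_byName)
        {
            PoolSymbol sym;
            if (!_byName.TryGetValue(name, out sym))
            {
                sym = new PoolSymbol(_byId.Count + 1, name, this);
                _byId.Add(sym);
                _byName.Add(name, sym);
            }
            return sym;
        }
    }

    // Returns the existing symbol for 'name', or null; never creates one.
    public PoolSymbol GetIfExists(string name)
    {
        lock (_byName)
        {
            PoolSymbol sym;
            _byName.TryGetValue(name, out sym);
            return sym;
        }
    }

    // Looks a symbol up by ID, or returns null if no such ID was issued.
    public PoolSymbol GetById(int id)
    {
        lock (_byName)
        {
            int index = id - 1;
            return (index >= 0 && index < _byId.Count) ? _byId[index] : null;
        }
    }
}

public static class PoolSketchDemo
{
    public static void Main()
    {
        var pool = new PoolSketch();
        PoolSymbol circle = pool.Get("Circle");
        PoolSymbol rect = pool.Get("Rect");

        Console.WriteLine(circle.Id + ", " + rect.Id);            // 1, 2
        Console.WriteLine(pool.GetIfExists("fdjlas") == null);    // True
        Console.WriteLine(object.ReferenceEquals(pool.GetById(2), rect)); // True
    }
}
```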

GetHashCode() is fast because it returns an ID number instead of obtaining the hash code of the string; therefore, Symbols are fast when used as keys in a Dictionary. Comparing Symbols for equality is fast, because only the references are compared, not the contents of each Symbol. Two Symbols are the same if and only if they are located at the same memory location.
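The effect on dictionary lookups can be illustrated with a small sketch. IdSymbol is a hypothetical class written for this example: it assigns IDs from a simple counter, returns the ID from GetHashCode, and leaves Equals alone so that the inherited reference comparison is used. Unlike the real library, it does not intern names, which is why the demo can construct a second object with the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical class, for illustration only.
public sealed class IdSymbol
{
    private static int _counter;
    public readonly int Id;
    public readonly string Name;

    public IdSymbol(string name)
    {
        Id = Interlocked.Increment(ref _counter);
        Name = name;
    }

    // The hash is the precomputed ID; the characters of Name are never scanned.
    public override int GetHashCode() { return Id; }

    // Equals is deliberately not overridden, so the inherited Object.Equals
    // compares references only.
}

public static class IdSymbolDemo
{
    public static void Main()
    {
        var circle = new IdSymbol("Circle");
        var rect = new IdSymbol("Rect");

        var counts = new Dictionary<IdSymbol, int>();
        counts[circle] = 10;
        counts[rect] = 20;

        Console.WriteLine(counts[circle]);                     // 10
        Console.WriteLine(circle.GetHashCode() == circle.Id);  // True

        // A different object is a different key, even with an equal name.
        var lookalike = new IdSymbol("Circle");
        Console.WriteLine(counts.ContainsKey(lookalike));      // False
    }
}
```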

Besides making type-safe extensible enumerations, another reason to use a SymbolPool is to construct a temporary set of Symbols that can be garbage-collected later. A SymbolPool and all the Symbols it contains can be garbage-collected when there are no references left to the pool itself or any of its Symbols. Note that a Symbol has a reference to its pool, so any lingering references to a Symbol will keep its entire pool alive.

As for me...

I will be using Symbols to represent node types in my extensible compiler project, Loyc. Loyc has just one AstNode class to represent AST nodes; to define a new type of AST node, one only needs to define a new Symbol and use it as AstNode.NodeType.

History

  • June 1, 2008: First version.
  • December 12, 2009: Introduced SymbolPools.
  • December 14, 2009: Released on CodeProject.
  • February 24, 2010: Added support for type-safe Symbols.

License

This article, along with any associated source code and files, is licensed under The MIT License.

About the Author

Qwertie
Software Developer, Trapeze Software, Inc.
Canada
I started programming when I was eleven. Now I'm old.
 
In my spare time I'm developing a system called Loyc (Language of your choice), which will include an enhanced C# compiler. Many programs have an add-in architecture; why not your programming language? I'm also looking for a life partner. Oh hi future wife! Wazzap.

Article Copyright 2009 by Qwertie