
Symbols as extensible enums

25 Feb 2014, MIT License
Use the Symbol class for enum-like values that can be extended by other classes.

Introduction

In C#, sometimes you would like to define a base class, or a library, that uses an enumeration, and you would like to allow derived classes or users of your library to define new values for it. Trouble is, enums are not extensible: a derived class or user code cannot define new values.

For instance, I once wrote a library for serializing and deserializing "shapes" with geographic coordinates to a text file. The library supported several kinds of shapes: circles, rectangles, lines, polygons, and so on. Suppose there is a ShapeType enumeration for them:

C#
public enum ShapeType {
    Circle, 
    Rect,
    Line,
    Polygon
}

Enumerations are great for storing in a text file, since you can write t.ToString() to convert a ShapeType t to a string, and (ShapeType)Enum.Parse(typeof(ShapeType), s) to convert the string s back to a ShapeType.
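For reference, here is that round trip as a small, self-contained program that uses only the built-in System.Enum methods. Note that Enum.Parse returns object, so its result must be cast, and that the generic Enum.TryParse reports an unknown name by returning false instead of throwing.

```csharp
using System;

ShapeType t = ShapeType.Polygon;

// Enum -> string: ToString() yields the member name.
string s = t.ToString();

// String -> enum: Enum.Parse returns object, so cast the result back.
var parsed = (ShapeType)Enum.Parse(typeof(ShapeType), s);
Console.WriteLine(parsed == t);   // True

// Enum.TryParse returns false for names that are not members, instead of throwing.
bool ok = Enum.TryParse("Hexagon", out ShapeType unknown);
Console.WriteLine(ok);            // False

public enum ShapeType { Circle, Rect, Line, Polygon }
```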

But suppose you would like to allow other developers to define their own shapes. Other developers cannot add new values to ShapeType, and even if they could, there is a risk that two developers would assign the same integer value to different kinds of shapes. How can we solve these problems?
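To make the collision risk concrete: C# lets you cast any integer to an enum type, even one that matches no declared member, and this is sometimes used as an ad-hoc extension mechanism. In the sketch below, LibraryA, LibraryB, Hexagon and Spiral are hypothetical names. The point is that nothing coordinates the numbers chosen by two independent developers, so their "new" shapes silently become the same value, and neither value has a name.

```csharp
using System;

// Two independent libraries each invent a "new" ShapeType by casting an int.
ShapeType hexagon = LibraryA.Hexagon;
ShapeType spiral  = LibraryB.Spiral;

Console.WriteLine(hexagon == spiral);   // True: both are (ShapeType)4
Console.WriteLine(hexagon.ToString());  // 4: undefined values have no name

public enum ShapeType { Circle, Rect, Line, Polygon }

// Hypothetical third-party "extensions" that happen to pick the same number.
public static class LibraryA { public const ShapeType Hexagon = (ShapeType)4; }
public static class LibraryB { public const ShapeType Spiral  = (ShapeType)4; }
```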

Ruby to the Rescue 

Sometimes, when an extensible enum is needed, people use strings or integer constants (const int or static readonly int) instead of enums. These solutions have at least the following problems:

  • Strings and integers are not normally used for enumerations. Therefore, when other developers see a "string" or "int" property or parameter, they don't immediately realize that it is used for an enumeration. 
  • Some of the benefits of static typing are lost, since you can mistype a string or accidentally put a string or integer in a location that was intended to hold an enum value (or vice versa).
  • Strings can't be renamed with a refactoring tool (like the "Rename" feature of Visual Studio).
  • Strings are much slower than enumerations when compared for equality, and they are slower than integers when used as dictionary keys (though, due to an odd decision by Microsoft, enums also perform poorly as dictionary keys in some versions of .NET, because the default equality comparer boxes each enum value).
  • When using integers, it's hard to guarantee that two different developers each use unique values when extending the enumeration.

In the dynamic language Ruby, we commonly use symbols instead of enumerations. Symbols are like string literals, but instead of a string like "Circle", you use the symbol syntax :Circle.

For the most part, symbols solve the above problems. They can be compared as fast as integers, and they cannot be confused with strings. Since anyone can define a new symbol at any time, symbols are like an enumeration of unlimited size. And, if you use them as I prescribe below, it is possible to rename them with a refactoring tool. 

(Edit: I later found out that other languages, such as Lisp, also have symbols as a built-in concept.)

Symbols in .NET 

I have written a Symbol implementation for .NET that you can use as an extensible enum. I will now demonstrate how we can rewrite our ShapeType enum to use Symbols instead. First, change enum ShapeType into a class. Then, replace each enum value with a Symbol:

C#
public static class ShapeType
{
    public static readonly Symbol Circle  = GSymbol.Get("Circle");
    public static readonly Symbol Rect    = GSymbol.Get("Rect");
    public static readonly Symbol Line    = GSymbol.Get("Line");
    public static readonly Symbol Polygon = GSymbol.Get("Polygon");
}

If a third party wants to extend this list of symbols, they should write another static class with the additional possibilities. For example, Xyz corporation might write this extension:

C#
public static class FractalShape
{
    public static readonly Symbol Mandelbrot = 
                  GSymbol.Get("XyzCorp.Mandelbrot");
    public static readonly Symbol Julia = GSymbol.Get("XyzCorp.Julia");
    public static readonly Symbol Fern = GSymbol.Get("XyzCorp.Fern");
}

To ensure that two independent parties don't accidentally define the same symbol for two different shapes (for example, if both made a shape called Fern), it is advisable to use a unique name when calling GSymbol.Get. That's because GSymbol.Get always returns the same Symbol when given the same input string. In this example, I used the prefix "XyzCorp." to ensure that names defined by Xyz corporation are unique.
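What makes this work is interning: each distinct name maps to exactly one object, so comparing two symbols is a single reference comparison. The sketch below is a deliberately minimal, hypothetical stand-in (MiniSymbol and MiniGSymbol are invented for illustration and are not the library's classes). It shows both properties described above: the same name always yields the same instance, and a company prefix keeps two parties' Fern symbols apart.

```csharp
using System;
using System.Collections.Generic;

var a = MiniGSymbol.Get("XyzCorp.Fern");
var b = MiniGSymbol.Get("XyzCorp.Fern");
var c = MiniGSymbol.Get("AbcInc.Fern");

Console.WriteLine(object.ReferenceEquals(a, b)); // True: same name, same object
Console.WriteLine(a == c);                       // False: different prefixes, different symbols

// Hypothetical minimal symbol: just a name. Only MiniGSymbol creates instances,
// so == (reference equality, since no operator is overloaded) is a valid comparison.
public sealed class MiniSymbol
{
    internal MiniSymbol(string name) { Name = name; }
    public string Name { get; }
    public override string ToString() => Name;
}

// Hypothetical global table mapping each name to its unique MiniSymbol.
public static class MiniGSymbol
{
    private static readonly Dictionary<string, MiniSymbol> _table = new();
    private static readonly object _lock = new();

    public static MiniSymbol Get(string name)
    {
        lock (_lock)
        {
            if (!_table.TryGetValue(name, out var sym))
                _table[name] = sym = new MiniSymbol(name);
            return sym;
        }
    }
}
```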

Typesafe Symbols 

When I first wrote this article, people complained that Symbols are not type-safe: you could accidentally mix up values from two unrelated enumerations, since they all have type Symbol. And, with votes of 3, my article was cast into the pit of zero readership. Besides, ShapeType as defined above is not a drop-in replacement for its enum equivalent, because variable declarations that use ShapeType have to be changed. For example, this declaration:

C#
ShapeType rect = ShapeType.Rect;

would have to be changed to this:

C#
Symbol rect = ShapeType.Rect; 

Now you can overcome these limitations using a type-safe "symbol pool". A SymbolPool is a "namespace" for symbols. There is one permanent, global pool (used by GSymbol.Get), and you can create an unlimited number of private pools. I will say more about how they work below, but for now, let me just show you how to make a type-safe extensible enum using SymbolPool<ShapeType>:

C#
public class ShapeType : Symbol
{
    private ShapeType(Symbol prototype) : base(prototype) { }
    public static new readonly SymbolPool<ShapeType> Pool 
                         = new SymbolPool<ShapeType>(p => new ShapeType(p));

    public static readonly ShapeType Circle  = Pool.Get("Circle");
    public static readonly ShapeType Rect    = Pool.Get("Rect");
    public static readonly ShapeType Line    = Pool.Get("Line");
    public static readonly ShapeType Polygon = Pool.Get("Polygon");
}

Since ShapeType's constructor is private, the only way to make a new ShapeType is by calling ShapeType.Pool.Get().

Now, a third party "XyzCorp" can define new ShapeTypes as follows:

C#
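public static class FractalShape
{
    // ShapeType's constructor is private, so this class cannot derive from
    // ShapeType; it just holds extra values obtained from ShapeType.Pool.
    public static readonly ShapeType Mandelbrot = 
                  ShapeType.Pool.Get("XyzCorp.Mandelbrot");
    public static readonly ShapeType Julia = ShapeType.Pool.Get("XyzCorp.Julia");
    public static readonly ShapeType Fern = ShapeType.Pool.Get("XyzCorp.Fern");
}

Note that the members of FractalShape have the type ShapeType. FractalShape cannot derive from ShapeType, because ShapeType's only constructor is private; nor does it need to, since it merely holds additional values created by ShapeType.Pool.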
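To see how the type-safe version fits together, here is a compact, self-contained approximation. MiniSymbol and MiniSymbolPool&lt;T&gt; are hypothetical stand-ins for Symbol and SymbolPool&lt;T&gt;, reduced to a name, an ID and a factory-driven Get; the real classes have more features and may be implemented differently. In this sketch FractalShape is a static class, because no class can derive from ShapeType while ShapeType's only constructor is private.

```csharp
using System;
using System.Collections.Generic;

ShapeType fern = FractalShape.Fern;
Console.WriteLine(fern.Name);                                                        // XyzCorp.Fern
Console.WriteLine(object.ReferenceEquals(fern, ShapeType.Pool.Get("XyzCorp.Fern"))); // True

// Hypothetical base class: a name plus a per-pool ID.
public class MiniSymbol
{
    protected MiniSymbol(MiniSymbol prototype) : this(prototype.Name, prototype.Id) { }
    internal MiniSymbol(string name, int id) { Name = name; Id = id; }
    public string Name { get; }
    public int Id { get; }
    public override string ToString() => Name;
}

// Hypothetical typed pool: builds each new T from an untyped prototype via a factory.
public class MiniSymbolPool<T> where T : MiniSymbol
{
    private readonly Func<MiniSymbol, T> _factory;
    private readonly Dictionary<string, T> _byName = new();
    private int _nextId = 1;

    public MiniSymbolPool(Func<MiniSymbol, T> factory) { _factory = factory; }

    public T Get(string name)
    {
        lock (_byName)
        {
            if (!_byName.TryGetValue(name, out var sym))
                _byName[name] = sym = _factory(new MiniSymbol(name, _nextId++));
            return sym;
        }
    }
}

// The type-safe "enum": only code inside ShapeType (here, the pool's factory) can construct it.
public class ShapeType : MiniSymbol
{
    private ShapeType(MiniSymbol prototype) : base(prototype) { }
    public static readonly MiniSymbolPool<ShapeType> Pool = new(p => new ShapeType(p));

    public static readonly ShapeType Circle  = Pool.Get("Circle");
    public static readonly ShapeType Rect    = Pool.Get("Rect");
    public static readonly ShapeType Line    = Pool.Get("Line");
    public static readonly ShapeType Polygon = Pool.Get("Polygon");
}

// A third-party extension adds values to the same pool; its fields are still ShapeTypes.
public static class FractalShape
{
    public static readonly ShapeType Mandelbrot = ShapeType.Pool.Get("XyzCorp.Mandelbrot");
    public static readonly ShapeType Julia      = ShapeType.Pool.Get("XyzCorp.Julia");
    public static readonly ShapeType Fern       = ShapeType.Pool.Get("XyzCorp.Fern");
}
```

Assigning a symbol from an unrelated pool type to a ShapeType variable is now a compile-time error, which is exactly the type safety the plain Symbol version lacked.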

Using Symbols 

  • To convert a Symbol s to a string, call s.Name. You can also call s.ToString(); early versions prefixed the name with a colon (:) as in Ruby, but I removed the colon in newer versions so that Symbols act more like ordinary strings and enums.
  • Instead of Enum.Parse, when you want to convert a string back to a Symbol, you can call GSymbol.Get(string) to create a global symbol, or Pool.Get(string) where Pool is a private symbol pool. 
  • To get all symbols in a pool, just enumerate the pool:
    C#
    foreach (ShapeType s in ShapeType.Pool)
        Console.WriteLine(s.Name);

    The symbols are returned in the order in which they were created.
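Creation-order enumeration falls out naturally if a pool records each new symbol in a List alongside its name dictionary. A minimal sketch, assuming a hypothetical, non-thread-safe MiniSymbolPool that implements IEnumerable&lt;MiniSymbol&gt; (not the library's actual class):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var pool = new MiniSymbolPool();
foreach (var name in new[] { "Circle", "Rect", "Circle", "Line" })
    pool.Get(name);   // the second "Circle" returns the existing symbol

// Each distinct symbol appears once, in creation order.
foreach (MiniSymbol s in pool)
    Console.WriteLine($"{s.Id}: {s.Name}");   // 1: Circle, then 2: Rect, then 3: Line

public sealed class MiniSymbol
{
    internal MiniSymbol(string name, int id) { Name = name; Id = id; }
    public string Name { get; }
    public int Id { get; }
}

// Hypothetical pool: the List preserves creation order, the Dictionary gives lookup by name.
public sealed class MiniSymbolPool : IEnumerable<MiniSymbol>
{
    private readonly List<MiniSymbol> _list = new();
    private readonly Dictionary<string, MiniSymbol> _byName = new();

    public MiniSymbol Get(string name)
    {
        if (!_byName.TryGetValue(name, out var sym))
        {
            sym = new MiniSymbol(name, _list.Count + 1);
            _byName.Add(name, sym);
            _list.Add(sym);
        }
        return sym;
    }

    public IEnumerator<MiniSymbol> GetEnumerator() => _list.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```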

Note that every Symbol you create consumes a small amount of memory that cannot be garbage-collected if the Symbol's pool is stored in a global variable. Therefore, if the strings come from a large file, you may wish to call Pool.GetIfExists(string) instead (where Pool is either a private pool or GSymbol) to avoid a memory leak. GetIfExists does not create new symbols; it only returns symbols that already exist. Therefore, if you get a nonsense name like "fdjlas", GetIfExists returns null instead of a valid Symbol.
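The difference between Get and GetIfExists comes down to whether a lookup miss creates a new entry. In this hypothetical sketch (MiniSymbolPool and its Count property are invented for illustration), decoding a nonsense name leaves the pool unchanged, so reading a large file cannot make the pool grow without bound.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var pool = new MiniSymbolPool();
pool.Get("Circle");

// Decoding input: known names resolve, unknown names are neither created nor stored.
Console.WriteLine(pool.GetIfExists("Circle")?.Name ?? "null"); // Circle
Console.WriteLine(pool.GetIfExists("fdjlas")?.Name ?? "null"); // null
Console.WriteLine(pool.Count);                                 // 1

public sealed class MiniSymbol
{
    internal MiniSymbol(string name) { Name = name; }
    public string Name { get; }
}

// Hypothetical pool contrasting Get with GetIfExists.
public sealed class MiniSymbolPool
{
    private readonly Dictionary<string, MiniSymbol> _byName = new();
    public int Count => _byName.Count;

    // Get: creates the symbol on first use, so memory grows with each new name.
    public MiniSymbol Get(string name)
    {
        if (!_byName.TryGetValue(name, out var sym))
            _byName[name] = sym = new MiniSymbol(name);
        return sym;
    }

    // GetIfExists: lookup only; returns null for names that were never created.
    public MiniSymbol? GetIfExists(string name) =>
        _byName.TryGetValue(name, out var sym) ? sym : null;
}
```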

There is a catch: if you use GetIfExists, you need to make sure that all desired symbols already exist. Therefore, before calling ShapeType.Pool.GetIfExists to decode a shape type name, you must make sure that classes defining additional values, such as FractalShape, have been initialized. Accessing any ShapeType from FractalShape will do the trick:

C#
// May return null if FractalShape has not been initialized yet
ShapeType s = ShapeType.Pool.GetIfExists("XyzCorp.Fern");

s = FractalShape.Julia;
s = ShapeType.Pool.GetIfExists("XyzCorp.Fern"); // guaranteed to work 
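Touching a member works, but .NET also provides RuntimeHelpers.RunClassConstructor, which runs a type's static initializer (if it has not already run) given only its type handle. The article does not use this approach; it is shown here as an alternative. The sketch below uses a hypothetical Registry class in place of a symbol pool so that the effect of initialization is easy to observe. A static initializer runs at most once, so calling RunClassConstructor again is harmless.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Force FractalShape's static field initializers to run without naming any of its members.
RuntimeHelpers.RunClassConstructor(typeof(FractalShape).TypeHandle);

Console.WriteLine(Registry.Names.Contains("XyzCorp.Fern")); // True

// Hypothetical stand-in for a symbol pool: records every name that gets "created".
public static class Registry
{
    public static readonly List<string> Names = new();
    public static string Create(string name) { Names.Add(name); return name; }
}

// Static field initializers run on demand (no later than first access), as in the article.
public static class FractalShape
{
    public static readonly string Mandelbrot = Registry.Create("XyzCorp.Mandelbrot");
    public static readonly string Julia      = Registry.Create("XyzCorp.Julia");
    public static readonly string Fern       = Registry.Create("XyzCorp.Fern");
}
```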

How Symbols Work, Briefly

This library has four classes.

  1. A Symbol is simply a small class with a read-only Name, integer Id, and a reference to its Pool. Every Symbol is cataloged in a SymbolPool.
  2. A SymbolPool contains a set of Symbols. SymbolPool contains a List<Symbol> and a Dictionary<string, Symbol> which are used to look up symbols by ID and by name, respectively. SymbolPool is thread-safe; you can safely create Symbols in the same pool from different threads.
  3. SymbolPool<T> is a derived class of SymbolPool that creates Ts, where T is a derived class of Symbol. You pass a factory function to its constructor, and when someone calls Get() to create a T, the SymbolPool calls your factory function, passing a "prototype" as a parameter (the "prototype" is a Symbol that you can use to construct a T).
  4. GSymbol contains the "global" SymbolPool. Call GSymbol.Get to create a "global" Symbol.

Each Symbol has an ID number; this is nothing more than the value of a counter that is incremented each time a symbol is created. IDs are unique within a given pool, but may be duplicated across pools. Private pools have positive IDs by default, starting at 1; the global pool has negative IDs starting at -1, except for GSymbol.Empty, which is the Symbol that represents the empty string (Name == "").
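The four-class design and the ID scheme can be approximated in a few dozen lines. This is a sketch, not the library's source: MiniSymbol and MiniSymbolPool are hypothetical, and the negativeIds constructor flag is an invented way to mimic the global pool's negative IDs (the special case for GSymbol.Empty is omitted). A lock makes Get safe to call from several threads, the List supports lookup by ID, and the Dictionary supports lookup by name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var pool   = new MiniSymbolPool();                  // like a private pool: IDs 1, 2, 3, ...
var global = new MiniSymbolPool(negativeIds: true); // like the global pool: IDs -1, -2, ...

var circle  = pool.Get("Circle");
var rect    = pool.Get("Rect");
var gCircle = global.Get("Circle");

Console.WriteLine($"{circle.Id} {rect.Id} {gCircle.Id}");          // 1 2 -1
Console.WriteLine(object.ReferenceEquals(pool.GetById(2), rect));   // True
Console.WriteLine(object.ReferenceEquals(circle, gCircle));         // False: different pools

// Hypothetical symbol: read-only name, ID, and a back-reference to its pool.
public sealed class MiniSymbol
{
    internal MiniSymbol(string name, int id, MiniSymbolPool pool)
    {
        Name = name; Id = id; Pool = pool;
    }
    public string Name { get; }
    public int Id { get; }
    public MiniSymbolPool Pool { get; }
}

public sealed class MiniSymbolPool
{
    private readonly List<MiniSymbol> _byIndex = new();             // lookup by ID
    private readonly Dictionary<string, MiniSymbol> _byName = new(); // lookup by name
    private readonly object _lock = new();
    private readonly int _sign;                                      // +1 or -1

    public MiniSymbolPool(bool negativeIds = false) { _sign = negativeIds ? -1 : 1; }

    public MiniSymbol Get(string name)
    {
        lock (_lock) // lets several threads create symbols in the same pool
        {
            if (!_byName.TryGetValue(name, out var sym))
            {
                // The ID is just a counter: ±1, ±2, ±3, ...
                sym = new MiniSymbol(name, _sign * (_byIndex.Count + 1), this);
                _byName.Add(name, sym);
                _byIndex.Add(sym);
            }
            return sym;
        }
    }

    public MiniSymbol? GetById(int id)
    {
        lock (_lock)
        {
            int index = id * _sign - 1; // maps ±1 to 0, ±2 to 1, ...
            return index >= 0 && index < _byIndex.Count ? _byIndex[index] : null;
        }
    }
}
```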

GetHashCode() is fast because it returns an ID number instead of obtaining the hash code of the string; therefore, Symbols are fast when used as keys in a Dictionary. Comparing Symbols for equality is fast, because only the references are compared, not the contents of each Symbol. Two Symbols are the same if and only if they are located at the same memory location.
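Both performance claims follow from two small choices, sketched below with hypothetical MiniSymbol and MiniSymbolPool classes: GetHashCode returns the ID, and Equals is simply inherited from object, which compares references. Turning a string into a symbol still hashes the string once, but after that every dictionary operation keyed on the symbol is cheap. Equal IDs in different pools do not cause false matches, because equality is by reference.

```csharp
using System;
using System.Collections.Generic;

var pool = new MiniSymbolPool();
var counts = new Dictionary<MiniSymbol, int>();

foreach (var word in new[] { "Circle", "Rect", "Circle" })
{
    var sym = pool.Get(word);                                      // string -> symbol: one string hash
    counts[sym] = counts.TryGetValue(sym, out var n) ? n + 1 : 1;  // symbol keys hash by ID
}

Console.WriteLine(counts[pool.Get("Circle")]); // 2

// Hypothetical symbol whose hash code is its ID. Equals is inherited from object,
// so two symbols are equal only if they are the same instance.
public sealed class MiniSymbol
{
    internal MiniSymbol(string name, int id) { Name = name; Id = id; }
    public string Name { get; }
    public int Id { get; }
    public override int GetHashCode() => Id; // no need to hash the characters of Name
}

public sealed class MiniSymbolPool
{
    private readonly Dictionary<string, MiniSymbol> _byName = new();

    public MiniSymbol Get(string name)
    {
        if (!_byName.TryGetValue(name, out var sym))
            _byName[name] = sym = new MiniSymbol(name, _byName.Count + 1);
        return sym;
    }
}
```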

Besides making type-safe extensible enumerations, another reason to use a SymbolPool is to construct a temporary set of Symbols that can be garbage-collected later. A SymbolPool and all the Symbols it contains can be garbage-collected when there are no references left to the pool itself or any of its Symbols. Note that a Symbol has a reference to its pool, so any lingering references to a Symbol will keep its entire pool alive. 
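A typical use of a temporary pool is deduplicating tokens while processing one input. In this hypothetical sketch (TokenStats, MiniSymbol and MiniSymbolPool are invented names), the pool is a local variable, so once CountDistinct returns, nothing refers to it and it becomes eligible for collection; exactly when the garbage collector reclaims it is up to the runtime.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(TokenStats.CountDistinct("circle rect circle line rect")); // 3

public static class TokenStats
{
    // Interns tokens in a short-lived private pool. Once this method returns, nothing
    // refers to the pool or its symbols, so the garbage collector may reclaim all of them.
    public static int CountDistinct(string text)
    {
        var pool = new MiniSymbolPool();
        var seen = new HashSet<MiniSymbol>();   // reference equality: one entry per symbol
        foreach (var token in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
            seen.Add(pool.Get(token));
        // Returning a MiniSymbol here instead of a count would keep the whole pool
        // reachable through the symbol's Pool property.
        return seen.Count;
    }
}

// Hypothetical symbol with a back-reference to its pool.
public sealed class MiniSymbol
{
    internal MiniSymbol(string name, MiniSymbolPool pool) { Name = name; Pool = pool; }
    public string Name { get; }
    public MiniSymbolPool Pool { get; }
}

public sealed class MiniSymbolPool
{
    private readonly Dictionary<string, MiniSymbol> _byName = new();

    public MiniSymbol Get(string name)
    {
        if (!_byName.TryGetValue(name, out var sym))
            _byName[name] = sym = new MiniSymbol(name, this);
        return sym;
    }
}
```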

As for me... 

In the Loyc compiler tooling project, source code is represented with Loyc trees. In Loyc trees, I use Symbols rather than strings to represent all identifiers in source code (variable and method names) as well as names of built-in operators and constructs. This avoids storing multiple copies of strings and allows fast equality comparison. 

History 

  • June 1, 2008: First version.
  • December 12, 2009: Introduced SymbolPools.
  • December 14, 2009: Released on CodeProject. 
  • February 24, 2010: Added support for type-safe Symbols.  
  • February 25, 2014: Formatting error corrected. Edited some text based on newer information. 

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
Software Developer, Canada
Since I started programming when I was 11, I wrote the SNES emulator "SNEqr", the FastNav mapping component, the Enhanced C# programming language (in progress), the parser generator LLLPG, and LES, a syntax to help you start building programming languages, DSLs or build systems.

My overall focus is on the Language of your choice (Loyc) initiative, which is about investigating ways to improve interoperability between programming languages and putting more power in the hands of developers. I'm also seeking employment.
