Immutable Data Structures in C# and F#

Tomas Petricek

5.00/5 (16 votes)

Feb 6, 2009

CPOL

18 min read

66629

303

The Real World Functional Programming book explains essential concepts of this paradigm using examples in C# 3.0 and F#. In this article we look at immutability, which stands behind the clarity of functional programs.

Title	Real World Functional Programming
Author	Tomas Petricek
Publisher	Manning
Published	May 2009 (expected)
Electronic version	via Manning Early Access program
ISBN-13	978-1933988924
Price	USD 49.99

This is a chapter excerpt from Real World Functional Programming authored by Tomas Petricek and published by Manning Publications. The content has been reformatted and edited to match the standard CodeProject article format.

Download source (F# and C# 3.0) - 2kb
Download the F# CTP addin for VS 2008 (from Microsoft Downloads)

In this article we'll take a look at one of the several concepts that together form the 'functional programming' paradigm. You probably noticed that the term functional programming has appeared in many areas recently - the C# 3.0 and LINQ have been largely influenced by this paradigm and many of the libraries that enable and simplify writing parallel code rely on functional ideas. Microsoft also recently announced that Visual Studio 2010 will include F# - a functional language - out of the box.

The functional paradigm is based on different principles than the imperative style. You may know some of the concepts already - for example 'anonymous functions' from C# 3.0 give us the ability to use function as an argument to a method. In functional programming, the ability to write functions that take other functions as an argument or return them as a result is one of the basic principles.

Immutable Data Structures

In this article we'll look at another concept that influences the way functional programs are written, which is 'immutability.' This means that objects used in the program cannot be modified once they are constructed. This has important practical consequences - the code written in this way can be more easily parallelized, because it doesn't suffer from race conditions. Immutability also makes the code more readable, because changes in the program state are more visible in the code, so we can see which operation changes the state and which doesn't. In this article, we'll look at the simplest functional data structure, a 'tuple' and we'll use it to demonstrate how we can work with immutable data types. We'll start by looking at examples in F#, but we'll also examine how to implement and use the same type in C# 3.0. We'll see more common functional data structures in the upcoming chapters.

I pointed out in the first chapter how we can write a function for processing data using immutable data types or objects. Instead of changing the internal state of the object (which isn't possible, because it is immutable) the processing function simply creates and returns a new object. The internal state of this new object will be initialized to a copy of the original object with a few differences in places where we wanted to change the state. This sounds a little abstract, but you'll see what I mean shortly in an example.

Introducing Tuple Type

The simplest immutable data structure in F# is a tuple type. Tuple is a simple type that groups together several values of (possibly) different types. The following example shows how to create a value (called tp), which contains two values grouped together:

> let tp = ("Hello world!", 42)
val tp : string * int

Creating a tuple value is fairly easy: we just write a comma separated list of values enclosed in parentheses. But let's look at the code in more detail - on the first line, we create a tuple and assign it to a tp value. The type inference mechanism of the F# language is used here, so you don't have to explicitly state what the type of the value is. The F# compiler infers that the first element in the tuple is of type string and the second is an integer, so the type of the constructed tuple should be something like "a tuple containing a string as the first value and an integer as the second value". Of course, we don't want to lose any information about the type and if we represented the result just using some type called for example Tuple, we wouldn't know that it contains string and integer.

The inferred type of the expression is printed on the second line. You can see that a type of a tuple is in F# written as string * int. In general, a tuple type is written as types of its members separated by an asterisk. In the next few sections, we'll see how tuples can be used in F#, but I'll also show you how you can implement exactly the same functionality in C#. If you don't immediately understand everything after reading the F# code, don't worry; just continue with the C# examples, which should make everything clearer.

So, how can we implement the same type in C#? The answer is that we can use C# 2. 0 generics and implement a generic Tuple type with two type arguments. The C# equivalent of the F# type string * int will then be Tuple<string, int>. We'll get to the C# version shortly after discussing one more F# example.

Working with Tuples in F#

Let's now look at some more complicated F# code that uses tuples. In the following listing, we use tuples to store information about a city. The first member is a string (the name of the city) and the second is an integer, containing a number of people living there. We implement a function printCity which outputs a message with the city name and its population, and finally we create and print information about two cities.

// Function that prints information about the city (1)
> let printCity cityInfo =
     printfn "Population of %s is %d."
             (fst cityInfo) (snd cityInfo);;

// Inferred type of the function (2)
val printCity : string * int -> unit 

// Create tuples representing Prague and Seattle (3)

> let prague  = ("Prague", 1188126)
  let seattle = ("Seattle", 594210);;

// Types of created tuples (4)
val prague : string * int
val seattle : string * int

// Print information about the cities (5)
> printCity prague
  printCity seattle;;

Population of Prague is 1188126.
Population of Seattle is 594210.

The listing shows a session from the F# interactive, so you can easily try it for yourself. The first piece of code (1) declares a function printCity, which takes information about the city as an argument and prints its value using the standard F# printfn function. The formatting string specifies that the first argument is a string and the second is an integer. To read first and second element of the tuple, we use two standard F# functions, fst and snd respectively (which are obviously acronyms for "first" and "second").

The next line (2) shows the type of the function deduced by the F# type inference. As we can see, the function takes a tuple as an argument (denoted using asterisk as string * int) and doesn't return any value (denoted as unit type on the right side of functional arrow symbol). This is exactly what we wanted.

Next, we create two tuple values (3) that store population information about Prague and Seattle. After these lines are entered, the F# interactive shell prints the types of the newly declared values (4) and we can see that the values are of the same tuple type that the printCity function takes as argument. That means we can pass both of these two values as an argument to our printing function and get the expected result (5).

The fact that types of the tuple match the parameter type of the function is important, because otherwise the two types would be incompatible and we wouldn't be able to call the function. To demonstrate this, you can try entering the following code in the F# interactive console:

let newyork = ("New York", 7180000.5)
printCity newyork

I'm not sure how New York could have 7180000 and half of inhabitants, but if this were the case then the type of the tuple newyork wouldn't be string * int anymore and would instead be string * float, as the type inference would correctly deduce that the second element of the tuple is a floating-point number. If you try it, you'll see that the second line isn't valid F# code and the compiler will report an error saying that the types are incompatible.

Working with Tuples in C#

I promised that we'd implement exactly the same code as the previous example in C# as well, so now it's the time to fulfill this promise and write some C#. As I already mentioned, we will represent tuples in C# using a generic type with two type arguments Tuple<TFirst, TSecond>, where TFirst and TSecond are generic type parameters.

The type will have a single constructor with two parameters of types TFirst and TSecond respectively, so that we can construct tuple values. It will also have two properties for accessing the values of its members, so unlike in F# where we accessed the elements using functions fst and snd, in C# we'll use properties First and Second. We skip the implementation for a minute, and instead look at how we can use the type. The code in the next listing has the same functionality as the previous, but it is written in C#.

// Method that takes a tuple as an argument (1)
void PrintCity(Tuple<string, int> cityInfo) {
   // Print information about city (2)
   Console.WriteLine("Population of {0} is {1}.",        
      cityInfo.First, cityInfo.Second);
}                                                         

// Creatae two sample tuples (3)
var prague  = new Tuple<string, int>("Prague", 1188000);
var seattle = new Tuple<string, int>("Seattle", 582000);

// Print information about cities (4)
PrintCity(prague);
PrintCity(seattle);

The PrintCity method takes a tuple of string and int as an argument; in C# we the types of method arguments have to be specified explicitly, so you can see that the type of cityInfo is Tuple<string, int> (1). The method prints the information using .NET Console.WriteLine method and uses properties of the tuple type (First and Second) to read its value (2).

Declares two variables (prague and seattle) and creates a tuple that stores information about the cities using a constructor with two arguments (3); The city information are later printed using the PrintCity method (4).

The translation from F# code to C# is very straightforward once we have an equivalent for the F# tuple type in C#. The code is slightly more verbose, mainly because we have to explicitly specify the type several times, whereas in the F# example, the type inference mechanism was able to infer the type everywhere. However, we’ll shortly see that this can be improved a little bit. We used a new C# 3.0 feature (var), which at least lets us use type inference when declaring the prague and seattle variables (3), because we're initializing the variables and C# can automatically infer the type from the right-hand side of the assignment.

Just like in the F# code, if we declared a tuple with an incompatible type (for example Tuple<string, double>) we wouldn't be able to use it as an argument to the PrintCity method. This is more obvious in C#, because we have to explicitly state what the type arguments for the generic parameters of the Tuple type are.

Implementing a Tuple Type in C#

The implementation of the tuple type in C# is quite straightforward. As already mentioned, we're using generics, so that one can create a tuple containing values of any two types:

public sealed class Tuple<TFirst, TSecond> {
   // Fields are explicitly marked as immutable (1)
   private readonly TFirst  first;
   private readonly TSecond second;

   public TFirst  First  { get { return first;  } }
   public TSecond Second { get { return second; } }
    
   public Tuple(TFirst first, TSecond second) {
      // Initialize fields (2)
      this.first = first;
      this.second = second;
   }
}

Probably the most notable thing is that the type is immutable. We've already seen how to create an immutable class in C# in the first chapter. In short, we mark all fields of the type using the readonly modifier (1) and provide only getter for both of the properties. Interestingly, this is somewhat opposite to F# where you have to explicitly mark values as mutable. Read-only fields can be set only from the code of the constructor (2), which means that once the object is created, its internal state cannot be mutated as long both of the values stored in the tuple are immutable as well.

Better Type-Inference for C# Tuples

Before moving forward, I'd like to show you one C# trick that makes our further examples that use tuples much concise. In the earlier examples, we had to create instances of our tuple type using a constructor call which required explicit specification of type arguments. We used the new C# 3.0 var keyword, so that the C# compiler inferred the type of variables for us, but we can do even better.

There is one more place where C# supports type inference and that is when calling a generic method. If you're calling a generic method and its type parameters are used as types of the method parameters then the compiler can use the compile-time types of the method arguments when the method is called to infer the type arguments. To clarify this, let's look at the code showing this.

public static class Tuple {
   public static Tuple<TFirst, TSecond> 
         Create<TFirst, TSecond>(TFirst first, TSecond second) {
      return new Tuple<TFirst, TSecond>(first, second);
   }
}

// Create tuples using 'Create' method (1)
var prague  = Tuple.Create("Prague", 1188000);
var seattle = Tuple.Create("Seattle", 582000);

The code shows an implementation of a static method Create, which has two generic parameters and creates a tuple with values of these types. We need to place this method in a non-generic class, because otherwise we would have to specify the generic parameters explicitly. Luckily, C# allows us to use the name Tuple, because types can be overloaded by the number of their type parameters (so Tuple and Tuple<TFirst, TSecond> are two distinct types).

The body of the method is very simple and its only purpose is to make it possible to create a tuple by calling a method instead of calling a constructor. This allows the C# compiler to use type inference as shown at (1). The full syntax for calling a generic method includes the type arguments, so using the full syntax we would have to write Tuple.Create<string, int>(...). As the types can be inferred automatically we can omit the type arguments. In the next section, we'll look at writing code that calculates with tuples and since we've just implemented the tuple type in C# we'll start with the C# version of the code and then move on to the F# alternative.

Calculating with Tuples

In the examples so far we have just created several tuples and printed the values, so let's perform some calculation now. For example we might want to increment the number of inhabitants by adding a number of newborns for the last year.

As already discussed, the tuple type is immutable, so we cannot set the properties of the C# tuple class. In F#, we can read the values using two functions (fst and snd), but there are no functions for setting the value, so the situation is similar. This means that our calculation will have to return a new tuple formed by the original name of the city copied from the initial tuple and the incremented size of population.

Let's first see how this can be done in C#. The following source snippet shows a new method that we'll add to the generic Tuple<TFirst, TSecond> class and several lines of C# code that show how to use this new functionality.

class Tuple<TFirst, TSecond> {
   // Returns tuple with the second value changed (1)
   public Tuple<TFirst, TSecond> WithSecond(TSecond nsnd) {
      return Tuple.Create(this.first, nsnd); 
   }
}

// Create city information about Prague
var prague0 = Tuple.Create("Prague", 1188000);
// Create information with incremented population
var prague1 = prague0.WithSecond(prague0.Second + 13195);
// Print the new intormation
PrintCity(prague1);

The WithSecond method (1) takes a new value of the second element as an argument and uses the Tuple.Create method to create a new tuple with the first element copied from the current tuple (this.first) and the second element set to the new value nsnd.

Now we'd like to do the same thing in F#. Here, we will write a function withSecond, which will do the same thing as a WithSecond method from our earlier C# example. It will take a tuple and a new value of the second element and return a new tuple with the first element copied from the original tuple and the second element set to a given value:

let withSecond tuple nsnd = 
   // Decompose a tuple into two values: 'f' and 's' (1)
   let (f, s) = tuple
   // Create a new tuple to return (2)
   (f, nsnd)

// Increment population and print the new information
let prague0 = ("Prague", 1188000)
let prague1 = withSecond prague0 ((snd prague0) + 13195)
printCity prague1

The code first shows an implementation of the function withSecond. We could implement it simply using the fst function, which reads a value of the first element in the tuple, but I wanted to demonstrate one more F# feature that can be used with tuples: pattern matching. You can see that inside the function, we first decompose the tuple given as the argument into two separate values (1) and we call these two values f and s (for first and second). This is where the pattern matching occurs; on the left-hand side of the equals sign you can see a language construct called a pattern and on the right-hand side we have an expression that is matched against the pattern. Pattern matching takes the value of an expression and decomposes it into a values used inside the pattern.

On the next line (2) we can use the value f extracted from the tuple using pattern matching. We reconstruct the tuple using the original value of the first element and the new value of the second element given as an argument (nsnd). We will look at more examples of pattern matching on tuples in the next section. Aside from using pattern matching, the code doesn't show anything new, but pattern matching is an important topic and F# provides other ways of using it with tuples, too. Let's take a closer look.

Pattern Matching with Tuples

In the last example we decomposed a tuple using pattern matching in a let binding. We can slightly improve the previous code sample. Since we didn't actually use the second element of the tuple, we only need to assign a name the first one. To do this, we can write an underscore for the second value in the pattern like this:

let (f, _) = tuple

The underscore is a special pattern that matches any expression and ignores the value assigned to it. Using pattern matching in let bindings is often very useful, but there are other places you can use it too. In fact, patterns can occur almost anywhere an expression is assigned to some value. For example, another place where pattern matching is extremely useful is when we're specifying the parameters of a function. Instead of parameter names, we can use patterns. This makes our setSecond function even simpler:

let withSecond (f, _) nsnd = (f, nsnd)

Now we've shortened our declaration from three lines to one. The result doesn't use any unnecessary values and clearly shows how the data flows in the code. Just from looking at the code, you can see that the first element of the original tuple is copied (by tracing the use of symbol f) and that the second function argument is used as a second element of the returned tuple (by following uses of nsnd). This is the preferred way of working with tuples in most of the F# functions that we'll write.

One other common use for pattern matching is in an F# match expression, which we saw earlier. We could rewrite our withSecond function to use a match expression like this:

let withSecond tuple nsnd =
   match tuple with
   | (f, _) -> (f, nsnd)

The match construct lets us match the specified expression (tuple) against one or more patterns starting with the bar symbol. In our example, we have only one pattern and because any tuple with two elements can be deconstructed into two values containing its elements, the execution will always follow this single branch. The F# compiler analyzes the pattern matching to deduce that the argument tuple is a tuple type containing two elements.

NOTE

Keep in mind that you cannot use pattern matching for example to determine whether a tuple has two or three elements. This would lead to a compile-time error, because the pattern has to have the same type as the expression that we're matching against the pattern and the type of a tuple with three elements (for example int * int * int) isn't compatible with a tuple that has two elements (for example int * int). Pattern matching can be used only for determining run-time properties of values; the number of elements in a tuple is specified by the type of the tuple, which is checked at compile time. If you're wondering how to represent some data type that can have several distinct values then you'll have to wait until chapter 5, where we'll look at unions.

In the previous example we used a pattern that cannot fail, because all tuples of two elements can be deconstructed into individual elements. This is called a complete pattern in F#. The match construct is particularly useful when working with patterns that are not complete and can fail, because we can specify several different patterns (every pattern on a new line, starting with the bar symbol) and if the first pattern fails, the next one is tried until a successful pattern is found.

What would be an incomplete pattern for tuples? Well, we could write a pattern that matches only when the first element (a city name) is some specific value. Let's say for example there are 100 people in New York that are never counted by any statistical study, so when setting the second element of a tuple (the population of the city) we want to add 100 when the city is New York. You could of course write this using an if expression, but the following example shows a more elegant solution using pattern matching:

> let setSecond tuple nsnd =
     match tuple with
     // Pattern that matches only New York (1)
     | ("New York", _) -> ("New York", nsnd + 100)
     // Pattern that matches all other values (2)
     | (f, _) -> (f, nsnd)
  ;;
val setSecond : string * 'a -> int -> string * int


> let prague = ("Prague", 123)
  setSecond prague 10;; 
// The expected result for Prague
val it : string * int = ("Prague", 10)

> let ny = ("New York", 123)
  setSecond ny 10;;
// Returned population is incremented by 100
val it : string * int = ("New York", 110)

You can see that in this example, the match expression contains two distinct patterns. The first pattern contains a tuple with a string "New York" as the first element and underscore as a second (1). This means that it only matches tuples with a first element set to "New York" and with any value for the second element. When this pattern is matched, we return a tuple representing New York, but with a population which is 100 more than the given argument. The second pattern (2) is the same as in previous examples and it just sets the second element of the tuple.

The examples following the function declaration shows the code behaving as expected. If we try to set a new population of Prague, the new value of population is used, but when we try to do this for New York, the new value of population is incremented by one hundred.

Tuples are used particularly frequently during the early phase of development, because they are so simple. In the next section, we'll look at another elementary immutable data type: a list. We've seen that a tuple represents a known number of elements with diverse types. Lists work the other way round: a list represents an unknown number of elements of the same type.

Summary

In this article we've looked at one of the types that are present in many functional languages - a tuple. We've seen how we can work with tuples in F# and then we implemented the same type (as a generic class Tuple<T1, T2> in C#. The type is immutable, so we also demonstrated how to write code that works with immutable data structures. Instead of mutating the value, we wrote a method that returns a new tuple instance. The type we implemented in C# can be practically quite useful, because you can use it if you need to wrap a return a pair of values and you don't want to declare a special class for this purpose (for example, because the pair is needed only in one location in the code in some internal implementation).