Concepts behind the C# 3.0 language

Tomas Petricek

4.84/5 (112 votes)

Oct 15, 2006

CPOL

33 min read

174841

In this article, I'll describe the concepts that influenced the design of C# 3.0. Most of these concepts are from other programming languages like Haskell, LISP, or languages developed at Microsoft Research.

Introduction

The C# 3.0 currently available in preliminary version is part of the LINQ project [1]. The aim of this project is to integrate better support for working data into mainstream general-purpose programming languages developed at Microsoft. It particularly targets the .NET platform, i.e. C# and VB.NET. Most of the ideas that are implemented in C# 3.0 are already available in mostly research and experimental languages that were previously developed at Microsoft Research. Among the most interesting and influential of these languages developed are Cω [2] and F# [3].

Contribution of F#

F# is a language based on ML, which is an impure functional language. In fact, it is based on a language called OCaml that adds several features to the standard ML. Functional languages are very different from imperative languages, with most of the widely used languages like C++, Java, and C# being imperative. The biggest contribution of F# is that it shows how functional languages can be compiled for the .NET Runtime (CLR) because the .NET CLR was initially designed for executing code written in imperative languages. Another aim of F# is the interoperability with other languages targeting the .NET platform. Part of the F# related research is also the ILX [6] project that shows how the .NET runtime could be extended to provide better support for functional languages, like first-class functions.

Functional programming, in general, was a big inspiration for some of the C# 3.0 features. The F# research language already showed how these features could be implemented for the .NET platform. C# 3.0 includes constructs that were inspired by:

Type inference: ability to deduce the type of expression
Tuples data types: types that represent a pair of values
First class functions: ability to take a function as a parameter and return it as a result
Lazy evaluation: ability to evaluate expressions only when it is needed later
Meta-programming: ability to manipulate with program source code

Most of these features that were added to C# 3.0 are very limited when compared with their implementation in F# and other functional languages. However, it is very interesting to see how functional concepts are becoming more and more important and can benefit the non-functional languages.

Contribution of Cω

Cω is a language based on C# and extends it in two important areas. The first area is better support for working with structured data (XML) and relational data (databases). The language extends the type system of C# to include support for several data types that are common in relational and structured data. It also provides querying capabilities for working with these data structures. The second area is support for concurrency constructs as a part of the language. In most of the widely used programming languages, support for concurrency is provided as a class library. By including these constructs in language, the program becomes more readable -- i.e. the intentions of the programmer can be expressed better -- and more work can be done automatically by the compiler.

C# 3.0 and LINQ project in general are inspired by the first area of extensions in Cω. The syntax extensions in C# 3.0 are very similar to the concepts developed in Cω. The biggest difference -- aside from the fact that Cω is an experimental language -- is that C# 3.0 provides better extensibility. This allows developers to provide a mechanism for working with different data sources. This extensibility is provided as a language feature and not as a compiler feature, as is the case with Cω. On the other hand, some extensions in Cω were simplified in C# 3.0. Anonymous types will be mentioned later, by way of example. So, where appropriate I will include examples where something possible in Cω can't be done in C# 3.0.

Overview of C# 3.0

The main goal of C# 3.0 is to simplify working with data and make it possible to access relational data sources (database) in much simpler ways. In C# 3.0, this is implemented in an extensible way, not by simply adding several special-purpose constructs. Thanks to the new language constructs, it is possible to write statements like the following:

var query = from c in db.Customers 
              where City == "London"
              orderby c.ContactName;
              select new { CustomerID, ContactName };

This statement looks like an SQL query. This is achieved thanks to query operators (FROM, WHERE, SELECT, ORDERBY and some others) that were added to the language. These operators are syntactic sugar that simplify writing of queries, but are mapped to underlying functions that perform projection, filtering or sorting. These functions are called Select, Where, OrderBy. For example, to perform filtering, the Where function needs to take another function as a parameter. Passing functions as a parameter to other functions is possible using new a language feature called lambda expressions. This is similar to the lambda functions known from many functional languages. You can also see that the query returns only CustomerID and ContactName from the more complex Customer structure. It is not required to explicitly declare a new class with only these two members because C# 3.0 allows developers to use anonymous types. It is also not required to declare the type of query variable because type inference automatically deduces the type when the var keyword is used.

Cω and integration of data access

The original idea of integrating data access into the general purpose programming language first appeared in the Cω research project at Microsoft Research. The data access possibilities integrated in Cω include working with databases and structured XML data. The LINQ project is mostly based on Cω; however, there are some differences. The features that can be found in both C# 3.0 and Cω include anonymous types (these are not limited to local scope in Cω), local type inference and query operators. One concept that wasn't available in Cω is extensibility through expression trees. In Cω, you couldn't write your own implementation of data source that would execute queries over anything other than in-memory data. The following demonstration shows how working with databases in Cω looks; it is very similar to the previous example written in C# 3.0:

query = select CustomerID, ContactName 
          from db.Customers 
          where City == "London"
          order by ContactName;

The Cω project will be mentioned later in other sections because some of the features that are available in C# and were originally implemented in Cω are more powerful in Cω. It is therefore interesting to see this generalization.

First class functions

"A programming language is said to support first-class functions if functions can be created during the execution of a program, stored in data structures, passed as arguments to other functions, and returned as the values of other functions."

(Source: Wikipedia.org)

Support for first-class functions is necessary for the functional style of programming. For example, functions that operate on lists in Haskell (map, foldl, etc.) are all higher order functions. This means that they take a function as a parameter. The previous quotation summarizes what first-class functions are and the advantages of languages supporting this feature. A more exact definition of what language has to support to treat functions as first-class objects can be found in [5]. According to this book, language has first-class functions if functions can be:

Declared within any scope
Passed as arguments to other functions
Returned as results of functions

Points 2 and 3 can be accomplished in many languages -- including C/C++ -- that allow passing a pointer to a function as an argument. In the first version of C#, it was possible to use delegates, which can be simply described as type-safe function pointers. However, neither C/C++ nor the first versions of C# allowed declaration of function in any scope, particularly inside the body of another function or method.

First class functions in F#

I will first mention how first-class functions are supported by F#, which is a mixed imperative and functional language for the .NET platform. I chose the F# language because it shows that implementing functional language features under the .NET platform is possible. I will first show some features that are unique to F#, when compared with other .NET languages:

// Declaration of binding 'add' that is initialized
// to function using lambda expression
let add = (fun x y -> x + y);;

// Standard function declaration (shorter version, 
// but the result is same as in previous example)
let add x y = x + y;;

// Currying – creates function that adds 10 to parameter
let add10 = add 10;;

The first thing this example shows is that in F#, functions are type like any other. This means that unlike in most of the languages where functions are something other than global variables, functions in F# are just ordinary global variables. Or, to be more precise, they are bindings because by default all data in F# is immutable. This is demonstrated by the first example of an add function, which is a global variable whose value is a function created using the lambda expression. The second example shows simplified syntax for declaring global functions, but the only difference when compared with the first version of the add function is the syntax.

The next example shows currying, which is an operation that takes a function and binds its first parameter with a specified value. In this case, the parameters are the function add (with type int -> int -> int) and constant 10. The result is a function (assigned to add10) of type int -> int that adds 10 to the specified parameter. The currying operation can be used in any language that has first-class functions, but the syntax in F# makes it extremely easy to use. The following examples show some common usages of first-class functions, which I will later write in C# as well:

// passing function to higher order functions 
let words = ["Hello"; "world"] in
iter (fun str -> Console.WriteLine(str); ) words;

// Returning nested function
let compose f g = 
  (fun x -> f (g x));;

The first example initially declares a list containing strings. Then it uses a function iter that calls the function passed as the first parameter for every element in the list. Note that F# is not a purely functional language, which means that a passed function can have side effects such as printing strings on the console. The second example is function that returns the composition of two functions passed as arguments. This is a good example of returning a function declared in local scope. The second interesting point in this example is the types of the functions. No types are declared explicitly, so the type inference algorithm infers that the type of the first parameter is 'b -> 'c, the type of second parameter is 'a -> 'b and the return value has type 'a -> 'c, where 'a, 'b and 'c are type parameters. This means that a compose function can be used for any two functions where the return type of the second function is the same as the parameter type of the first function.

First class functions in C# 2.0 and C# 3.0

Since the first version of .NET Framework and C#, a mechanism called delegates has been available. Delegates are type-safe function pointers that can be passed as a parameter or return a value from a function. As mentioned earlier, the first version of C# didn't support declaration of functions inside function bodies (nested functions), so it was only possible to initialize a delegate to a globally declared method.

C# 2.0 anonymous delegates

The following example shows how anonymous delegates in C# 2.0 can be used for creating a method that returns a "counter" delegate. The counter delegate is a function that adds numbers to a specified init value and returns the total:

CounterDelegate CreateCounter(int init) {
  int x = init;
  return delegate(int n) {
      x += n;
      return x;
    }
}

CounterDelegate ct = CreateCounter(10);
ct(2); // Returns 12
ct(5); // Returns 17

This example shows one interesting problem that appears with the ability to declare functions inside body and the ability to return this function as a return value. You can see that the anonymous delegate uses a local variable called x for storing the total value. So, there must be some mechanism for creating references to local variables inside a method body. Another problem in this example is that when the function CreateCounter returns, the activation record for the function (with the local variable x) will be destroyed. These issues are solved in functional languages using closures and a similar mechanism was added to C# 2.0. Closure captures the value of a local variable and preserves it when the activation record is destroyed.

While anonymous delegates provide everything that is needed for first-class functions, the syntax is quite verbose and inconvenient for uses known from functional languages. C# 3.0 comes with a new language feature called lambda expressions, which provides more functional syntax. The .NET base class library is also extended to include some of more important functions known from functional languages, such as like map, filter, etc. Use of these functions is illustrated in the following example:

IEnumerable<int> nums = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
IEnumerable<int> odd = nums.Where( (n) => (n%2) == 1 );
IEnumerable<int> oddSquares = odd.Select( (n) => n*n );

The first line of the example declares the variable nums, which is n array containing numbers from 1 to 10. In the second line, the numbers are filtered and the returned list odd contains only odd numbers. The Where function takes delegate as a parameter and the delegate is created using lambda expression syntax. In this syntax, (n) specifies the parameters of the delegate. The expression on the right side of the "=>" token is the body of delegate. The Select function is later used to calculate squares of all odd numbers in the list. One more interesting point in this example is that you don't have to explicitly declare the type of the lambda expression parameter when the type can be deduced from the context. This will be discussed in following section. Lambda expressions can be also used for accessing the data representation of expressions, but this will be described in more detail in the section about meta-programming.

Type inference

Type inference is the feature of strongly typed programming languages that refers to the ability to deduce the data type of an expression. This feature is currently available in many -- mostly functional -- programming languages, including Haskell and ML. It is also available in F#. In languages that support type inference, most of the identifiers -- such as variables and functions -- can be declared without explicit type declaration. However, the language is still type-safe because the type is inferred automatically during the compilation.

Type inference in F#

The F# type system, which is based on the ML type system, depends heavily on type inference. In fact, no declaration in F# expects the name of the type unless some conflict or ambiguity occurs. In this case, the programmer can use type annotations to provide hints to the compiler. Let's start with a simple example that demonstrates how F# type inference works:

// Some value bindings
let num = 42;;
let msg = "Hello world!";;

// Function that adds 42 to the parameter
let addNum x = x + num;;

// Identity function
let id x = x;;
let idInt (x:int) = x;;

In this example, no type is explicitly declared, but the type of all identifiers is known at compile time. The type of value bindings (num and msg) can be inferred because the type of expression on the right side of binding is known. i.e. 42 is integer and "Hello world!" is string. In the last example, the type of num is known (it is integer) and the "+" operator takes two integer parameters. Therefore the type of the x parameter must be integer too. Because "+" applied to two integral parameters returns integer, the return type of the function is also integer.

The last two examples show how to declare the identity function in F#, the function that returns a passed parameter. The type inference algorithm figures out that the type of the returned value is exactly the same as the type of the parameter, but it doesn't know what type it actually is. In this case, the F# type system can use type parameters. So, the inferred type is 'a -> 'a where 'a is the type parameter. If I want to define the identity function only for integral values, I need to use type annotations to provide hints to the compiler. This is shown in the last example, where it is explicitly declared that the type of parameter x will be integer. Using this information, the type inference algorithm can deduce that the return value of the function will be integer, too.

Type inference in C# 3.0

The support for type inference in C# 3.0 is very limited when compared with type inference in F#, but it is very interesting because it makes C# the first mainstream language that supports it. For no obvious reason, type inference was implemented only in functional languages so far. In C# 3.0, type inference can be used only with lambda expressions or to infer the type of the local variable that is initialized to some value with the initialization expression. It is not possible to use type inference for either method return value or any class members. First, let's look at the type inference that can be used with lambda expressions:

IEnumerable<int> nums = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
IEnumerable<int> odd = nums.Where( (n) => (n%2) == 1 );

In this example, the lambda expression (n) => (n%2) == 1 is used, but the type of parameter n is not explicitly declared. In this case, the compiler knows that the type of parameter expected by the Where function is delegate, which takes int as a parameter and returns bool. Based on this information, the compiler deduces that the type of n must be int. It also checks whether the return type of the lambda expression matches with the expected return value of the delegate. The second case where type inference is used in the C# 3.0 is in the ability to deduce the type of the local variable using the var keyword:

// Some simple examples
var str = "Hello world!";
var num = 42;

// Declaration without using type inference
TypeWithLongName<AndWithTypeParameter> v = 
  new TypeWithLongName<AndWithTypeParameter>();

// Same declaration with type inference
var v = new TypeWithLongName<AndWithTypeParameter>();

The first two examples show how type inference can be used to infer the type of primitive variables, where the first one is string and the second is integer. The next example shows that type inference can significantly simplify the source code because it is not necessary to repeat the same type twice in variable declaration. There is, however, still intensive discussion about which situations the var keyword should be used and when not. This is because, when used improperly, the code becomes less readable. There is one very important reason for including type inference in C# and this reason is anonymous types. Anonymous types will be discussed in more detail later, so I will show only a simple example now:

var anon = new { Name = "Hello" };
Console.WriteLine(anon.Name);

In this example, the anon variable is initialized to an instance of anonymous type, which has one member called Name with value "Hello." The compiler knows exactly the structure of the type, but there is no name that could be used for declaring a variable of this type. In this situation, the type inference and var keyword become extremely useful.

Lazy evaluation

Lazy evaluation is one of the basic concepts in the Haskell programming language and, thanks to lazy evaluation, it is possible to declare infinite lists using list comprehensions. I will discuss two different uses for lazy evaluation in the C# language. First, I will show how to emulate lazy evaluation for parameter passing in C#, which is possible thanks to better support of first-class functions. In the second section, I will show that writing infinite (lazy) lists in C# is almost as easy as in Haskell.

Lazy parameter passing

When calling a function using the lazy evaluation strategy, the value of the parameter is evaluated only when needed. This is the default behavior of Haskell; it can also be simulated in Scheme using delay/force. In C#, similar behavior can be achieved using high-order functions: either anonymous delegates or, better, by using lambda expressions. A typical example of lazy evaluation is a function with two parameters that uses the second parameter only when the first matches some criterion:

// Function that may not need second parameter
int func(int first, int second)   
{
    if ((first%2) == 0) return 0; else return second;
}

// Call to the 'func'
func(n, expensiveCalculation());

In this example, lazy evaluation may be very useful because it can prevent expensiveCalculation() from being called when it is not needed. In the following code, I will use the Func<T> type, where T is the type parameter. This type represents delegate, the .NET type used for passing functions, with no parameters and with return a value of type T.

// Now, the parameters are functions that 
// can be used for calculating values
int func(Func<int> first, Func<int> second)   
{
    if ((first()%2) == 0) return 0; else return second();
}

// Parameters are passed using lambda expressions
func(() => n, () => expensiveCalculation());

This function now behaves similarly to its equivalent written in Haskell because when the second parameter is not needed, the expensiveCalculation function is never invoked. There is only one important difference. That is, when value is needed twice, the function would also be evaluated twice. However, this can be solved simply by passing the wrapper class that contains delegate for calculating the value and stores the result of the call locally when it is evaluated.

Infinite lists in C# 3.0

The closest equivalent to the infinite lists that are possible in Haskell thanks to lazy evaluation is the concept of iterators, known from object-oriented programming. Iterators are used for traversing through elements of collection, usually using a method that moves to the next element in collection and the method that returns the current element. However, writing iterators for creating number sequences is much more complicated than using list comprehensions in Haskell.

As already mentioned, the Cω language introduced a new data type called stream. Streams in Cω are homogenous collections of a particular type, like arrays. Unlike arrays, however, streams are lazy. The syntax for stream declaration is an asterisk (*) appended to the name of the element type. This language also introduced interesting syntax for generating streams that makes it possible to generate streams more easily than using iterators :

// COmega – stream with fibonacci numbers
public static int* Fibs() { 
  int tmp, v1=0, v2=1;
  while (true) {
    yield return v1;
    tmp = v1; v1 = v2; v2 = tmp + v2;
  }
}

The key concept here is the yield return statement that returns next element of the stream. The stream is lazy, so when reading elements from the stream, only the required number of iterations is done and the execution is stopped when no more elements are needed. The Cω language also provides in-line syntax for the declaration of streams:

int* nums = { for(int i=0; ; i++) yield return i; };

When working with lists in Haskell, the most useful functions are map, which provides projections, and filter, which provides filtering. These functions are available in Cω in two syntactically different ways. The first way makes it possible to use XPath-like operators, while the second way enables SQL-like querying. The functionality is, of course, the same. In the following example, I will show how to get the squares of all even numbers from a previously defined sequence of numbers using both the XPath-like and the SQL-like ways:

// XPath-like filter and projection operators
int* squaresOfEven = nums[ (it%2)==0 ].{ it*it };

// SQL-like projection (select) and filter (where) operators
int* squaresOfEven = select it*it from nums where (it%2)==0;

This yield return statement was considered very successful, so it was introduced in the next version of the mainstream C# language (version 2.0). The main difference in C# is that the returned data type is not a stream (int*), but an iterator object (IEnumerable<int>). This introduces a simple method for writing infinite lists in C#. However, other features for working with streams, like the apply-to-all statement, are not available in C# 2.

The main reason why C# 2.0 didn't contain functions known from functional programming -- i.e. projection and filtering -- was that C# 2.0 contained poor support for passing functions as arguments. This is improved in C# 3.0 so, thanks to lambda expressions, it is possible to declare methods that provide filtering and projection operations for sequences (IEnumerable<T>). The following example shows how to work with sequences in C# 3.0 in a way that is similar to the Cω examples:

// Function GetNums returns all numbers using yield return
IEnumerable<int> nums = GetNums();

// Manipulation using Where and Select
IEnumerable<int> squaresOfEven = 
  nums.Where((it) => (it%2)==0 ).Select((it) => it*it );

// Same operation written using query operators
IEnumerable<int> squaresOfEven = 
  from it in nums where (it%2)==0 select it*it;

In the first example, manipulation is done using the Where and Select methods that take a lambda expression to use for filtering and projection. Exactly the same result can be achieved using special operators -- such as select, from, where and some others -- that are available in C# 3.0. I still didn't show how to declare infinite lists in C# 3.0 in a way similar to the list comprehensions known from Haskell. The declaration of list in Haskell can consist of four parts:

Initialization section: declares initial values that can be later used for calculating the rest of the list
Generator section: declares a list from which values in the newly created list can be calculated; this list can use the newly created list recursively
Filtering section: manipulates elements from the generator list
Projection section: manipulates elements from the generator list

Let's see how this can be written in Haskell and in C# 3.0. The following example defines an infinite list of Fibonacci numbers:

-- Haskell
fibs = 0 : 1 : [a + b | (a, b) <- zip fibs (tail fibs)]
// C# 3
var fibs = new InfiniteList<int>((t) => 
  t.ZipWith(t.Tail()).Select((v) => v.First + v.Second),
  0, 1);

The semantics of both examples are the same. The list of Fibonacci numbers is declared recursively using already known elements in the list. The generator section in C#, t.ZipWith(t.Tail()), is equivalent to zip fibs (tail fibs) written in Haskell. In the C# version, it is not possible to reference the fibs variable while it is being initialized. So, the t is the instance of the list passed to the generator to allow recursive declarations. The projection secion in C#, Select((v) => v.First + v.Second), is equivalent to a + b | (a, b) from Haskell. The difference here is that C# doesn't directly support tuples, so tuple is represented as a structure with members First and Second. The filtering section is not used in this example. Finally, the last two parameters (0, 1) define initial values in the list, which is equivalent to the declaration of the first two values in Haskell (0 : 1 : …).

Lazy evaluation summary

The lazy evaluation strategy can be simulated in many non-functional languages because it is a general concept. However, in most of these languages this simulation would be difficult and inconvenient to use. Thanks to new features available in the latest version of C# 3.0, some of concepts known from functional programming -- including lazy evaluation and list comprehensions -- can be naturally used in C#.

Anonymous types

Anonymous types are another example of a useful feature known from functional programming that appears in C# 3.0. Anonymous types are based on tuples known from Haskell, or any other functional programming language -- including F# -- developed at Microsoft Research. The first object oriented language that implemented the concept of tuples was, as already mentioned, Cω. Tuples -- also called anonymous structs -- in Cω can contain either named or anonymous fields. The type of tuple containing the anonymous string and the integer field called N is struct{string; int N;}. Anonymous fields of the tuple can be accessed using index expression. For example, to get the string field from the previous tuple, you can use t[0]. The named fields can be accessed either by using the same technique as anonymous fields or using the name. In Cω, tuple is of regular type. So, it can be used in any variable or method declaration. The following example shows how the tuples in Cω can be used:

// Method that returns tuple type
public struct{string;int Year;} GetPerson() {
    return new { "Tomas", Year=1985 };
}

// Call to the GetPerson method
struct{string;int Year;} p = GetPerson();
Console.WriteLine("Name={0}, Year={1}", p[0], p.Year);

One of the situations where tuples are extremely useful is when using projection operation. This would be analogous to the map function in Haskell, the projection or select operator in Cω and Select method in C# 3.0. The following example first creates a stream with numbers and then uses the projection operator to create a stream of tuples where every element contains original number and its square.

// Get stream with numbers 
int* nums = GetNumbersStream();
// Projection returns stream of tuples with number and its square
struct{int;int;} tuples = nums.{ new { it, it*it } };

The projection operation was also probably the main reason for including anonymous types in C# 3.0. This is because C# 3.0 and the LINQ project focus on simplifying data access and while writing database queries. So, projection is very important. The following example shows usage of anonymous types in C# 3.0:

// Creating anonymous type
var anon = new { Name="Tomas", Year=1985 };

// Creating anonymous type in projection operator
from db.People select new { Name=p.FirstName+p.FamilyName, p.Year };

It is obvious that anonymous types in C# 3.0 are based on ideas presented in Cω. However, there are several differences. The first notable difference is that in C# all fields of anonymous type are named. In the previous example, the name of the first field is specified explicitly (Name) and name of second field is inferred automatically (the name Year will be used). The second and probably most important difference is that in C# 3.0 it is possible to create an instance of anonymous type, but the only way to declare a variable of this type is to use the var keyword. This keyword infers the type of the variable from expression, so the type information isn't lost. This means that there is no way to reference the anonymous type in C# 3.0, making it impossible to use anonymous types as parameters of methods. Anonymous types can therefore be used only in limited scope.

This limitation of scope in C# 3.0 is intentional because misusing anonymous types in object oriented programming could lead to less readable and less maintainable code. However, there are some situations where passing anonymous types as return a value from methods would be very useful. The Cω language shows that this limitation can be resolved and that tuples known from functional programming can be used without limitations in object oriented languages.

Meta-programming

Language oriented development

"Language oriented programming is a style of programming in which, the programmer creates domain-specific languages for the problem first and solves the problem in this language."

(Source: Wikipedia.org)

Mainstream languages like C++, C#, Java or functional Haskell are languages that belong to the category called general purpose languages (GPL). This category contains languages that are not intended for some specific use, but can be used for developing a wide range of applications. Opposite to these languages are the so-called domain-specific languages (DSL) that can be used only for one specific purpose. These languages are usually designed and optimized for their purpose, so they serve better than GPLs. These languages are in some sense similar to the class libraries of the object oriented world because class libraries are also designed for serving one specific purpose. A good example of a DSL is the SQL language, whose purpose is database querying. This language demonstrates all characteristics of DSL; it has a very limited domain of use, but in this domain it serves better than any general purpose language.

In his article [4], Martin Fowler divides DSLs into external and internal classifications. Internal (also called embedded) DSLs are languages that extend and modify the host (general purpose) language in which they are used. An example of a language that allows developers to modify it -- and this is necessary for bigger projects -- is LISP. It is itself very simple, but it also provides ways for extending it. So, in more complex projects, the language is first extended using macros and the problem can be than solved easily using the created LISP dialect. In contrast, external DSLs are languages that are independent of the language in which they are used. An example is SQL.

Meta-programming

"Meta-programming is the writing of programs that write or manipulate other programs (or themselves) as their data."

(Source: Wikipedia.org)

As described in this quotation, the principle of meta-programming is that the code written by the developer works with some representation of program code, either in the same or in a different language. This code can be analyzed, interpreted or translated to some other language. It is obvious that meta-programming is used for creating domain-specific languages. However, creating DSLs is not the only possible use of meta-programming. In the case of external DSL, the developer has to write a compiler or translator in another, usually general purpose, language. This option isn't used very often because writing a compiler is not a trivial task. The case of internal DSL is much more interesting because the program manipulates with its own code at run-time. To enable this, the language must provide some way to access the data representation of its own code, as well as some extensibility that would allows users to define their own sub-languages.

From what I've already mentioned, you can see that in language which supports advanced forms of meta-programming, it is possible to develop a domain-specific language similar to SQL. Using meta-programming, it would be possible later to get a representation of the code written in this SQL-like sub-language, translate it to SQL command text and execute it! In the rest of this article, I will use the term meta-programming to mean manipulation with a program's own code using the features provided by the programming language. This is because manipulation with external code can be written almost in any GPL language; only the support for reading and manipulation with text files is needed for this.

Meta-programming in C# 3.0

Support for meta-programming is one of the key features of the LINQ project and especially its implementation for working with databases called "LINQ to SQL" (previously DLINQ). When writing LINQ code that will be used for accessing to database, the expressions used for filtering and projection -- as well as for other operators like joins -- must be translated to SQL command. This couldn't be done without the ability to get the data representation of those expressions and therefore without some form of meta-programming.

In C# 3.0, it is possible to get the data representation (called an expression tree) only from a lambda expression whose body is the expression. I see the biggest limitation of C# 3.0 here because it is not possible to get a data representation of either statement or statement block. The lambda expression can be compiled in one of two ways. It can be either compiled to delegate -- i.e. the object that can be used to execute the function -- or compiled to the code that returns the expression tree (the data representation) of the lambda expression. The decision between these two options is based on the l-value of the code in which the lambda expression appears. When it is assigned to the variable of type Func, it is compiled as delegate. When the type of variable is Expression, it is compiled as data. The following example shows how the same lambda expression can be compiled to delegate as well as data representation. For demonstration purposes, we'll use a function that takes integer as a parameter, performs some calculations with it and returns true when the value is less than ten.

// Lambda expression as executable code
Func<int, bool> = 
  x => DoSomeMath(x) < 10;

// Lambda expression as data representation
Expression<Func<int, bool>> = 
  x => DoSomeMath(x) < 10;

This is exactly the principle that is used in the LINQ project in cases when an application works with a database and so the query is translated to the SQL language. The following example demonstrates how two lambda expressions -- one for filtering and the second for projection -- can be used when accessing a database:

// Database query written using Where 
// and Select extension methods
var q = 
  db.Customers.
    Where(c => c.City == "London").
    Select(c => c.CompanyName);

From this example, you can see that the expressions used for the filtering and projection of data from the Customers table are passed to methods Where and Select using lambda expressions. It depends on the concrete LINQ implementation whether the type of the parameter passed to these methods will be Func or Expression. So, it is possible to write implementations that accept delegate types and execute them, for example, for filtering data that are stored in memory. It is also possible to write implementations that accept data representation of the code and translate it into a different language -- i.e. from LINQ to SQL implementation -- or use some other way to execute the code. This typically works well. However, to make the data access even more intuitive, C# 3.0 provides operators (from, select, where and some others) for this purpose. These operators are only syntactic sugar and code written using these operators is translated to the code shown in the previous example. Using the LINQ operators, you can write following:

// Database query that uses "syntactic sugar"
var q = 
  from c in db.Customers
    where c.City == "London"
    select c.CompanyName;

If you think of these examples in terms of DSL and meta-programming, you can see that domain-specific language used to represent database queries in C# 3.0 is composed of expressions that can be compiled to expression trees, as well as from query operators that are built into the language or, to be more precise, from the query methods such as Select, Where, etc. The target language of translation -- SQL in the case of LINQ to SQL -- doesn't usually support all of the expressions that can be written in C#. So, the translator must check whether all constructs written by the programmer can be translated. In general, any DSL language that can be created in C# 3.0 can use only expressions or a limited subset of C# expressions, and some way for composing these expressions together. Examples would be using a sequence of method invocations or using overloaded operators.

Meta-programming in F#

In the F# language, the support for meta-programming goes even further by making it possible to access almost any code written in F# by using special operators. This includes expressions, as in C# 3.0, but also statements and more complex structures. The following example illustrates how to get the data representation of an infinite loop writing the text "Hello." The type of returned value is expr, which represents F# code:

let e = <@ 
      while true do 
        print_string "hello" 
      done @>;;

As I already said, it is possible to write lambda expressions in F# because lambda expressions are one of the basic features in any functional language. It is also possible to get expr representation of those lambda expressions, so F# quotations can be used in the same way as lambda expressions in C# 3.0:

let e = <@ 
        fun x -> doSomeMath(x) > 10
      @>;;

A recently released version of F# contains an example that shows how to extend this language with support for data querying constructions similar to the operators in the LINQ project. This shows the flexibility of the F# language because it was not necessary to extend the language itself in any special way to allow querying of the database. The key feature, meta-programming, was already added and it is possible to define custom operators, so there is no need for syntactic sugar such as query operators in C# 3.0. The implementation of data querying in F# internally transforms the data representation of an expression written in F# to the C# 3.0 expression tree. This expression tree is passed to the "LINQ to SQL" converter. In the following example, you can see how to write a database query similar to the previously shown C# 3.0 example:

let query =
    db.active.Customers
    |> where <@ fun c -> c.City = "London" @>
    |> select <@ fun c -> c.ContactName @>;;

There is one more very interesting feature in F# that I haven't mentioned yet. I will demonstrate this in the example with the lambda expression that calls the doSomeMath function. F# allows you to expand all top-level definitions in the code representation with its actual value. This means that the call to the doSomeMath function can be replaced with its actual implementation. This can be used, for example, to allow the developer to employ custom functions in data query expressions .

Meta-programming summary

From these examples, you can see that F# meta-programming is more flexible than the lambda expressions available in C# 3.0. The data representation of code may look more complex than the representation used in LINQ; however, it is easy to use thanks to its functional design and thanks to the functional features in F#. A very important feature that isn't available in C# 3.0 is that in C# you can get data representation only for an expression, while in F# it is possible to get the code of statements -- including loops and statement blocks -- as well as function declarations. A second feature, which is not turned on in the compiler by default, is that in F# you can get a data representation of any top-level definition, i.e. all declared functions. This makes it possible to expand function calls in expressions.

References

The LINQ Project. D. Box and A. Hejlsberg.
See http://msdn.microsoft.com/data/ref/linq/ [^]
Cω language. E. Meijer, W. Schulte, G. Bierman, and others.
See http://research.microsoft.com/Comega/ [^]
F# language. D. Syme.
See http://research.microsoft.com/fsharp/ [^]
Language Workbenches: The Killer-App for Domain Specific Languages? Marin Fowler.
See http://martinfowler.com/articles/languageWorkbench.html [^]
Concepts in Programming Languages. John C. Mitchell.
Cambridge University Press, Cambridge UK, 2003
ILX project. D. Syme.
See http://research.microsoft.com/projects/ilx/ilx.aspx [^]

History

15 October, 2006 - Original version posted
3 March, 2007 - Article updated
30 May, 2007 - Article edited and posted to the main CodeProject.com article base