Introduction
The Beta for Visual Studio 2010 is upon us, and with it comes the CTP of C# 4.0. While C# 4.0 does not represent a radical departure from the previous version, there are some key features that should be understood thoroughly in order to take advantage of their full potential.
Background
The white paper for C# 4.0's features does a good job of explaining the changes in the language. I thought, however, that some larger code samples and historical perspective would help people (especially new developers) in understanding why things have changed.
Feature Categories
Microsoft breaks the new features into the following four categories, so I will maintain that pattern:
- Named and Optional Parameters
- Dynamic Support
- Variance
- COM Interop
Conventions
Some of the examples assume the following classes are defined:
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Customer : Person
{
    public int CustomerId { get; set; }
    public void Process() { ... }
}

public class SalesRep : Person
{
    public int SalesRepId { get; set; }
    public void SellStuff() { ... }
}
Named and Optional Parameters
We'll start off with one of the easier features to explain. In fact, if you have ever used Visual Basic, then you are probably already familiar with it.
Optional Parameters
Support for optional parameters allows you to give a method parameter a default value so that you do not have to specify it every time you call the method. This comes in handy when you have overloaded methods that are chained together.
The Old Way
public void Process( string data )
{
    Process( data, false );
}

public void Process( string data, bool ignoreWS )
{
    Process( data, ignoreWS, null );
}

public void Process( string data, bool ignoreWS, ArrayList moreData )
{
}
The reason for overloading Process in this way is to avoid always having to include "false, null" when calling the third (three-parameter) version. Suppose that 99% of the time there will be no 'moreData' provided; it seems ridiculous to type and pass null so many times.
Process( "foo", false, null );
Process( "foo", false );
Process( "foo" );
The New Way
public void Process( string data, bool ignoreWS = false, ArrayList moreData = null )
{
}
Now we have one method instead of three, but the three ways we called Process above are still valid and still equivalent.
ArrayList myArrayList = new ArrayList();
Process( "foo" );
Process( "foo", true );
Process( "foo", false, myArrayList );
Process( "foo", myArrayList );   // does not compile (see Named Parameters below)
Awesome, one less thing VB programmers can brag about having to themselves. I haven't mentioned it up to this point, but Microsoft has explicitly declared that VB and C# will be "co-evolving" so the number of disparate features is guaranteed to shrink over time. I would like to think this will render the VB vs. C# question moot, but I'm sure people will still find a way to argue about it. ;-)
Named Parameters
In the last example, we saw that the following call was invalid:
Process( "foo", myArrayList );
But if the boolean ignoreWS is optional, why can't we just omit it? One reason is readability and maintainability, but the main reason is that it can become impossible to know which parameter you are specifying. If you had two parameters of the same type, or if one of the parameters was object or some other base class or interface, the compiler would not know which parameter you were sending. Imagine a method with ten optional parameters to which you give a single ArrayList. Since an ArrayList is also an object, an IList, and an IEnumerable, it is impossible to determine how to use it. Yes, the compiler could just pick the first valid option for each parameter (or a more complex scheme could be used), but that would be impossible for people to maintain and would cause countless programming mistakes.
Named parameters provide the solution:
ArrayList myArrayList = new ArrayList();
Process( "foo", true );
Process( "foo", true, myArrayList );
Process( "foo", moreData: myArrayList);
Process( "foo", moreData: myArrayList, ignoreWS: false );
As long as a parameter has a default value, it can be omitted, and you can supply just the parameters you want via their names. Note that in the second line above, the 'true' value for ignoreWS did not have to be named since it is the next parameter in order.
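One related detail worth knowing: default values must be compile-time constants (or default(T), or new S() for a value type S), which is why reference-type parameters such as moreData almost always default to null. The Log method below is a made-up sketch, not part of the examples above, illustrating which kinds of defaults the compiler accepts:

// Log is a hypothetical method used only to show what can appear as a default value.
public void Log( string message,
                 bool verbose = false,                 // constant expression: allowed
                 DateTime stamp = default(DateTime),   // default(T): allowed
                 ArrayList extra = null )              // reference types generally default to null (string constants are the exception)
{
    // ...
}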
Dynamic Support
OK, I'm sure we all have had to deal with code similar to the following:
public object GetCustomer()
{
    Customer cust = new Customer();
    ...
    return cust;
}

...

Customer cust = GetCustomer() as Customer;
if( cust != null )
{
    cust.FirstName = "foo";
}
Note the GetCustomer method returns object instead of Customer. Code like this is frustrating because you know it returns a Customer; it always has and it always will. Unfortunately, the coder chose to return object, and you can't change it because that would modify the public contract and could potentially break legacy software.
Another situation in which you will be dealing with an object that you know is really another type is Reflection.
Type myType = typeof( Customer );
ConstructorInfo consInfo = myType.GetConstructor( new Type[] { } );
object cust = consInfo.Invoke( new object[] { } );
((Customer)cust).FirstName = "foo";
Because Reflection can act on any type, ConstructorInfo.Invoke() must return object. Like the first example, this forces you to cast the object. Now consider the situation where you can't, or don't want to, cast the object. Perhaps the code's author keeps changing the name of the type or creating different versions (e.g., 'Customer2'), but the properties and methods stay the same. The examples above assume that you, as the programmer, know what the true type is. What if you didn't? What if you had to use Reflection to find and invoke methods? What if the object being returned was coming from IronPython, JavaScript, COM, or some other non-statically typed environment?
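For comparison, here is a rough sketch of what member access by name looks like with plain Reflection, assuming the GetCustomer method and Customer class defined earlier; this is exactly the kind of ceremony the new dynamic support is meant to remove:

using System.Reflection;

object cust = GetCustomer();

// Find and set the FirstName property by name.
PropertyInfo firstName = cust.GetType().GetProperty( "FirstName" );
firstName.SetValue( cust, "foo", null );

// Find and invoke the Process method by name.
MethodInfo process = cust.GetType().GetMethod( "Process" );
process.Invoke( cust, null );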
Enter 'dynamic'
The dynamic keyword is new to C# 4.0 and is used to tell the compiler that a variable's type can change or that it is not known until runtime. Think of it as being able to interact with an object without having to cast it.
dynamic cust = GetCustomer();
cust.FirstName = "foo";
cust.Process();
cust.MissingMethod();
Notice we did not need to cast nor declare cust as type Customer. Because we declared it dynamic, the runtime takes over and locates and sets the FirstName property for us. Of course, when you use a dynamic variable, you give up compile-time type checking. This means the call cust.MissingMethod() will compile and not fail until runtime. The result of that call is a RuntimeBinderException, because MissingMethod is not defined on the Customer class.
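If you want to see the failure for yourself, the exception can be caught like any other; here is a minimal sketch (RuntimeBinderException lives in the Microsoft.CSharp.RuntimeBinder namespace):

using Microsoft.CSharp.RuntimeBinder;

dynamic cust = GetCustomer();
try
{
    cust.MissingMethod();   // compiles, but the binder cannot find the member at runtime
}
catch( RuntimeBinderException ex )
{
    // e.g., "'Customer' does not contain a definition for 'MissingMethod'"
    Console.WriteLine( ex.Message );
}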
The example above shows how dynamic works when calling methods and properties. Another powerful (and potentially dangerous) feature is being able to reuse variables for different types of data. I'm sure the Python, Ruby, and Perl programmers out there can think of a million ways to take advantage of this, but I've been using C# so long that it just feels "wrong" to me.
dynamic foo = 123;
foo = "bar";
OK, so you most likely will not be writing code like the above very often. There may be times, however, when variable reuse can come in handy or clean up a dirty piece of legacy code. One simple case I run into often is constantly having to cast between decimal and double.
decimal foo = GetDecimalValue();
foo = foo / 2.5;
foo = Math.Sqrt(foo);
string bar = foo.ToString("c");
The second line does not compile because 2.5 is typed as a double, and the third line does not compile because Math.Sqrt expects a double. Obviously, all you have to do is cast and/or change your variable type, but there may be situations where using dynamic makes sense. Keep in mind, though, that the runtime binder applies the same conversion rules at runtime, so declaring the variable dynamic defers these checks rather than removing them.
dynamic foo = GetDecimalValue();
foo = foo / 2.5;
foo = Math.Sqrt(foo);
string bar = foo.ToString("c");
Update
After some great questions and feedback, I realized I need to clarify a couple of points I made above. When you use the dynamic keyword, you are invoking the new Dynamic Language Runtime (DLR) libraries in the .NET Framework. There is plenty of information about the DLR out there, and I am not covering it in this article. Also, when possible, you should always cast your objects and take advantage of type checking. The examples above were meant to show how dynamic works and how you can create an example to test it. Over time, I'm sure best practices will emerge; I am making no attempt to create recommendations on the use of the DLR or dynamic.
Also, since publishing the initial version of this article, I have learned that if the object you declare as dynamic is a plain CLR object, Reflection will be used to locate members rather than the DLR. Again, I am not attempting a deep dive into this subject, so please check other sources if this interests you.
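If you want to see the DLR doing the member lookup rather than Reflection, you can use a type that participates in dynamic binding. Here is a minimal sketch using ExpandoObject from the System.Dynamic namespace in the .NET 4.0 beta, whose members are created on the fly at runtime:

using System;
using System.Dynamic;

dynamic bag = new ExpandoObject();
bag.FirstName = "foo";    // this member did not exist until this line ran
bag.LastName = "bar";
Console.WriteLine( bag.FirstName + " " + bag.LastName );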
Switching Between Static and Dynamic
It should be apparent that 'switching' an object from being statically typed to dynamic is easy. After all, how hard is it to 'lose' information? Well, it turns out that going from dynamic to static is just as easy.
Customer cust = new Customer();
dynamic dynCust = cust;
dynCust.FirstName = "foo";
Customer newCustRef = dynCust;
Person person = dynCust;
SalesRep rep = dynCust;   // compiles, but fails at runtime: the object is a Customer, not a SalesRep
Note that in the example above, no matter how many different ways we reference it, we only have one Customer object (cust). Also note that the last line compiles, but because the underlying object is a Customer and not a SalesRep, it fails at runtime.
Functions
When you return something from a dynamic function call, indexer, etc., the result is always dynamic. You can, of course, convert the result to a known type (as two of the lines below do), but the expression itself starts out dynamic.
dynamic cust = GetCustomer();
string first = cust.FirstName;
dynamic id = cust.CustomerId;
object last = cust.LastName;
There are, of course, a few missing features when it comes to dynamic types. Among them are:
- Extension methods are not supported
- Anonymous functions cannot be used as parameters
We will have to wait for the final version to see what other features get added or removed.
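To make those two limitations concrete, here is a small illustrative sketch (it assumes the System, System.Linq, and System.Collections.Generic namespaces; Count is a LINQ extension method and ForEach is an instance method on List<T>):

dynamic numbers = new List<int> { 1, 2, 3 };

// Extension methods are not considered by the runtime binder, so this line
// compiles but fails at runtime even though Count() works fine on a List<int>.
int count = numbers.Count();

// A lambda has no type of its own, so it cannot be passed to a dynamically
// dispatched call; this line does not even compile.
// numbers.ForEach( n => Console.WriteLine( n ) );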
Variance
OK, a quick quiz. Is the following legal in .NET?
IList<string> strings = new List<string>();
IList<object> objects = strings;
I think most of us, at first, would answer 'yes' because a string is an object. But the question we should be asking ourselves is: is a list of strings a list of objects? To take it further: is a strongly typed list of strings a strongly typed list of objects? When phrased that way, it is easier to understand why the answer is 'no'. If the example above were legal, the following line would compile:
objects.Add(123);
Oops, we just inserted the integer value 123 into a List<string>. Remember, the list contents were never copied; we simply have two references to the same list. There is a case, however, where this kind of conversion should be allowed: if the list is read-only, then we should be allowed to view the contents any (type-legal) way we want.
Co and Contra Variance
From Wikipedia:
Within the type system of a programming language, a type conversion operator is:
- covariant if it preserves the ordering of types (≤), which orders types from more specific to more generic;
- contravariant if it reverses this ordering, which orders types from more generic to more specific;
- invariant if neither of these apply.
C# is, of course, covariant in the sense that a Customer is a Person and can always be referenced as one. There are lots of discussions on this topic, and I will not cover it here. The changes in C# 4.0 only involve generic (typed) interfaces and delegates in situations like the example above. In order to support co- and contravariance, generic interfaces are given 'input' and 'output' sides. So, to make the example above legal, IList would have to be declared in the following manner:
public interface IList<out T> : ICollection<T>, IEnumerable<T>, IEnumerable
{
...
}
Notice the use of the out keyword. This essentially says that the IList is read-only, and that it is therefore safe to refer to a List<string> as an IList<object>. Now, of course, IList is not going to be defined this way; it must support having items added to it. A better example to consider is IEnumerable, which should be, and is, read-only.
public interface IEnumerable<out T> : IEnumerable
{
    IEnumerator<T> GetEnumerator();
}
Using out to basically mean 'read-only' is straightforward, but when is using the in keyword to make something 'write-only' useful? It becomes useful in situations where a generic argument is expected and only used internally by the method. IComparer is the canonical example.
public interface IComparer<in T>
{
    int Compare(T left, T right);
}
As you can see, we can never get an item of type T back out. Even though the Compare method acts on the left and right arguments, everything stays within the method, so the interface is a 'black hole' to clients that use it. To continue the example above, this means that an IComparer<object> can be used in place of an IComparer<string>. The C# 4.0 whitepaper sums up the reason nicely: 'If a comparer can compare any two objects, it can certainly also compare two strings.' This feels counter-intuitive (or maybe contra-intuitive), because if a method expects a string, you can't give it an object.
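Here is a small sketch of that idea in practice, assuming the contravariant IComparer<in T> of .NET 4.0 (the AnyObjectComparer class is made up for illustration). A comparer written against object can be handed to List<string>.Sort, which expects an IComparer<string>:

// A comparer that can order any two objects by their string representation.
public class AnyObjectComparer : IComparer<object>
{
    public int Compare( object x, object y )
    {
        return string.Compare( x.ToString(), y.ToString() );
    }
}

...

List<string> names = new List<string> { "beta", "alpha", "gamma" };
IComparer<string> comparer = new AnyObjectComparer();   // contravariance: an object comparer works as a string comparer
names.Sort( comparer );                                  // alpha, beta, gamma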
Putting it Together
OK, comparing strings and objects is great, but I think a somewhat realistic example might help clarify how the new variance keywords are used. This first example demonstrates the effect of the redefined IEnumerable interface in C# 4.0. In .NET 3.5, the third line below (the AddRange call) does not compile, failing with an error along the lines of 'cannot convert List<Customer> to IEnumerable<Person>'. As stated above, this seems 'wrong' because a Customer is a Person. In .NET 4.0, however, this exact same code compiles without any changes, because IEnumerable is now defined with the out modifier.
MyInterface<Customer> customers = new MyClass<Customer>();
List<Person> people = new List<Person>();
people.AddRange(customers.GetAllTs());   // does not compile in .NET 3.5
people.Add(customers.GetAllTs()[0]);
...
interface MyInterface<T>
{
    List<T> GetAllTs();
}

public class MyClass<T> : MyInterface<T>
{
    public List<T> GetAllTs()
    {
        return _data;
    }

    private List<T> _data = new List<T>();
}
This next example demonstrates how you can take advantage of the out keyword. In .NET 3.5, the third line compiles, but the fourth does not, failing with the same 'cannot convert' error. To make this work in .NET 4.0, simply change the declaration of MyInterface to interface MyInterface<out T>. Notice that in the fourth line, T is Person, but we are passing the Customer version of the class and interface.
MyInterface<Person> people = new MyClass<Person>();
MyInterface<Customer> customers = new MyClass<Customer>();
FooClass<Person>.GetThirdItem(people);
FooClass<Person>.GetThirdItem(customers);   // requires MyInterface<out T>
...
public class FooClass<T>
{
    public static T GetThirdItem(MyInterface<T> foo)
    {
        return foo.GetItemAt(2);
    }
}

public interface MyInterface<out T>
{
    T GetItemAt(int index);
}

public class MyClass<T> : MyInterface<T>
{
    public T GetItemAt(int index)
    {
        return _data[index];
    }

    private List<T> _data = new List<T>();
}
This final example demonstrates the wacky logic of contravariance. Notice that we put a SalesRep 'inside' our Person interface. That isn't a problem, because a SalesRep is a Person. Where it gets interesting is when we pass the MyInterface<Person> to FooClass<Customer>. In essence, we have 'inserted' a SalesRep into an interface declared to work only with Customers! In .NET 3.5, the fifth line does not compile, as expected. By adding the in keyword to our interface declaration in .NET 4.0, everything works fine, because we are 'agreeing' to treat everything as a Person internally and not expose the internal data (which might be that SalesRep).
MyInterface<Customer> customer = new MyClass<Customer>();
MyInterface<Person> person = new MyClass<Person>();
person.SetItem(new SalesRep());
FooClass<Customer>.Process(customer);
FooClass<Customer>.Process(person);   // requires MyInterface<in T>
...
public class FooClass<T>
{
    public static void Process(MyInterface<T> obj)
    {
    }
}

public interface MyInterface<in T>
{
    void SetItem(T obj);
    void Copy(T obj);
}

public class MyClass<T> : MyInterface<T>
{
    public void SetItem(T obj)
    {
        _item = obj;
    }

    public void Copy(T obj)
    {
    }

    private T _item;
}
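As mentioned earlier, the new variance annotations apply to generic delegates as well as interfaces. In .NET 4.0, Func<out TResult> is covariant in its return type and Action<in T> is contravariant in its parameter type, so a sketch like the following (using the Person and Customer classes from the Conventions section) compiles:

Func<Customer> makeCustomer = () => new Customer();
Func<Person> makePerson = makeCustomer;         // covariance: a producer of Customers also produces Persons

Action<Person> greetPerson = p => Console.WriteLine( "Hello, " + p.FirstName );
Action<Customer> greetCustomer = greetPerson;   // contravariance: code that handles any Person can handle a Customer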
COM Interop
This is by far the area in which I have the least experience; however, I'm sure we have all had to interact with Microsoft Office at one point and make calls like this:
using Microsoft.Office.Interop;
using Microsoft.Office.Interop.Word;
object foo = "MyFile.txt";
object bar = Missing.Value;
object optional = Missing.Value;
Document doc = (Document)Application.GetDocument(ref foo, ref bar, ref optional);
doc.CheckSpelling(ref optional, ref optional, ref optional, ref optional);
There are (at least) three problems with the code above. First, you have to declare all your variables as objects and pass them with the ref keyword. Second, you cannot omit parameters; you must pass Missing.Value even for the parameters you are not using. And third, behind the scenes you are pulling in huge (in file size) interop assemblies just to make one method call.
C# 4.0 will allow you to write the code above in a much simpler form that ends up looking almost exactly like 'normal' C# code. This is accomplished by using some of the features already discussed; namely dynamic support and optional parameters.
using Microsoft.Office.Interop.Word;
var doc = Application.GetDocument("MyFile.txt");
doc.CheckSpelling();
What also happens behind the scenes is that only the interop types you actually use get embedded into your application, rather than referencing the entire interop assembly. This cuts down on application size tremendously. My apologies in advance for this weak COM example, but I hope it gets the point across.
Conclusion
There are some great enhancements coming in C# 4.0. This article was intended to provide an overview of the new features and why they were created. There may be some last-minute tweaks to the final product, but the features above are coming and should make a big difference in your future development.
History
- 7/1/2009: Initial version.
- 8/12/2009: Added more dynamic and variance samples.