
Lambda expressions and expression trees in .NET and C#

28th Jan 2007

One of the upcoming features of Microsoft's .NET framework is the System.Query assembly, where you'll find the Language INtegrated Query (LINQ) support. The matching new version of C# has gained lots of new syntactic sugar to make the query bits easier to use.

Firstly, they've added lambda expressions as a variation on the anonymous delegate support from C# 2.0. Lambda expressions compile a little differently and have a much less unwieldy syntax:

var expr = (int x) => x - 5;

The above causes expr to contain an anonymous function which subtracts five from any integer passed to it. One thing you'll also note in the above is that C# now has type inference, so for local variables you don't need to explicitly label the type; the compiler figures out what it should be based on what is done with the variable. The actual type of expr in this case is System.Query.Func<int,int>, which is a subclass of Delegate.
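For comparison, here is a minimal sketch of the same function written both as a C# 2.0 anonymous delegate and as a lambda expression. It assumes the Func<int,int> delegate type as it appears in the System namespace of released .NET versions (the preview discussed here puts it in System.Query), and the class and field names are made up purely for illustration.

```csharp
using System;

static class LambdaVersusDelegate
{
    // C# 2.0 style: an anonymous delegate with its explicit block syntax.
    public static readonly Func<int, int> OldStyle = delegate(int x) { return x - 5; };

    // Lambda style: the same function with far less ceremony.
    public static readonly Func<int, int> NewStyle = (int x) => x - 5;

    static void Main()
    {
        Console.WriteLine(OldStyle(12)); // 7
        Console.WriteLine(NewStyle(12)); // 7
    }
}
```

Both fields hold ordinary delegates, so they are called in exactly the same way.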

However, another eagerly-awaited feature of LINQ is the ability to use C# language syntax to query databases. I won't go into any detail about this here since this entry isn't about LINQ per se, but suffice it to say that the system takes lambda expressions written in the code and somehow turns them into SQL expressions in a SELECT query. How is this done? I was intrigued when I saw this being demonstrated on Channel 9, since it seemed to me that in order to do this magic trick the runtime would have to have access to the parse tree from the compiler. It turns out that this is pretty much exactly how it works!

Expression trees are another feature added to support LINQ, and are again based on the lambda expression syntax. Let's adjust the above code a little bit:

Expression<Func<int,int>> expr = (int x) => x - 5;

Note that this time I've explicitly given the type of my variable expr. Rather than just Func<int,int> as before, I've used Expression<Func<int,int>>. This type has a special meaning to the compiler, which causes it to generate an expression tree rather than an anonymous delegate as before. An expression tree is basically just a representation of the parse tree built by the compiler that is available at runtime. The above code is syntactic sugar for something like the following:

ParameterExpression param = Expression.Parameter(typeof(int), "x");
ConstantExpression five = Expression.Constant(5, typeof(int));
BinaryExpression subtract = Expression.Subtract(param, five);
Expression<Func<int,int>> expr = Expression.Lambda<Func<int,int>>(subtract, param);

At runtime, you can call the Compile() method on expr to compile it to IL and get back an anonymous delegate. You can also inspect the tree in your own code and transform it into other forms such as SQL expressions. To support the full set of syntax constructs allowed in lambda expressions, though, you will have to prepare an appropriate transformation for over 70 different expression types, and do appropriate pre- and post-processing to support the use of .NET methods and constructors against the values in the database.
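To make both options concrete, here is a rough sketch that compiles an expression tree and also walks it to produce SQL-like text. It assumes the System.Linq.Expressions namespace used by released .NET versions (the preview's error messages refer to System.Expressions), it handles only four node types, and ExpressionToSql, Render and Binary are hypothetical names of my own rather than anything LINQ provides.

```csharp
using System;
using System.Linq.Expressions;

static class ExpressionToSql
{
    // Recursively render a tiny subset of node types as SQL-like text.
    // A real translator would need a case for every expression type.
    public static string Render(Expression node)
    {
        switch (node.NodeType)
        {
            case ExpressionType.Lambda:
                return Render(((LambdaExpression)node).Body);
            case ExpressionType.Parameter:
                return ((ParameterExpression)node).Name;
            case ExpressionType.Constant:
                return Convert.ToString(((ConstantExpression)node).Value);
            case ExpressionType.Add:
                return Binary((BinaryExpression)node, "+");
            case ExpressionType.Subtract:
                return Binary((BinaryExpression)node, "-");
            default:
                throw new NotSupportedException("No translation for " + node.NodeType);
        }
    }

    // Render both operands and join them with the given operator symbol.
    static string Binary(BinaryExpression node, string op)
    {
        return "(" + Render(node.Left) + " " + op + " " + Render(node.Right) + ")";
    }

    static void Main()
    {
        Expression<Func<int, int>> expr = (int x) => x - 5;

        // Option 1: compile the tree to IL and call the resulting delegate.
        Func<int, int> compiled = expr.Compile();
        Console.WriteLine(compiled(12));   // 7

        // Option 2: walk the same tree and turn it into text instead.
        Console.WriteLine(Render(expr));   // (x - 5)
    }
}
```

The default case is where the real work hides: every node type without a translation ends up there.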

Since these expression trees exist at runtime, you can theoretically build them dynamically too. If you write a parser for a simple expression language of your own invention, you can build an expression tree yourself and compile it into IL at runtime in order to run it.
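As an illustration of that idea, the sketch below parses a made-up postfix language (space-separated tokens, a single variable x, integer literals, and the operators +, - and *), builds an expression tree from it and compiles the result. The language and the PostfixCompiler class are inventions for this example, and it assumes the System.Linq.Expressions namespace from released .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class PostfixCompiler
{
    // Parse a postfix expression over one int variable "x", build an
    // expression tree for it, and compile that tree to a delegate.
    public static Func<int, int> Compile(string source)
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        Stack<Expression> stack = new Stack<Expression>();

        foreach (string token in source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (token == "x")
            {
                stack.Push(x);
            }
            else if (token == "+" || token == "-" || token == "*")
            {
                if (stack.Count < 2)
                    throw new FormatException("Operator '" + token + "' needs two operands");

                // The right operand was pushed last, so pop it first.
                Expression right = stack.Pop();
                Expression left = stack.Pop();
                stack.Push(token == "+" ? Expression.Add(left, right)
                         : token == "-" ? Expression.Subtract(left, right)
                         : Expression.Multiply(left, right));
            }
            else
            {
                // Anything else must be an integer literal.
                stack.Push(Expression.Constant(int.Parse(token), typeof(int)));
            }
        }

        if (stack.Count != 1)
            throw new FormatException("Expression must leave exactly one value");

        return Expression.Lambda<Func<int, int>>(stack.Pop(), x).Compile();
    }

    static void Main()
    {
        Func<int, int> f = Compile("x 5 - 2 *");   // (x - 5) * 2
        Console.WriteLine(f(12));                  // 14
    }
}
```

Postfix keeps the parser down to a single stack; an infix grammar would need precedence handling but would build the tree with the same factory methods.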

Unfortunately, as with almost all nifty abstraction layers, it does leak a bit if you push it too hard:

    error CS1802: The expression contains operations that cannot be translated
                 to a System.Expressions.Expression type

There's still plenty of scope for clever approaches to solving problems with this mechanism, though. It's good to see this sort of thing coming into not one but two mainstream languages. Presumably other languages targeting .NET will soon follow suit, and the Mono guys are already working on adding this stuff to their class library and compilers.
