Background

The introduction of Language-Integrated Query (LINQ) was one of the big innovations released as part of the .NET Framework 3.5 back in 2007. LINQ is a domain-specific language, essentially syntactic sugar, built into the .NET languages and frameworks to support querying and processing data from different sources in the same way.

LINQ functionality is built on two generic interfaces: IEnumerable<T> is used for in-memory data processing, whereas IQueryable<T> provides the LINQ interface for arbitrary data sources.

LINQ to Objects, LINQ to XML, and LINQ to DataSet are Microsoft’s implementations for processing in-memory data collections. For querying relational data, Microsoft provides LINQ to SQL (SQL Server only) and LINQ to Entities (Entity Framework, with its own provider model for arbitrary relational database products); these are Microsoft’s implementations of IQueryable<T> for remote data sources.

Numerous additional LINQ providers from different vendors and open source projects exist for a variety of data sources, such as RDBMS, OLAP cubes, XML, XML Schema, CSV, Google, Twitter, and Wikipedia, to name a few.

However, each of these LINQ providers can be used only if the application has direct access to the corresponding data source.

How Does LINQ Integrate with N-Tier Architectures?

To answer this question, we first need to look behind the scenes to understand the capabilities and limitations of LINQ.

Remote vs. Local

IEnumerable<T> is the .NET Framework’s basic interface for local, i.e. in-memory, data collections. It specifies a single method, GetEnumerator, which provides an enumerator for iterating over the collection’s elements.
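To make this single-method contract concrete, here is a minimal sketch of a custom in-memory sequence. The Countdown type is a hypothetical example, not part of the framework. The C# iterator (yield return) generates the enumerator, and the non-generic GetEnumerator is required because IEnumerable<T> extends the non-generic IEnumerable.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical example type: a local sequence counting down from 'start' to 1.
public class Countdown : IEnumerable<int>
{
    private readonly int start;

    public Countdown(int start) => this.start = start;

    // The one method IEnumerable<T> requires; 'yield return' lets the
    // compiler generate the enumerator state machine.
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = start; i > 0; i--)
        {
            yield return i;
        }
    }

    // Needed because IEnumerable<T> extends the non-generic IEnumerable.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Any such type works immediately with the LINQ to Objects extension methods; for example, new Countdown(3).Where(n => n != 2) yields 3 and 1.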

Remote data collections may be represented by implementing the IQueryable<T> interface, which extends IEnumerable<T> and specifies three properties: ElementType defines the type of the elements returned by the remote collection; Expression, of type System.Linq.Expressions.Expression, represents the query used to get the data from the remote data source; and Provider is responsible for the actual execution of the query.
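These three members can be observed without any remote data source. The following sketch (QueryableInspection is a hypothetical helper class) wraps an array with AsQueryable, which returns an in-memory EnumerableQuery<T>, and inspects the query that Where produces. Its Expression is a method-call node representing the Queryable.Where call rather than already-filtered data; the data is only produced when the query is enumerated.

```csharp
using System;
using System.Linq;

// Hypothetical helper illustrating the members of IQueryable.
public static class QueryableInspection
{
    // An in-memory array wrapped as IQueryable<int>; no remote source is
    // involved, but the query still carries ElementType, Expression and Provider.
    public static IQueryable<int> BuildQuery()
    {
        int[] numbers = { 1, 2, 3, 4 };
        return numbers.AsQueryable().Where(n => n > 2);
    }

    public static void Describe(IQueryable query)
    {
        // The type of the elements the query produces.
        Console.WriteLine($"ElementType: {query.ElementType}");
        // The root node of the expression tree describing the query.
        Console.WriteLine($"Expression:  {query.Expression.NodeType}");
        // The object that would translate and execute the tree.
        Console.WriteLine($"Provider:    {query.Provider.GetType().Name}");
    }
}
```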

The actual LINQ functionality is provided for these two interface types via extension methods, defined in the static classes System.Linq.Enumerable and System.Linq.Queryable respectively. An equivalent set of extension methods exists for each of the two types, allowing for filtering, sorting, paging, grouping, projecting, etc. Local and remote data processing are distinguished by the parameter types used in each set of extension methods. Let’s take the example of the Where method, which is used for filtering data:

public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate
)

public static IQueryable<TSource> Where<TSource>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, bool>> predicate
)

The extension method for IEnumerable<T> takes a predicate of type Func<TSource, bool>, which is simply a delegate (a method handle, so to speak) that is invoked locally for each item to determine whether the item is included or filtered out.

The predicate parameter of the IQueryable<T> cousin of the Where extension method is of type Expression<Func<TSource, bool>>. Although it looks quite similar, this isn’t a delegate but a so-called expression tree. An expression tree is a data structure that describes program logic as a tree of nodes rather than as executable code. Such trees can be traversed and examined, for instance to extract the logic of the filter predicate in this example.
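The difference between the two predicate types becomes visible when one lambda is assigned to both. In this sketch (DelegateVersusExpression is a hypothetical class), the compiler emits executable code for the Func<int, bool> field but builds a tree of nodes for the Expression<Func<int, bool>> field, which can then be walked at runtime. Calling Compile on the tree turns it back into a delegate.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical class contrasting a delegate with an expression tree.
public static class DelegateVersusExpression
{
    // Assigned to Func<,>: the lambda becomes executable code (a delegate).
    public static readonly Func<int, bool> IsEvenDelegate = n => n % 2 == 0;

    // Assigned to Expression<Func<,>>: the same lambda becomes a data
    // structure describing the code, which can be inspected at runtime.
    public static readonly Expression<Func<int, bool>> IsEvenExpression = n => n % 2 == 0;

    public static string DescribeBody()
    {
        // The body 'n % 2 == 0' is an Equal node whose left operand is a Modulo node.
        var equal = (BinaryExpression)IsEvenExpression.Body;
        var modulo = (BinaryExpression)equal.Left;
        return $"{equal.NodeType}({modulo.NodeType}({modulo.Left}, {modulo.Right}), {equal.Right})";
    }
}
```

Here DescribeBody returns "Equal(Modulo(n, 2), 0)", showing the structure a LINQ provider has to work with.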

What’s the Problem?

When implementing a LINQ provider for a specific data source, these expression trees need to be parsed and translated into the query language of the corresponding data source.
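To give an idea of what such a translation involves, here is a deliberately tiny sketch built on System.Linq.Expressions.ExpressionVisitor. The SimpleWhereTranslator class and the Product entity are hypothetical, and the SQL-like output is illustrative only. The translator handles member access on the lambda parameter, literal constants, comparisons, and &&/||. Captured local variables are rejected, and method calls or null checks are not handled at all, which is precisely where real providers become complex.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;
using System.Text;

// Hypothetical sample entity.
public class Product
{
    public string Name { get; set; }
    public int Price { get; set; }
}

// Hypothetical translator turning simple predicates into a SQL-like WHERE fragment.
public class SimpleWhereTranslator : ExpressionVisitor
{
    private readonly StringBuilder sql = new StringBuilder();

    public static string Translate<T>(Expression<Func<T, bool>> predicate)
    {
        var translator = new SimpleWhereTranslator();
        translator.Visit(predicate.Body);
        return translator.sql.ToString();
    }

    // Comparisons and logical operators: emit "(left op right)".
    protected override Expression VisitBinary(BinaryExpression node)
    {
        sql.Append('(');
        Visit(node.Left);
        sql.Append(node.NodeType switch
        {
            ExpressionType.Equal => " = ",
            ExpressionType.NotEqual => " <> ",
            ExpressionType.GreaterThan => " > ",
            ExpressionType.GreaterThanOrEqual => " >= ",
            ExpressionType.LessThan => " < ",
            ExpressionType.LessThanOrEqual => " <= ",
            ExpressionType.AndAlso => " AND ",
            ExpressionType.OrElse => " OR ",
            _ => throw new NotSupportedException($"Operator {node.NodeType} is not supported.")
        });
        Visit(node.Right);
        sql.Append(')');
        return node;
    }

    // Property access on the lambda parameter becomes a column name.
    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression is ParameterExpression)
        {
            sql.Append(node.Member.Name);
            return node;
        }
        throw new NotSupportedException($"Member {node.Member.Name} is not supported.");
    }

    // Literals: strings are quoted (naive escaping), other values formatted invariantly.
    protected override Expression VisitConstant(ConstantExpression node)
    {
        sql.Append(node.Value is string s
            ? "'" + s.Replace("'", "''") + "'"
            : Convert.ToString(node.Value, CultureInfo.InvariantCulture));
        return node;
    }
}
```

For example, SimpleWhereTranslator.Translate<Product>(p => p.Price > 10 && p.Name == "Tea") returns ((Price > 10) AND (Name = 'Tea')).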

From a usage perspective, this concept is extremely simple: the compiler does all the additional work of building up the expression tree, and both versions of the method are invoked in exactly the same way.
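The identical call syntax can be checked directly. In this sketch (SameSyntaxBothWays is a hypothetical class), the same lambda appears twice. Overload resolution picks Enumerable.Where for the List<string> and Queryable.Where for the result of AsQueryable, so the compiler produces a delegate in the first case and an expression tree in the second.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical class showing identical call sites for local and queryable sources.
public static class SameSyntaxBothWays
{
    private static readonly List<string> Cities = new List<string> { "Bern", "Basel", "Zurich" };

    // Binds to Enumerable.Where: the lambda is compiled to a Func<string, bool>.
    public static List<string> FilterLocally() =>
        Cities.Where(city => city.StartsWith("B")).ToList();

    // Binds to Queryable.Where: the identical lambda becomes an expression tree,
    // interpreted here by the in-memory EnumerableQuery provider.
    public static List<string> FilterViaQueryable() =>
        Cities.AsQueryable().Where(city => city.StartsWith("B")).ToList();
}
```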

On the implementing side, however, it’s a different story! Expression trees can get complex and parsing them isn’t a trivial task.

Now, as numerous LINQ providers for different kinds of data sources already exist, there is a good chance you don’t need to implement your own. However, what happens in an n-tier scenario? Let’s say we have a client, a server, and a database. The client connects to the server via WCF and requests data by calling specific service methods. The server queries the database using Entity Framework and returns the data to the client.

LINQ can easily be used on the server side, since Entity Framework allows querying the relational database using LINQ. On the client side, however, we don’t have this convenience. While we may use LINQ to Objects on the client for in-memory data processing, we cannot use LINQ to request data from the server, as there is no LINQ provider available for our custom WCF service. What’s more, implementing a LINQ provider for the custom WCF service is impractical, not only because it would take a lot of effort, but also because the service is most likely not meant to support complex queries that group, join, or project data dynamically at the client’s request. Usually such a service returns data of a known type and allows the client to specify filter criteria and perhaps sorting and paging.
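For illustration, such a service might look like the following sketch. All names (OrderDto, IOrderService, InMemoryOrderService) are hypothetical, and the WCF attributes ([ServiceContract], [OperationContract]) are left out so that the code compiles without WCF. The operation accepts only a fixed set of parameters, which is exactly why arbitrary client-side LINQ queries cannot simply be forwarded to it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical data transfer type returned by the service.
public class OrderDto
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
}

// Hypothetical service contract: fixed parameters for filtering, sorting and paging.
public interface IOrderService
{
    List<OrderDto> GetOrders(int? customerId, bool sortByTotalDescending, int skip, int take);
}

// In-memory stand-in for the server implementation, which in the scenario
// above would forward these parameters to an Entity Framework query.
public class InMemoryOrderService : IOrderService
{
    private readonly List<OrderDto> orders;

    public InMemoryOrderService(IEnumerable<OrderDto> orders) => this.orders = orders.ToList();

    public List<OrderDto> GetOrders(int? customerId, bool sortByTotalDescending, int skip, int take)
    {
        IEnumerable<OrderDto> result = orders;
        if (customerId.HasValue)
        {
            result = result.Where(o => o.CustomerId == customerId.Value);
        }
        result = sortByTotalDescending
            ? result.OrderByDescending(o => o.Total)
            : result.OrderBy(o => o.Id);
        return result.Skip(skip).Take(take).ToList();
    }
}
```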

Given the complexity of implementing a full-fledged LINQ provider, and since expression trees are by nature not serializable and therefore cannot be transported from client to server, we face an interesting problem.

A Simple Library Does the Job

Let’s split our problem into two parts: the creation of a LINQ-like API on the client side, and the transport of query parameters to the server.

LINQified API for Service Consumers

What we want is a LINQ-like API on the client side that allows us to specify query arguments, which are passed to service methods when retrieving data from the server. In most cases these queries don’t need to be arbitrarily complex, and it’s sufficient to support “remote operations” such as filtering, sorting, and paging.

Therefore, implementing IQueryable<T> would be overkill: it is fairly complex and would pretend to support far more LINQ operations than we can actually provide. Instead, a common approach is to implement a custom collection type that provides its own LINQ functions, either via extension methods or via regular member methods. Because instance methods take precedence over extension methods, a call such as Where binds to the member method rather than to Enumerable.Where:

public class MyCollection<T> : IEnumerable<T>
{
    // ... implementation details omitted for brevity

    public MyCollection<T> Where(Expression<Func<T, bool>> predicate)
    {
        // ... implementation details omitted for brevity
    }
}

Given this approach, instead of a collection we could implement a query type which provides a LINQ-like API. A query instance allows developers to chain query operations and simply collects the LINQ expressions passed to its methods:

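using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public class Query<T>
{
    // Filter predicates collected so far, kept as expression trees
    public readonly List<Expression<Func<T, bool>>> Filters =
        new List<Expression<Func<T, bool>>>();

    // Returns a new query holding the existing filters plus the new one,
    // so each call in a chain leaves the previous query unchanged
    public Query<T> Where(Expression<Func<T, bool>> predicate)
    {
        var query = new Query<T>();
        query.Filters.AddRange(this.Filters);
        query.Filters.Add(predicate);
        return query;
    }

    // additional LINQ functions...
}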
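On the receiving side, collected filters could be applied one after another. The following sketch assumes the filters arrive as expression trees (for example, the Filters list of the Query<T> class above); the ApplyFilters extension method is a hypothetical helper. Because the source is an IQueryable<T>, each call binds to Queryable.Where, so the predicates remain expression trees that a provider such as Entity Framework can translate, and successive Where calls combine with a logical AND.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper applying a list of predicates to a queryable source.
public static class FilterExtensions
{
    public static IQueryable<T> ApplyFilters<T>(
        this IQueryable<T> source,
        IEnumerable<Expression<Func<T, bool>>> filters)
    {
        // Each predicate becomes one more Queryable.Where call on the tree.
        return filters.Aggregate(source, (query, filter) => query.Where(filter));
    }
}
```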

Serializing Expression Trees

Given their nature, LINQ expression trees cannot be serialized: they are not merely representations of simple data queries but can express arbitrary program code. However, this doesn’t mean that the information contained in our usual queries cannot be serialized and transmitted from client to server. This information just has to be extracted from the expression trees and translated into a serializable format.
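As a minimal sketch of such an extraction, assume we only need equality filters of the form x => x.Property == constant. The hypothetical PropertyEqualsFilter type holds nothing but strings, so any serializer can handle it. FilterTranslator (also hypothetical, and not Remote Linq’s API) converts in both directions: ToDto reads the property name and the constant out of the tree, and FromDto rebuilds an equivalent tree with the Expression factory methods. Captured variables, nullable properties, and other operators are deliberately out of scope.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical serializable description of a filter such as c => c.City == "Bern".
public class PropertyEqualsFilter
{
    public string PropertyName { get; set; }
    public string Value { get; set; }
}

// Hypothetical two-way converter between expression trees and the plain-data form.
public static class FilterTranslator
{
    // Extracts the property name and the literal constant from the tree.
    public static PropertyEqualsFilter ToDto<T>(Expression<Func<T, bool>> predicate)
    {
        if (predicate.Body is BinaryExpression { NodeType: ExpressionType.Equal } equal
            && equal.Left is MemberExpression member
            && member.Expression is ParameterExpression
            && equal.Right is ConstantExpression constant)
        {
            return new PropertyEqualsFilter
            {
                PropertyName = member.Member.Name,
                Value = Convert.ToString(constant.Value, CultureInfo.InvariantCulture)
            };
        }
        throw new NotSupportedException("Only 'x => x.Property == constant' is supported.");
    }

    // Rebuilds an equivalent expression tree on the receiving side.
    public static Expression<Func<T, bool>> FromDto<T>(PropertyEqualsFilter dto)
    {
        ParameterExpression parameter = Expression.Parameter(typeof(T), "x");
        MemberExpression property = Expression.Property(parameter, dto.PropertyName);
        object value = Convert.ChangeType(dto.Value, property.Type, CultureInfo.InvariantCulture);
        BinaryExpression body = Expression.Equal(property, Expression.Constant(value, property.Type));
        return Expression.Lambda<Func<T, bool>>(body, parameter);
    }
}
```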

The translation of an expression tree into a serializable format can easily be done using Remote Linq, a small library I’ve published as an open source project on CodePlex.

The code that translates LINQ expressions into remote expressions and back can be wrapped in the custom query class. At the same time, this class may serve as a data contract that is passed as a parameter to the service method.
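To illustrate the data-contract role, here is a sketch of a query descriptor that WCF’s default serializer, DataContractSerializer, can transport. QueryDescriptor, FilterDescriptor, and QueryDescriptorSerializer are hypothetical names, not Remote Linq’s types, and the round trip through a MemoryStream stands in for the wire.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical plain-data filter description.
[DataContract]
public class FilterDescriptor
{
    [DataMember] public string PropertyName { get; set; }
    [DataMember] public string Value { get; set; }
}

// Hypothetical data contract carrying the serializable parts of a query,
// suitable as an ordinary parameter of a service operation.
[DataContract]
public class QueryDescriptor
{
    [DataMember] public List<FilterDescriptor> Filters { get; set; } = new List<FilterDescriptor>();
    [DataMember] public int? Skip { get; set; }
    [DataMember] public int? Take { get; set; }
}

public static class QueryDescriptorSerializer
{
    // Serializes the descriptor to XML bytes, as a service call would.
    public static byte[] Serialize(QueryDescriptor descriptor)
    {
        var serializer = new DataContractSerializer(typeof(QueryDescriptor));
        using var stream = new MemoryStream();
        serializer.WriteObject(stream, descriptor);
        return stream.ToArray();
    }

    // Reads the descriptor back on the receiving side.
    public static QueryDescriptor Deserialize(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(QueryDescriptor));
        using var stream = new MemoryStream(data);
        return (QueryDescriptor)serializer.ReadObject(stream);
    }
}
```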

In the end, with very little effort, you end up writing quite concise code on the client side to retrieve data from the server.

Example:

Query<Order> orderQuery = serviceProxy
    .CreateQuery<Order>(service => service.GetOrders)
    .Where(order => order.Items.Any(orderItem => orderItem.ProductId == productId));

List<Order> orders = orderQuery.ToList();

The example above retrieves orders from the server by calling the GetOrders service method at the time ToList is invoked, and lets the server return only orders that include at least one order item for the specified product id.
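The deferred behavior, where nothing is sent until ToList is called, can be sketched as follows. DeferredQuery<T> is a hypothetical class, and its execute callback is an assumption standing in for the service proxy call (GetOrders in the example). Where only records predicates and returns a new query instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical client-side query that defers the remote call until ToList.
public class DeferredQuery<T>
{
    // Assumed stand-in for the service call receiving the collected filters.
    private readonly Func<IReadOnlyList<Expression<Func<T, bool>>>, List<T>> execute;
    private readonly List<Expression<Func<T, bool>>> filters;

    public DeferredQuery(Func<IReadOnlyList<Expression<Func<T, bool>>>, List<T>> execute)
        : this(execute, new List<Expression<Func<T, bool>>>()) { }

    private DeferredQuery(
        Func<IReadOnlyList<Expression<Func<T, bool>>>, List<T>> execute,
        List<Expression<Func<T, bool>>> filters)
    {
        this.execute = execute;
        this.filters = filters;
    }

    // Records the predicate in a new instance; nothing is executed here.
    public DeferredQuery<T> Where(Expression<Func<T, bool>> predicate) =>
        new DeferredQuery<T>(execute, new List<Expression<Func<T, bool>>>(filters) { predicate });

    // The only point at which the (simulated) remote call happens.
    public List<T> ToList() => execute(filters);
}
```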

Code on the server side is equally simple, as remote expressions can be translated back into LINQ expression trees by a single method call and then applied to any IEnumerable<T> or IQueryable<T> data source.
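The server-side step can be sketched as follows, with hypothetical Order and OrderItem entities mirroring the client example. The predicate parameter stands in for the expression tree obtained from the received remote expression; the Remote Linq call that performs this translation is not reproduced here. For an IQueryable<Order> source the tree is passed on unchanged (for example to Entity Framework), while for an IEnumerable<Order> source it is compiled into a delegate first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entities matching the client example.
public class OrderItem
{
    public int ProductId { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public List<OrderItem> Items { get; set; } = new List<OrderItem>();
}

public static class OrderQueries
{
    // IQueryable source: the expression tree is handed to the query provider.
    public static List<Order> Apply(IQueryable<Order> source, Expression<Func<Order, bool>> predicate) =>
        source.Where(predicate).ToList();

    // IEnumerable source: the tree is compiled to a delegate for LINQ to Objects.
    public static List<Order> Apply(IEnumerable<Order> source, Expression<Func<Order, bool>> predicate) =>
        source.Where(predicate.Compile()).ToList();
}
```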


Attached to this blog post you will find a Visual Studio 2012 solution with a client/server scenario demonstrating what I’ve described above.

UPDATE: Remote Linq version 1.0 has just been released. It includes a query type that supports filtering, sorting, and paging. You can get the latest version from CodePlex.