Thursday, February 19, 2015

LINQ Queries

Queries

LINQ adds a set of predefined query operators that enable developers to write SQL-like queries over a collection of objects. These queries return new collections of data according to the query conditions. Queries are written in the following form:
from <<element>> in <<collection>>
   where <<expression>> 
   select <<expression>>
The result of the query is a generic collection (IEnumerable<T>). The type T is determined by the type of the expression in the select <<expression>> part of the query. An example of a LINQ query that returns the titles of books with prices less than 500 is shown in the following listing:
from book in myBooks
    where book.Price < 500
    select book.Title
This query goes through the collection myBooks, takes the book entities whose Price property is less than 500, and returns the title of each one. The result of the query is an IEnumerable<string> because string is the type of the expression in the select clause (assuming that the Title property of a book is a string).

Functions

LINQ adds a set of useful functions that can be applied to collections. These functions are added as new methods of collection objects and can be used in the following forms:
  • <collectionObject>.methodname()
  • <collectionObject>.methodname(<collectionObject>)
  • <collectionObject>.methodname(<<expression>>)
All LINQ queries are executed on collections of type IEnumerable<T> or IQueryable<T>, and the results are returned as new collections (IEnumerable<T>), objects, or simple types. Examples of LINQ functions that return collections or simple values are:
IEnumerable<Book> myBooks = allBooks.Except(otherBooks);
IEnumerable<string> titles = myBooks.Where(book=>book.Price<500)
                                    .Select(book=>book.Title);
int count = titles.Count();
The first of these three functions creates a new collection containing all books except the ones that are in the otherBooks collection. The second function takes all books with prices less than 500 and returns a list of their titles. The third function counts the titles in the titles collection.
Note that LINQ queries and functions are two forms of the same thing. For most queries you can write an equivalent function form, as shown in the following code:
// Query form
from book in myBooks
where book.Price < 500
select book.Title

// Equivalent function form
myBooks.Where(book => book.Price < 500)
       .Select(book => book.Title);
More examples of LINQ applied to collections can be found in the following sections.

Dynamic LINQ

As explained above, LINQ queries and functions return either classes or collections of classes. In most cases, you will use existing domain classes (Book, Author, Publisher, etc.) in the return types of queries. However, in some cases you might want to return custom classes that do not exist in your class model. As an example, you might want to return only the ISBN and title of the book, rather than an entire Book object with all the properties that you do not need. A similar problem arises if you want to return fields from different classes (e.g., the title of the book and the name of the publisher).
In this case, you do not need to define new classes that contain only the fields you want to return. LINQ enables you to return so-called "anonymous classes" - classes generated for you that do not need to be explicitly defined in your class model. An example of this kind of query is shown in the following listing:
var items = from b in books
            select new { Title = b.Title,
                         ISBN = b.ISBN
                       };
The variable items is a collection of dynamically created objects, each of which has Title and ISBN properties. This class is not explicitly defined in your class model - it is generated by the compiler for this query. If you inspect the type of the variable items, you will probably see something like IEnumerable<'a> - the compiler gives the class a generated name (displayed as 'a). This way you can use temporary classes just for the results of queries without the need to define them in the code.
Many people think that this is bad practice because we have used objects without a type. This is not true - the items variable does have a type; the type is simply not declared in any source file. You still get everything you need from a typed object:
  • Compile-time syntax check - if you make a typing error (e.g., write ISBM instead of ISBN), the compiler will report an error and abort compilation.
  • Intellisense support - Visual Studio will show you a list of properties/methods of the object as it does for regular objects.
However, there is a way to use untyped objects in .NET. If you replace the keyword var with the keyword dynamic, the variable items will be truly untyped. An example is shown in the following listing:
dynamic items = from b in books
                select new { Title = b.Title,
                             ISBN = b.ISBN
                           };
In this case you have a true untyped object - there will be no compile-time check (properties will be validated at run-time only) and you will not have any Intellisense support for dynamic objects.
Although var is better than dynamic (always use var where possible), there are cases where you will be forced to use dynamic instead. For example, if you want to return the result of a LINQ query from a method, you cannot declare the return type of the method as var, because var is allowed only for local variables and the anonymous class cannot be named outside the method body. In that case you will need to either define an explicit class or declare the return value as dynamic.
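The two options for returning query results from a method can be sketched as follows. This is only an illustration: the Book class is reduced to three properties, and BookSummary, GetSummaries, and GetSummariesDynamic are hypothetical names that do not appear elsewhere in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced Book class, only for this sketch.
public class Book
{
    public string Title { get; set; }
    public string ISBN { get; set; }
    public double Price { get; set; }
}

// Hypothetical explicit class that replaces the anonymous class.
public class BookSummary
{
    public string Title { get; set; }
    public string ISBN { get; set; }
}

public static class ReturnTypeSketch
{
    // Option 1: project into an explicit class, so callers keep compile-time checks.
    public static IEnumerable<BookSummary> GetSummaries(IEnumerable<Book> books)
    {
        return from b in books
               select new BookSummary { Title = b.Title, ISBN = b.ISBN };
    }

    // Option 2: return anonymous objects as dynamic; members are resolved at run time.
    public static IEnumerable<dynamic> GetSummariesDynamic(IEnumerable<Book> books)
    {
        return from b in books
               select (dynamic)new { Title = b.Title, ISBN = b.ISBN };
    }

    public static void Main()
    {
        var books = new[] { new Book { Title = "LINQ Basics", ISBN = "000-1", Price = 30 } };

        foreach (BookSummary s in GetSummaries(books))
            Console.WriteLine("{0} ({1})", s.Title, s.ISBN);

        foreach (dynamic d in GetSummariesDynamic(books))
        {
            string title = d.Title;   // a misspelled member name would fail only at run time
            Console.WriteLine(title);
        }
    }
}
```

The explicit class costs a few extra lines but keeps Intellisense and compile-time checking for callers, which is why it is usually the safer choice.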
In this article I will use either explicit or anonymous classes.

Lambda Expressions

While working with LINQ, you will come across some "weird syntax" in the form x => y. If you are not familiar with this syntax, I will explain it briefly.
In each LINQ function you will need to define some condition that will be used to filter objects. The most natural way to do this is to pass some function that will be evaluated, and if an object satisfies a condition defined in the function, it will be included in the result set of the LINQ function. That kind of condition function will need to take an object and return a true/false value that will tell LINQ whether or not this object should be included in the result set. An example of that kind of function that checks if the book is cheap is shown in the following listing:
public bool IsBookCheap(Book b)
{
    return (b.Price < 10);
}
If the book price is less than 10, it is cheap. Now when you have this condition, you can use it in the LINQ clause:
var condition = new Func<Book, bool>(IsBookCheap);
var cheapBooks = books.Where(condition);
In this code we have defined a "function pointer" to the function IsBookCheap in the form Func<Book, bool>, and this function is passed to the LINQ query. LINQ will evaluate this function for each book object in the books collection and return a book in the resulting enumeration if it satisfies the condition defined in the function. 
This is not common practice, because conditions tend to be specific to each query, and it is unlikely that a set of predefined functions somewhere in the code will serve all LINQ queries. Usually we need one expression per LINQ query, so it is better to define the condition inline and pass it directly to LINQ. Fortunately, C# allows us to do this using anonymous delegates:
var cheapBooks = books.Where(delegate(Book b){ return b.Price < 10; } );
In this example, I have created a Func<Book, bool> inline and put it directly in the Where() condition. The result is the same as in the previous example, but you do not need to define a separate function.
If you think that this is too much typing for a simple inline function, there is a shorter syntax - lambda expressions. First you can see that we don't need the delegate word (the compiler should know that we need to pass a delegate as an argument). Also, why do we need to define the type of the argument (Book b)? As we are applying this function to the collection of books, we know that b is a Book - therefore we can remove this part too. Also, why should we type return - an expression that defines the return condition will be enough. The only thing we would need to have is a separator that will be placed between the argument and the expression that will be returned - in C#, we use => symbol.
When we remove all the unnecessary parts and add the => separator, we get the lambda expression syntax in the form argument => expression. The original delegate and the equivalent lambda expression are shown in the following example:
Func<Book, bool> delegateFunction = delegate(Book b) { return b.Price < 10; };
Func<Book, bool> lambdaExpression = b => b.Price < 10;
As you can see, a lambda expression is just a minimized syntax for inline functions. Note that we can use lambda expressions for any kind of function (not only functions that return bool values). As an example, you can define a lambda expression that takes a book and an author, and returns a book title in the format "title (author name)". An example of that kind of lambda expression is shown in the following listing:
Func<Book, Author, string> format = (book, author) => book.Title + " (" + author.FirstName + " " + author.LastName + ")";
This function takes two arguments (Book and Author) and returns a string as a result (the last type in the Func<> declaration is always the return type). The lambda expression defines the two arguments in parentheses, followed by the string expression that will be returned.
Lambda expressions are widely used in LINQ, so you should get used to them.

Class model used in this article

In the examples, we will use a data structure that represents information about books, their authors, and publishers. The class diagram for that kind of data structure is shown on the figure below:
[Figure: class diagram of the Book, Author, and Publisher entities (Linq2EntitiesSampleDiagram.gif)]
Each book can have several authors and one publisher. The fields associated with the entities are shown in the diagram. A book has an ISBN, price, number of pages, publication date, and title. It also has a reference to a publisher and a reference to a set of authors. An author has a first name and a last name, without a reference back to books, and a publisher has just a name, without a reference to the books it published.
The examples assume that the collections of books, publishers, and authors are available in the SampleData.Books, SampleData.Publishers, and SampleData.Authors fields.
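A minimal sketch of classes that match this description is shown below. The property names ISBN, Title, Price, PublicationDate, Publisher, Authors, FirstName, LastName, and Name are the ones used by the examples in this article; PageCount is a hypothetical name for the number of pages, and all sample values are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Publisher
{
    public string Name { get; set; }
}

public class Author
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Book
{
    public string ISBN { get; set; }
    public string Title { get; set; }
    public double Price { get; set; }
    public int PageCount { get; set; }              // hypothetical name for "number of pages"
    public DateTime PublicationDate { get; set; }
    public Publisher Publisher { get; set; }
    public List<Author> Authors { get; set; }
}

public static class SampleData
{
    // Invented sample values. Static fields are initialized in textual order,
    // so Publishers and Authors exist before Books refers to them.
    public static Publisher[] Publishers =
    {
        new Publisher { Name = "Example Press" },
        new Publisher { Name = "Sample House" }
    };

    public static Author[] Authors =
    {
        new Author { FirstName = "Ann", LastName = "Green" },
        new Author { FirstName = "Sam", LastName = "Gray" }
    };

    public static Book[] Books =
    {
        new Book { ISBN = "000-1", Title = "Our First Book", Price = 120, PageCount = 300,
                   PublicationDate = new DateTime(2013, 5, 1),
                   Publisher = Publishers[0],
                   Authors = new List<Author> { Authors[0], Authors[1] } },
        new Book { ISBN = "000-2", Title = "Another Title", Price = 650, PageCount = 450,
                   PublicationDate = new DateTime(2014, 9, 15),
                   Publisher = Publishers[1],
                   Authors = new List<Author> { Authors[1] } }
    };
}

public static class Program
{
    public static void Main()
    {
        foreach (Book book in SampleData.Books)
            Console.WriteLine("{0} ({1}), {2} author(s)",
                              book.Title, book.Publisher.Name, book.Authors.Count);
    }
}
```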

LINQ queries

In this section I will show some examples of basic queries/functions that can be used. If you are a beginner this should be a good starting point for you.

Basic query

The following example shows the basic usage of LINQ. In order to run a LINQ query, you need a collection (e.g., an array of books) that will be queried. In this basic example, you specify which collection will be queried (the "from <<element>> in <<collection>>" part) and what data will be selected (the "select <<expression>>" part). In the example below, the query is executed against the books array, and the book entities are selected and returned as the result of the query. The result is an IEnumerable<Book> because the type of the expression in the "select <<expression>>" part is the class Book.
Book[] books = SampleData.Books;
IEnumerable<Book> bookCollection = from b in books
                                   select b;

foreach (Book book in bookCollection )
         Console.WriteLine("Book - {0}", book.Title);
As you might have noticed, this query does nothing useful - it just selects all books from the book collection and puts them in the enumeration. However, it shows the basic usage of the LINQ queries. In the following examples you can find more useful queries.

Projection/Selection of fields

Using LINQ, developers can transform a collection and create new collections where elements contain just some fields. The following code shows how you can create a collection of book titles extracted from a collection of books.
Book[] books = SampleData.Books;            
IEnumerable<string> titles = from b in books
                             select b.Title;

foreach (string t in titles)
    Console.WriteLine("Book title - {0}", t);
As a result of this query, IEnumerable<string> is created because the type of expression in the select part is string. An equivalent example written as a select function and lambda expression is shown in the following code:
Book[] books = SampleData.Books;            
IEnumerable<string> titles = books.Select(b=>b.Title);

foreach (string t in titles)
    Console.WriteLine("Book title - {0}", t);
Any type can be used as a result collection type. In the following example, an enumeration of anonymous classes is returned, where each element in the enumeration has references to the book and the first author of the book:
var bookAuthorCollection = from b in books
                           select new { Book = b,
                                        Author = b.Authors[0]
                                      };
    
foreach (var x in bookAuthorCollection)
    Console.WriteLine("Book title - {0}, First author {1}", 
                         x.Book.Title, x.Author.FirstName);
This type of query is useful when you need to dynamically create a new kind of collection.

Flattening collections returned in a Select query

Imagine that you want to return a collection of authors for a set of books. Using the Select method, this query would look like the one in the following example:
Book[] books = SampleData.Books;            
IEnumerable< List<Author> > authors = books.Select(b=>b.Authors);

foreach (List<Author> bookAuthors in authors)
    bookAuthors.ForEach(author => Console.WriteLine("Book author {0} {1}", author.FirstName, author.LastName));
In this example, a list of authors is taken from each book in the collection. When you use the Select method, it returns one element in the resulting enumeration per book, and each element has the type List<Author>, because that is the type of the property returned in the Select lambda. As a result, you will need to iterate twice to display all authors - once through the enumeration, and then, for each list in the enumeration, once more to access each individual author.
However, in some cases, this is not what you want. You might want to have a single flattened list of authors and not a two level list. In that case, you will need to use SelectMany instead of the Select method as shown in the following example:
Book[] books = SampleData.Books;            
IEnumerable<Author> authors = books.SelectMany(b=>b.Authors);

foreach (Author author in authors)
    Console.WriteLine("Book author {0} {1}", author.FirstName, author.LastName);
The SelectMany method merges all collections returned in the lambda expression into the single flattened list. This way you can easily manipulate the elements of a collection.
Note that in the first example, I used the ForEach method to iterate through each list of authors and display them. The ForEach method is not part of LINQ; it is a regular method of the List<T> class. However, it is a very useful alternative for compact inline loops (which is probably why many people think it is part of LINQ). Because ForEach is defined on List<T> and not on IEnumerable<T>, you cannot call it on an enumeration as you would a regular LINQ method. If you like this method, you will need to convert your enumerable to a list in order to use it.
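The conversion mentioned above can be sketched as follows. The class and method names (ForEachSketch, CollectUpper) and the sample strings are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ForEachSketch
{
    // Upper-cases every name; the point is the ToList() call before ForEach.
    public static List<string> CollectUpper(IEnumerable<string> names)
    {
        var result = new List<string>();

        // names.ForEach(...) would not compile: IEnumerable<T> has no ForEach method.
        // ToList() copies the enumeration into a List<T>, which does have one.
        names.ToList().ForEach(n => result.Add(n.ToUpperInvariant()));

        return result;
    }

    public static void Main()
    {
        IEnumerable<string> lastNames = new[] { "Green", "Gray" }.Where(n => n.StartsWith("G"));
        CollectUpper(lastNames).ForEach(n => Console.WriteLine("Author {0}", n));
    }
}
```

Keep in mind that ToList() allocates a new list, so for a single pass a plain foreach loop over the enumeration avoids the extra copy.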

Sorting entities

Using LINQ, developers can sort entities within a collection. The following code shows how you can take a collection of books, order elements by book publisher name and then by title, and select books in an ordered collection. As a result of the query, you will get an IEnumerable<Book> collection sorted by book publishers and titles.
Book[] books = SampleData.Books;              
IEnumerable<Book> booksByTitle = from b in books
                                 orderby b.Publisher.Name descending, b.Title
                                 select b;

foreach (Book book in booksByTitle)                
    Console.WriteLine("Book - {0}\t-\tPublisher: {1} ",
                       book.Title, book.Publisher.Name );
Alternative code using functions is shown in the following example:
Book[] books = SampleData.Books;              
IEnumerable<Book> booksByTitle = books.OrderByDescending(book=>book.Publisher.Name)
                                      .ThenBy(book=>book.Title);

foreach (Book book in booksByTitle)                
    Console.WriteLine("Book - {0}\t-\tPublisher: {1} ",
                       book.Title, book.Publisher.Name );
This type of query is useful if you have complex structures where you need to order an entity by a property of a related entity (in this example, books are ordered by the publisher name, which is not a field of the Book class at all).

Filtering entities / Restriction

Using LINQ, developers can filter entities from a collection and create a new collection containing just the entities that satisfy a certain condition. The following example creates a collection of books that contain the word "our" in the title and have a price less than 500. Records whose title contains "our" and whose price is below 500 are selected from the array of books and returned as members of a new collection. The ''where <<expression>>'' part can contain any valid C# boolean expression that uses the fields of the collection elements, constants, and variables in scope (e.g., price). The type of the returned collection is IEnumerable<Book> because the ''select <<expression>>'' part selects the type Book.
Book[] books = SampleData.Books;            
int price = 500;            
IEnumerable<Book> filteredBooks = from b in books                                         
                                  where b.Title.Contains("our") && b.Price < price
                                  select b;

foreach (Book book in filteredBooks)                
    Console.WriteLine("Book - {0},\t Price {1}", book.Title, book.Price);
As an alternative, the .Where() function can be used as shown in the following example:
Book[] books = SampleData.Books;            
int price = 500;            
IEnumerable<Book> filteredBooks = books.Where(b=> (b.Title.Contains("our") 
                            && b.Price < price) ); 

foreach (Book book in filteredBooks)                
    Console.WriteLine("Book - {0},\t Price {1}", book.Title, book.Price);

Local variables

You can use local variables in LINQ queries in order to improve the readability of your queries. Local variables are created using the let <<localname>> = <<expression>> syntax inside the LINQ query. Once defined, local variables can be used in any part in the LINQ query (e.g., where or select clause). The following example shows how you can select a set of first authors in the books containing the word 'our' in the title, using local variables.
IEnumerable<Author> firstAuthors =  from b in books
                                    let title = b.Title 
                                    let authors = b.Authors
                                    where title.Contains("our")
                                    select authors[0];             

foreach (Author author in firstAuthors)
    Console.WriteLine("Author - {0}, {1}",
                       author.LastName, author.FirstName);
In this example, the variables title and authors reference the title of the book and its list of authors. It can be easier to refer to these items through variables than through direct property references.

Collection methods

Using LINQ, you can transform existing collections, or collections created by other LINQ queries. LINQ provides a set of functions that can be applied to collections. These functions can be grouped into the following types:
  • Set functions - functions that can be used for collection manipulation operations like merging, intersection, reverse ordering, etc.,
  • Element functions - functions that can be used to take particular elements from collections,
  • Conversion functions - functions used to convert one type of collection to another,
  • Aggregation functions - SQL-like functions that enable you to find a maximum, sum, or average value of some field in collections,
  • Quantifier functions - functions used to check whether some or all elements of a collection satisfy a condition.
These functions are described in the following sections.

Set functions

Set operators enable you to manipulate collections and use standard set operations like union, intersection, etc. LINQ set operators are:
  • Distinct - used to extract distinct elements from a collection,
  • Union - creates a collection that represents the union of two existing collections (duplicates are removed),
  • Concat - appends the elements of one collection to another collection (duplicates are kept),
  • Intersect - creates a collection that contains elements that exist in both collections,
  • Except - creates a collection that contains elements that exist in one collection but not in the other,
  • Reverse - creates a collection with the elements in reversed order,
  • SequenceEqual - checks whether two collections have the same elements in the same order,
  • Take - this function takes a number of elements from one collection, and places them in a new collection,
  • Skip - this function skips a number of elements in a collection.
Assuming that the booksByTitle and filteredBooks collections were created in the previous examples, the following code finds all books in booksByTitle that do not exist in filteredBooks, and reverses their order.
IEnumerable<Book> otherBooks = booksByTitle.Except(filteredBooks);            

otherBooks = otherBooks.Reverse();  

foreach (Book book in otherBooks)
   Console.WriteLine("Other book - {0} ",  book.Title);
In the following example, booksByTitle and filteredBooks are concatenated and the number of elements and number of distinct elements is shown.
IEnumerable<Book> mergedBooks = booksByTitle.Concat(filteredBooks);
Console.WriteLine("Number of elements in merged collection is {0}", mergedBooks.Count());
Console.WriteLine("Number of distinct elements in merged collection is {0}", mergedBooks.Distinct().Count());
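The remaining set functions behave the same way. The following sketch applies them to two small integer arrays so the results are easy to follow; the class name, the Show helper, and the sample numbers are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetFunctionsSketch
{
    // Helper that formats a sequence as a comma-separated string.
    public static string Show(IEnumerable<int> values)
    {
        return string.Join(", ", values);
    }

    public static void Main()
    {
        int[] first = { 1, 2, 3, 4 };
        int[] second = { 3, 4, 5 };

        Console.WriteLine("Union:     {0}", Show(first.Union(second)));     // 1, 2, 3, 4, 5
        Console.WriteLine("Concat:    {0}", Show(first.Concat(second)));    // 1, 2, 3, 4, 3, 4, 5
        Console.WriteLine("Intersect: {0}", Show(first.Intersect(second))); // 3, 4
        Console.WriteLine("Except:    {0}", Show(first.Except(second)));    // 1, 2

        // SequenceEqual compares both the elements and their order.
        Console.WriteLine("Same sequence: {0}", first.SequenceEqual(new[] { 1, 2, 3, 4 })); // True
        Console.WriteLine("Same sequence: {0}", first.SequenceEqual(new[] { 4, 3, 2, 1 })); // False
    }
}
```

For classes such as Book, these functions compare elements with the default equality comparer, which means reference equality unless the class overrides Equals and GetHashCode or you pass an IEqualityComparer<T>.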

Paging example

The following example shows client-side paging using the Skip(int) and Take(int) methods. Assuming that there are ten books per page, the first three pages are skipped using Skip(30) (three pages of ten books each), and the books that should be shown on the fourth page are taken using Take(10). An example is:
IEnumerable<Book> page4 = booksByTitle.Skip(30).Take(10);            

foreach (Book book in page4)                
    Console.WriteLine("Fourth page - {0} ", book.Title);
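The page arithmetic above can be wrapped in a small helper. This is only a sketch: GetPage is a hypothetical name, and it assumes 1-based page numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Returns the items of the given 1-based page.
    public static IEnumerable<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        // Skip all items on the preceding pages, then take one page of items.
        return source.Skip((pageNumber - 1) * pageSize).Take(pageSize);
    }

    public static void Main()
    {
        // 35 numbered items stand in for a sorted collection of books.
        IEnumerable<int> items = Enumerable.Range(1, 35);

        // The fourth page of ten starts at item 31 and holds only five items.
        Console.WriteLine(string.Join(", ", GetPage(items, 4, 10)));   // 31, 32, 33, 34, 35
    }
}
```

Take returns fewer elements when the source runs out, so the last page does not need special handling.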
There is also an interesting usage of the Skip/Take functions in the SkipWhile/TakeWhile form:
IEnumerable<Book> page1 = booksByTitle.OrderBy(book=>book.Price)            
                                      .SkipWhile(book=>book.Price<100)
                                      .TakeWhile(book=>book.Price<200);
foreach (Book book in page1)                
    Console.WriteLine("Medium price books - {0} ", book.Title);
In this example, books are ordered by price, all books with a price less than 100 are skipped, and then books are taken for as long as the price stays below 200. This way all books with a price from 100 up to (but not including) 200 are found.

Element functions

There are several useful functions that can be applied when you need to extract a particular element from a collection:
  • First - used to find the first element in a collection. Optionally you can pass a condition to this function in order to find the first element that satisfies the condition.
  • FirstOrDefault - used to find the first element in a collection. If that kind of element cannot be found, the default element for that type (e.g., 0 or null) is returned.
  • ElementAt - used to find the element at a specific position.
The following example shows the usage of the FirstOrDefault and ElementAt functions:
Book firstBook = books.FirstOrDefault(b=>b.Price>200);              
Book thirdBook = books.Where(b=>b.Price>200).ElementAt(2);
Note that you can apply functions either on the collection, or on the result of some other LINQ function.
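The difference between First and FirstOrDefault shows up when no element matches. The sketch below uses an array of invented prices; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class ElementFunctionsSketch
{
    // Returns the first price above the limit, or null when First finds nothing.
    public static int? FirstAbove(int[] prices, int limit)
    {
        try
        {
            return prices.First(p => p > limit);
        }
        catch (InvalidOperationException)
        {
            // First throws when no element satisfies the condition.
            return null;
        }
    }

    public static void Main()
    {
        int[] prices = { 50, 120, 300 };

        Console.WriteLine(prices.First(p => p > 200));             // 300
        Console.WriteLine(prices.FirstOrDefault(p => p > 1000));   // 0, the default value of int
        Console.WriteLine(FirstAbove(prices, 1000).HasValue);      // False
        Console.WriteLine(prices.ElementAt(1));                    // 120 (positions start at zero)
    }
}
```

For reference types such as Book, FirstOrDefault returns null when nothing matches, so the result should be checked before its properties are used. ElementAt throws an ArgumentOutOfRangeException when the position is outside the collection.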

Conversion functions

There are a few conversion functions that enable you to convert the type of one collection to another. Some of these functions are:
  • ToArray - used to convert elements of collection IEnumerable<T> to array of elements <T>.
  • ToList - used to convert elements of collection IEnumerable<T> to list List<T>.
  • ToDictionary - used to convert elements of a collection to a Dictionary. During conversion, keys and values must be specified.
  • OfType - used to extract the elements of the collection IEnumerable<T1> that are of type T2 (a class or interface), and put them in the collection IEnumerable<T2>.
The following example shows the usage of the ToArray and ToList functions:
Book[] arrBooks = books.ToArray();
List<Book> lstBook = books.ToList();
ToDictionary is an interesting method that enables you to quickly index a list by some field. An example is shown in the following listing:
Dictionary<string, Book> booksByISBN = books.ToDictionary(book => book.ISBN);
Dictionary<string, double> pricesByISBN = books.ToDictionary(book => book.ISBN,
                                                             book => book.Price);
If you supply just one lambda expression, ToDictionary will use it as a key of new dictionary while the elements will be the objects. You can also supply lambda expressions for both key and value and create a custom dictionary. In the example above, we create a dictionary of books indexed by the ISBN key, and a dictionary of prices indexed by ISBN.
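OfType is the only conversion function in the list above without an example. A minimal sketch, using an invented array of mixed objects, could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OfTypeSketch
{
    // Keeps only the elements that are strings and returns them as a typed list.
    public static List<string> OnlyStrings(IEnumerable<object> items)
    {
        return items.OfType<string>().ToList();
    }

    public static void Main()
    {
        object[] mixed = { "LINQ", 42, "C#", 3.14 };

        foreach (string s in OnlyStrings(mixed))
            Console.WriteLine("String element - {0}", s);   // LINQ, then C#

        // The same call works for any type, e.g. only the integers:
        Console.WriteLine(mixed.OfType<int>().Count());      // 1
    }
}
```

Elements of other types are silently skipped rather than causing an exception, which is the difference from the Cast function.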

Quantifier functions

Quantifier functions are logical functions that check whether the elements of a collection satisfy some condition. Some of the functions you can use are:
  • Any - checks whether any of the elements in the collection satisfies a certain condition.
  • All - checks whether all elements in the collection satisfy a certain condition.
The usage of these functions is shown in the following example:
if(list.Any(book=>book.Price<500)) 
    Console.WriteLine("At least one book is cheaper than 500$"); 

if(list.All(book=>book.Price<500))  
    Console.WriteLine("All books are cheaper than 500$");
In the example above, the All and Any functions will check whether the condition that price is less than 500 is satisfied for books in the list.

Aggregation functions

Aggregation functions enable you to perform aggregations on elements of a collection. Aggregation functions that can be used in LINQ are Count, Sum, Min, Max, etc.
The following example shows the simple usage of some aggregate functions applied to an array of integers:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

Console.WriteLine("Count of numbers greater than 5 is {0} ", numbers.Count(x => x > 5));
// Sum and Min take a value selector, not a condition, so filter with Where first.
Console.WriteLine("Sum of even numbers is {0} ", numbers.Where(x => x % 2 == 0).Sum());
Console.WriteLine("Minimum odd number is {0} ", numbers.Where(x => x % 2 == 1).Min());
Console.WriteLine("Maximum is {0} ", numbers.Max());
Console.WriteLine("Average is {0} ", numbers.Average());
As you can see, you can apply an aggregation function to the whole collection, pass a condition directly to Count, or preselect a subset with Where before aggregating.

Advanced queries

This section shows how you can create advanced queries. These kinds of queries include joining different collections and using group by operators.

Joining tables

LINQ enables you to use SQL-like joins on a collection of objects. Collections are joined the same way as tables in SQL. The following example shows how you can join three collections publishers, books, and authors, place some restriction conditions in the where section, and print results of the query:
var baCollection = from pub in SampleData.Publishers
                   from book in SampleData.Books
                   from auth in SampleData.Authors
                   where book.Publisher == pub
                      && auth.FirstName.Substring(0, 3) == pub.Name.Substring(0, 3)
                      && book.Price < 500
                      && auth.LastName.StartsWith("G")
                   select new { Book = book, Author = auth};              

foreach (var ba in baCollection)
{    Console.WriteLine("Book {0}\t Author {1} {2}", 
                ba.Book.Title,
                ba.Author.FirstName,
                ba.Author.LastName);
}
This query takes publishers, books, and authors, joins books and publishers via the Publisher reference, and joins authors and publishers by the first three letters of their names. In addition, the results are restricted to books that have a price less than 500 and authors whose last name starts with the letter "G". As you can see, you can use any condition to join collection entities.

Join operator

LINQ enables you to use the ''<<collection>> join <<collection>> on <<expression>>'' operator to join two collections on a join condition. It is similar to the previous example, but the queries are easier to read. The following example shows how you can join publishers with their books using the Book.Publisher reference as the join condition.
var book_pub = from p in SampleData.Publishers
                    join b in SampleData.Books  on p equals b.Publisher 
                    into publishers_books
               where p.Name.Contains("Press")
               select new { Publisher = p, Books = publishers_books};             

foreach (var bp in book_pub){
    Console.WriteLine("Publisher - {0}", bp.Publisher.Name);
    foreach (Book book in bp.Books)
        Console.WriteLine("\t Book - {0}", book.Title);
}
The books of each publisher are collected into publishers_books and exposed as the Books property of each result. In the where clause, you can filter publishers by a condition.
Note that if you are joining objects by references (in the example above, the join condition is p equals b.Publisher), you might get an "Object reference not set to an instance of an object" exception if the referenced objects are not loaded. Make sure that you have loaded all related objects before you start the query, handle null values in the query, or join by IDs instead of references where possible.
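A join by IDs could look like the sketch below. Note that the class model of this article has no ID fields, so the Id and PublisherId properties, the reduced classes, and the sample values are assumptions made only for this illustration. The query also omits the into clause, which produces a flat inner join (one result per matching publisher/book pair) instead of a list of books per publisher.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced classes with hypothetical ID fields, only for this sketch.
public class Publisher
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Book
{
    public string Title { get; set; }
    public int PublisherId { get; set; }
}

public static class JoinByIdSketch
{
    // Inner join on a key value; publishers without books produce no rows.
    public static List<string> JoinByKey(IEnumerable<Publisher> publishers, IEnumerable<Book> books)
    {
        var pairs = from p in publishers
                    join b in books on p.Id equals b.PublisherId
                    select p.Name + ": " + b.Title;
        return pairs.ToList();
    }

    public static void Main()
    {
        var publishers = new[]
        {
            new Publisher { Id = 1, Name = "Example Press" },
            new Publisher { Id = 2, Name = "Sample House" }
        };
        var books = new[]
        {
            new Book { Title = "A", PublisherId = 1 },
            new Book { Title = "B", PublisherId = 2 },
            new Book { Title = "C", PublisherId = 1 }
        };

        foreach (string line in JoinByKey(publishers, books))
            Console.WriteLine(line);
    }
}
```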

Grouping operator

LINQ enables you to use group by functionality on a collection of objects. The following example shows how you can group books by the year in which they were published. The query returns an enumeration of anonymous classes containing a property (Value) that holds the key used in the grouping (the year), and another property (Books) holding the collection of books published in that year.
var booksByYear = from book in SampleData.Books
               group book by book.PublicationDate.Year
               into groupedByYear
               orderby groupedByYear.Key descending
          select new {
                       Value = groupedByYear.Key,
                       Books = groupedByYear
                      };

foreach (var year in booksByYear){
        Console.WriteLine("Books in year - {0}", year.Value);
        foreach (var b in year.Books)
            Console.WriteLine("Book - {0}", b.Title);
}
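As with the other queries, grouping has an equivalent function form based on GroupBy. The sketch below uses a reduced Book class and invented sample values; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced Book class, only for this sketch.
public class Book
{
    public string Title { get; set; }
    public DateTime PublicationDate { get; set; }
}

public static class GroupBySketch
{
    // Returns one line per year (newest first) listing the titles published in that year.
    public static List<string> TitlesByYear(IEnumerable<Book> books)
    {
        return books.GroupBy(book => book.PublicationDate.Year)     // group key is the year
                    .OrderByDescending(g => g.Key)
                    .Select(g => g.Key + ": " + string.Join(", ", g.Select(b => b.Title)))
                    .ToList();
    }

    public static void Main()
    {
        var books = new[]
        {
            new Book { Title = "A", PublicationDate = new DateTime(2013, 5, 1) },
            new Book { Title = "B", PublicationDate = new DateTime(2014, 2, 10) },
            new Book { Title = "C", PublicationDate = new DateTime(2013, 11, 30) }
        };

        foreach (string line in TitlesByYear(books))
            Console.WriteLine("Books in year - {0}", line);
    }
}
```

Each group returned by GroupBy is itself an enumeration of books with a Key property, which is exactly what the groupedByYear variable holds in the query form.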

Aggregation example

Using LINQ and group by, you can simulate a "select title, count(*) from table group by title" SQL query. The following LINQ query shows how to use LINQ to aggregate data:
var raw = new[] {    new { Title = "first", Stat = 20, Type = "view" },
                     new { Title = "first", Stat = 12, Type = "enquiry" },
                     new { Title = "first", Stat = 0, Type = "click" },
                     new { Title = "second", Stat = 31, Type = "view" },
                     new { Title = "second", Stat = 17, Type = "enquiry" },
                     new { Title = "third", Stat = 23, Type = "view" },
                     new { Title = "third", Stat = 14, Type = "click" }
        };

var groupeddata = from data in raw
                       group data by data.Title
                       into grouped
                  select new {    Title = grouped.Key,
                                  Count = grouped.Count()
                             };

foreach (var data in groupeddata){
    Console.WriteLine("Title = {0}\t Count={1}", data.Title, data.Count);
}
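Other aggregates can be computed per group in the same projection. The following is a minimal sketch, reusing the sample data above, that adds a Total property (a name chosen here for illustration) holding the sum of Stat for each title, roughly what sum(stat) would give in SQL:

```csharp
using System;
using System.Linq;

var raw = new[]
{
    new { Title = "first", Stat = 20, Type = "view" },
    new { Title = "first", Stat = 12, Type = "enquiry" },
    new { Title = "first", Stat = 0, Type = "click" },
    new { Title = "second", Stat = 31, Type = "view" },
    new { Title = "second", Stat = 17, Type = "enquiry" },
    new { Title = "third", Stat = 23, Type = "view" },
    new { Title = "third", Stat = 14, Type = "click" }
};

// Roughly "select title, count(*), sum(stat) from table group by title".
var totals = from data in raw
             group data by data.Title
             into grouped
             select new
             {
                 Title = grouped.Key,
                 Count = grouped.Count(),
                 Total = grouped.Sum(d => d.Stat)
             };

foreach (var row in totals)
    Console.WriteLine("Title = {0}\t Count={1}\t Total={2}", row.Title, row.Count, row.Total);
```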

Nested queries

LINQ enables you to use nested queries. Once you select entities from a collection, you can use them in an inner query that is executed on another collection. As an example, consider the class diagram above, in which the class Book has a reference to the class Publisher, but there is no reverse relationship. Using nested LINQ queries, you can select all publishers in a collection and, for each publisher, run another LINQ query that finds all books referencing that publisher. An example of such a query is shown below:
var publisherWithBooks = from publisher in SampleData.Publishers
                     select new { Publisher = publisher.Name,
                                  Books =  from book in SampleData.Books
                                           where book.Publisher == publisher
                                           select book
                                 };

foreach (var publisher in publisherWithBooks){
    Console.WriteLine("Publisher - {0}", publisher.Publisher);
    foreach (Book book in publisher.Books)
        Console.WriteLine("\t Title \t{0}", book.Title);
}
Each object created by the query holds a publisher name and the collection of that publisher's books produced by the inner query; the loop writes both to the console.
Using a local variable introduced with the let keyword, you can format the query more readably, as shown in the following example:
var publisherWithBooks = from publisher in SampleData.Publishers
                         let publisherBooks = from book in SampleData.Books
                                              where book.Publisher == publisher
                                              select book
                         select new { Publisher = publisher.Name, 
                                      Books = publisherBooks
                                    };

foreach (var publisher in publisherWithBooks){
    Console.WriteLine("Publisher - {0}", publisher.Publisher);
    foreach (Book book in publisher.Books)
        Console.WriteLine("\t Title \t{0}", book.Title);
}
In this query, books for the current publisher are placed in the publisherBooks variable, and then an object containing the name of the publisher and its books is returned.
This way you can dynamically create relationships between entities that do not exist in your original class model.

We’ll be breaking the operators down into the following areas of functionality.
  • Filtering
  • Projecting
  • Joining
  • Ordering
  • Grouping
  • Conversions
  • Sets
  • Aggregation
  • Quantifiers
  • Generation
  • Elements

Filtering Operators

The two filtering operators in LINQ are the Where and OfType operators. An example follows.
ArrayList list = new ArrayList();
list.Add("Dash");
list.Add(new object());
list.Add("Skitty");
list.Add(new object());

var query =
     from name in list.OfType<string>()
     where name == "Dash"
     select name;
The Where operator generally translates to a WHERE clause when working with a relational database and has an associated keyword in C#, as shown in the code above. The OfType operator can be used to coax a non-generic collection (like an ArrayList) into a LINQ query. Since ArrayList does not implement IEnumerable<T>, OfType (along with Cast) is one of the few LINQ operators we can apply to the list directly. OfType is also useful if you are working with an inheritance hierarchy and only want to select objects of a specific subtype from a collection. This includes scenarios where LINQ to SQL or the Entity Framework are used to model inheritance in the database. Both Where and OfType are deferred.
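To illustrate the inheritance scenario, here is a small sketch that uses the built-in exception types as a stand-in for a domain hierarchy (the errors array is made up for this example). OfType keeps only the elements that are of the requested type or one of its subtypes:

```csharp
using System;
using System.Linq;

// Built-in exception types stand in for a domain inheritance hierarchy.
Exception[] errors =
{
    new ArgumentNullException("name"),
    new InvalidOperationException("bad state"),
    new ArgumentOutOfRangeException("index")
};

// Keeps only elements that are ArgumentException or a subtype of it.
var argumentErrors = errors.OfType<ArgumentException>();

foreach (ArgumentException error in argumentErrors)
    Console.WriteLine(error.GetType().Name);
```

The InvalidOperationException is filtered out because it does not derive from ArgumentException.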

Sorting Operators

The sorting operators are OrderBy, OrderByDescending, ThenBy, ThenByDescending, and Reverse. When working with a comprehension query the C# orderby keyword will map to a corresponding OrderBy method. For example, the following code:
var query =
    from name in names
    orderby name, name.Length
    select name;
… would translate to …
var query =
    names.OrderBy(s => s)
         .ThenBy(s => s.Length);
The return value of the OrderBy operator is an IOrderedEnumerable<T>. This special interface inherits from IEnumerable<T> and allows the ThenBy and ThenByDescending operators to work, as they extend IOrderedEnumerable<T> instead of IEnumerable<T> like the other LINQ operators. This can sometimes create unexpected errors when using the var keyword and type inference. The query variable in the last code snippet above will be typed as IOrderedEnumerable<string>. If you try to add to your query definition later in the same method …
query = names.Where(s => s.Length > 3);
… you’ll get a compiler error when you build: “IEnumerable<string> cannot be assigned to IOrderedEnumerable<string>”. In a case like this you might want to forgo the var keyword and explicitly declare query as an IEnumerable<T>.
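A minimal sketch of that workaround follows. The contents of names are invented for the illustration, since the snippets above never define them. Because the variable is declared as IEnumerable<string>, both the ordered query and the filtered query can be assigned to it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; the article does not define names.
string[] names = { "Poonam", "Al", "Scott", "Bo" };

// Declared explicitly instead of with var, so the variable is not
// pinned to IOrderedEnumerable<string>.
IEnumerable<string> query =
    names.OrderBy(s => s)
         .ThenBy(s => s.Length);

// Compiles, because Where returns IEnumerable<string>.
query = query.Where(s => s.Length > 3);

foreach (string name in query)
    Console.WriteLine(name);
```

Here the filter is applied to the existing query rather than to names, so the ordering is kept.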

Set Operations

The set operators in LINQ include Distinct (to remove duplicate values), Except (returns the difference of two sequences), Intersect (returns the intersection of two sequences), and Union (returns the unique elements from two sequences).
int[] twos = { 2, 4, 6, 8, 10 };
int[] threes = { 3, 6, 9, 12, 15 };

// 6 
var intersection = twos.Intersect(threes);

// 2, 4, 8, 10 
var except = twos.Except(threes);

// 2, 4, 6, 8, 10, 3, 9, 12, 15 
var union = twos.Union(threes);
It’s important to understand how the LINQ operators test for equality. Consider the following in-memory collection of employees.
var employees = 
        new List<Employee> {
    new Employee() { ID=1, Name="Scott" },
    new Employee() { ID=2, Name="Poonam" },
    new Employee() { ID=3, Name="Scott"}
};
You might think a query of this collection using the Distinct operator would remove the “duplicate” employee named Scott, but it doesn’t.
// yields a sequence of 3 employees 
var query =
      (from employee in employees
       select employee).Distinct();
This version of the Distinct operator uses the default equality comparer which, for reference types that do not override Equals, tests object references. The Distinct operator will then see three distinct object references. Most of the LINQ operators that use an equality test (including Distinct) provide an overloaded version of the method that accepts an IEqualityComparer<T> you can pass to perform custom equality tests.
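As a rough sketch of the comparer overload, the following uses strings and the built-in StringComparer.OrdinalIgnoreCase (which implements IEqualityComparer<string>) rather than a custom comparer for Employee. The sample names are made up. A comparer for Employee would be passed the same way, after implementing IEqualityComparer<Employee> in a class of your own.

```csharp
using System;
using System.Linq;

// Hypothetical data: the same name in two different casings.
string[] names = { "Scott", "Poonam", "SCOTT" };

// Default comparer: ordinal string equality, so all three are distinct.
var byDefault = names.Distinct();

// Overload taking an IEqualityComparer<string>: casing is ignored.
var ignoringCase = names.Distinct(StringComparer.OrdinalIgnoreCase);

Console.WriteLine(byDefault.Count());    // 3
Console.WriteLine(ignoringCase.Count()); // 2
```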
Another issue to be aware of with LINQ and equality is how the C# compiler generates anonymous types. Anonymous types are built to test if the properties of two objects have the same values (see And Equality For All Anonymous Types). Thus, the following query on the same collection, which projects only the Name property, returns just two objects because the two employees named Scott produce equal anonymous objects.
// yields a sequence of 2 objects 
var query = (from employee in employees
             select new { employee.Name })
             .Distinct();

Quantification Operators

Quantifiers are the All, Any, and Contains operators.
int[] twos = { 2, 4, 6, 8, 10 };

// true 
bool areAllevenNumbers = twos.All(i => i % 2 == 0); 

// true 
bool containsMultipleOfThree = twos.Any(i => i % 3 == 0);

// false 
bool hasSeven = twos.Contains(7);
Personally, I’ve found these operators invaluable when working with collections of business rules. Here is a simple example:
Employee employee = 
        new Employee { ID = 1, Name = "Poonam", DepartmentID = 1 };

Func<Employee, bool>[] validEmployeeRules = 
{
    e => e.DepartmentID > 0,
    e => !String.IsNullOrEmpty(e.Name),
    e => e.ID > 0
};


bool isValidEmployee =
      validEmployeeRules.All(rule => rule(employee));

Projection Operators

The Select operator returns one output element for each input element. The Select operator has the opportunity to project a new type of element, however, and it’s often useful to think of the Select operator as performing a transformation or mapping.
The SelectMany operator is useful when working with a sequence of sequences. This operator will “flatten” the sub-sequences into a single output sequence – you can think of SelectMany as something like a nested iterator or nested foreach loop that digs objects out of a sequence inside a sequence. SelectMany is used when there are multiple from keywords in a comprehension query.
Here is an example:
string[] famousQuotes = 
{
    "Advertising is legalized lying",
    "Advertising is the greatest art form of the twentieth century" 
};

var query = 
        (from sentence in famousQuotes
         from word in sentence.Split(' ')
         select word).Distinct();
If we iterate through the result we will see the query produce the following sequence of strings: “Advertising” “is” “legalized” “lying” “the” “greatest” “art” “form” “of” “twentieth” “century”. The second from keyword will introduce a SelectMany operator into the query and a translation of the above code into extension method syntax will look like the following.
query =
    famousQuotes.SelectMany(s => s.Split(' '))
                .Distinct();
The above code will produce the same sequence of strings. If we used a Select operator instead of SelectMany …
var query2 =
    famousQuotes.Select(s => s.Split(' '))
                .Distinct();
… then the result would be a sequence that contains two string arrays: { “Advertising”, “is”, “legalized”, “lying” } and { “Advertising”, “is”, “the”, “greatest”, “art”, “form”, “of”, “the”, “twentieth”, “century” }. We started with an array of two strings and the Select operator projected one output object for each input. The SelectMany operator, however, iterated inside of the string arrays produced by Split to give us back one sequence of many strings.

Partition Operators

The Skip and Take operators in LINQ are the primary partitioning operators. These operators are commonly used to produce paged result sets for UI data binding. In order to get the third page of results for a UI that shows 10 records per page, you could apply a Skip(20) operator followed by a Take(10).
There are also SkipWhile and TakeWhile operators that accept a predicate that you can use to dynamically express how many items you need.
int[] numbers = { 1, 3, 5, 7, 9 };

// yields 5, 7, 9 
var query = numbers.SkipWhile(n => n < 5)
                   .TakeWhile(n => n < 10);
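The paging scenario mentioned at the start of this section can be sketched as follows. The records sequence, pageSize, and pageNumber are names invented for this example, with the numbers 1 to 45 standing in for real records:

```csharp
using System;
using System.Linq;

// Hypothetical data set: the numbers 1 to 45 stand in for records.
var records = Enumerable.Range(1, 45);

int pageSize = 10;
int pageNumber = 3; // 1-based page index

// Skip the records on earlier pages, then take one page's worth.
var page = records.Skip((pageNumber - 1) * pageSize)
                  .Take(pageSize);

Console.WriteLine(string.Join(", ", page)); // 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
```

If the last page is only partially filled, Take simply returns the elements that remain.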

Join Operators

In LINQ there is a Join operator and also a GroupJoin operator. The Join operator is similar to a SQL INNER JOIN in the sense that it only outputs results when it finds a match between two sequences and the result itself is still a flat sequence. You typically need to perform some projection into a new object to see information from both sides of the join.
var employees = new List<Employee> {
    new Employee { ID=1, Name="Scott", DepartmentID=1 },
    new Employee { ID=2, Name="Poonam", DepartmentID=1 },
    new Employee { ID=3, Name="Andy", DepartmentID=2}
};

var departments = new List<Department> {
    new Department { ID=1, Name="Engineering" },
    new Department { ID=2, Name="Sales" },
    new Department { ID=3, Name="Skunkworks" }
};

var query =
     from employee in employees
     join department in departments
       on employee.DepartmentID 
           equals department.ID
     select new { 
         EmployeeName = employee.Name, 
         DepartmentName = department.Name 
     };
Notice the Join operator has a keyword for use in comprehension query syntax, but you must use the equals keyword and not the equality operator (==) to express the join condition. If you need to join on a “compound” key (multiple properties), you’ll need to create new anonymous types for the left and right hand side of the equals condition with all the properties you want to compare.
Here is the same query written using extension method syntax (no Select operator is needed since the Join operator is overloaded to accept a projector expression).
var query2 = employees.Join(
        departments,         // inner sequence 
        e => e.DepartmentID, // outer key selector 
        d => d.ID,           // inner key selector 
        (e, d) => new {      // result projector 
          EmployeeName = e.Name,
          DepartmentName = d.Name
        });
Note that neither of the last two queries will have the “Skunkworks” department appear in any output because the Skunkworks department never joins to any employees. This is why the Join operator is similar to the INNER JOIN of SQL. The Join operator is most useful when you are joining on a 1:1 basis.
To see Skunkworks appear in some output we can use the GroupJoin operator. GroupJoin is similar to a LEFT OUTER JOIN in SQL but produces a hierarchical result set, whereas SQL is still a flat result.
Specifically, the GroupJoin operator will always produce one output for each input from the “outer” sequence. Any matching elements from the inner sequence are grouped into a collection that can be associated with the outer element.
var query3 = departments.GroupJoin(
        employees,           // inner sequence 
        d => d.ID,           // outer key selector 
        e => e.DepartmentID, // inner key selector 
        (d, g) => new        // result projector 
        {
            DepartmentName = d.Name,
            Employees = g
        });
In the above query the departments collection is our outer sequence – we will always see all the available departments. The difference is in the result projector. With a regular Join operator our projector sees two arguments – an employee object and its matching department object. With GroupJoin, the two parameters are: a Department object (the d in our result projector lambda expression) and an IEnumerable<Employee> object (the g in our result projector lambda expression). The second parameter represents all the employees that matched the join criteria for this department (and it could be empty, as in the case of Skunkworks). A GroupJoin can be triggered in comprehension query syntax by using a join followed by an into operator.
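As a sketch of the comprehension form, the following query uses join followed by into. To keep the example self-contained, anonymous types stand in for the Employee and Department classes, and the names query4 and departmentEmployees are chosen here for illustration:

```csharp
using System;
using System.Linq;

// Anonymous types stand in for the Employee and Department classes.
var employees = new[]
{
    new { ID = 1, Name = "Scott", DepartmentID = 1 },
    new { ID = 2, Name = "Poonam", DepartmentID = 1 },
    new { ID = 3, Name = "Andy", DepartmentID = 2 }
};

var departments = new[]
{
    new { ID = 1, Name = "Engineering" },
    new { ID = 2, Name = "Sales" },
    new { ID = 3, Name = "Skunkworks" }
};

// join ... into is translated by the compiler into a GroupJoin call.
var query4 =
    from department in departments
    join employee in employees
      on department.ID equals employee.DepartmentID
      into departmentEmployees
    select new
    {
        DepartmentName = department.Name,
        Employees = departmentEmployees
    };

foreach (var row in query4)
{
    // Skunkworks is listed with a count of 0.
    Console.WriteLine("{0}: {1}", row.DepartmentName, row.Employees.Count());
}
```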

Grouping Operators

The grouping operators are the GroupBy and ToLookup operators. Both operators return a sequence of IGrouping<K,V> objects. This interface specifies that the object exposes a Key property, and this Key property will represent the grouping value.
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var query = numbers.ToLookup(i => i % 2);

foreach (IGrouping<int, int> group in query)
{
    Console.WriteLine("Key: {0}", group.Key);
    foreach (int number in group)
    {
        Console.WriteLine(number);
    }
}
The above query should produce groups with Key values of 0 and 1 (the two possible results of modulo 2). Inside each group object will be an enumeration of the objects from the original sequence that fell into that grouping. Essentially, the above query will have a group of odd numbers (Key value of 1) and a group of even numbers (Key value of 0).
The primary difference between GroupBy and ToLookup is that GroupBy is lazy and offers deferred execution. ToLookup is greedy and will execute immediately.
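One way to observe the difference is to change the source collection after calling each operator. This is a small sketch with a made-up list of numbers:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4 };

// Deferred: nothing is grouped until the query is enumerated.
var grouped = numbers.GroupBy(i => i % 2);

// Immediate: the lookup is built right here from the current contents.
var lookup = numbers.ToLookup(i => i % 2);

// Change the source after both calls.
numbers.Add(5);

int oddsInGrouped = grouped.Single(g => g.Key == 1).Count();
int oddsInLookup = lookup[1].Count();

Console.WriteLine(oddsInGrouped); // 3 (1, 3, 5)
Console.WriteLine(oddsInLookup);  // 2 (1, 3)
```

The GroupBy query sees the element added later because it only runs when enumerated, while the lookup reflects the list as it was when ToLookup was called.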

Generational Operators

The Empty operator will create an empty sequence for IEnumerable<T>. Note that we can use this operator using static method invocation syntax since we typically won’t have an existing sequence to work with.
var empty = Enumerable.Empty<Employee>();
The Range operator can generate a sequence of numbers, while Repeat can generate a sequence of any value.
int start = 1;
int count = 10;
IEnumerable<int> numbers = Enumerable.Range(start, count);

var tenTerminators = Enumerable.Repeat(new Employee { Name ="Arnold" }, 10);
Finally, the DefaultIfEmpty operator will generate a sequence containing a single element, the default value of the type, when it is applied to an empty sequence.
string[] names = { }; //empty array 

IEnumerable<string> oneNullString = names.DefaultIfEmpty();

Equality

The one equality operator in LINQ is the SequenceEqual operator. This operator will walk through two sequences and compare the objects inside for equality. This is another operator where you can override the default equality test using an IEqualityComparer<T> object as a parameter to a second overload of the operator. The test below will return false because the first employee objects in each sequence are not the same.
Employee e1 = new Employee() { ID = 1 };
Employee e2 = new Employee() { ID = 2 };
Employee e3 = new Employee() { ID = 3 };

var employees1 = new List<Employee>() { e1, e2, e3 };
var employees2 = new List<Employee>() { e3, e2, e1 };

bool result = employees1.SequenceEqual(employees2);
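The comparer overload can be sketched with strings and the built-in StringComparer.OrdinalIgnoreCase. The arrays below are invented for the example and are not part of the employee sample:

```csharp
using System;
using System.Linq;

string[] first = { "Scott", "Poonam" };
string[] second = { "scott", "POONAM" };
string[] reversed = { "Poonam", "Scott" };

// Default comparer: casing differs, so the sequences are not equal.
bool defaultResult = first.SequenceEqual(second);

// Second overload: a comparer that ignores casing.
bool ignoreCaseResult =
    first.SequenceEqual(second, StringComparer.OrdinalIgnoreCase);

// Same elements in a different order are still not equal.
bool reversedResult = first.SequenceEqual(reversed);

Console.WriteLine(defaultResult);    // False
Console.WriteLine(ignoreCaseResult); // True
Console.WriteLine(reversedResult);   // False
```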

Element Operators

Element operators include the ElementAt, First, Last, and Single operators. For each of these operators there is a corresponding “or default” operator that you can use to avoid exceptions when an element does not exist (ElementAtOrDefault, FirstOrDefault, LastOrDefault, SingleOrDefault). Their behavior is demonstrated in the following code.
string[] empty = { };
string[] notEmpty = { "Hello", "World" };

var result = empty.FirstOrDefault();             // null 
result = notEmpty.Last();                        // World 
result = notEmpty.ElementAt(1);                  // World 
result = empty.First();                          // InvalidOperationException 
result = notEmpty.Single();                      // InvalidOperationException 
result = notEmpty.First(s => s.StartsWith("W")); // World 
The primary difference between First and Single is that the Single operator will throw an exception if a sequence does not contain a single element, whereas First is happy to take just the first element from a sequence of 10. You can use Single when you want to guarantee that a query returns just a single item and generate an exception if the query returns 0 or more than 1 item.

Conversions

The first two conversion operators are AsEnumerable and AsQueryable. The AsEnumerable operator is useful when you want to treat a queryable sequence (a sequence backed by a LINQ provider and typically a remote data source, like a database) as an in-memory sequence, so that all operators appearing after AsEnumerable will work in-memory with LINQ to Objects. For example, when working with the queryable properties of a LINQ to SQL data context class, you can return a query to the upper layers of your application with AsEnumerable on the end, meaning the higher layers will not be able to compose operators into the query that change the SQL that LINQ to SQL will generate.
The AsQueryable operator works in the opposite direction – it makes an in-memory collection appear as if it is backed by a remote LINQ provider. This can be useful in unit tests where you want to “fake” a LINQ to SQL queryable collection with in-memory data.
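A minimal sketch of the AsQueryable case follows. The fakeTable list is a hypothetical stand-in for whatever a real data context would expose:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for data that would normally come from a provider.
var fakeTable = new List<string> { "Scott", "Poonam", "Andy" };

// The static type is now IQueryable<string>, the same shape a
// data-access layer built on a remote provider would return.
IQueryable<string> source = fakeTable.AsQueryable();

// These operators bind to Queryable.Where and Queryable.OrderBy and
// build an expression tree, but still run against the list in memory.
IQueryable<string> query = source.Where(n => n.Length > 4)
                                 .OrderBy(n => n);

Console.WriteLine(string.Join(", ", query)); // Poonam, Scott
```

Code written against IQueryable<string> can then be exercised in a test without a database, with the caveat that an in-memory source will accept expressions a real provider might not be able to translate.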
The OfType and Cast operators both coerce types in a sequence. The OfType operator we also listed as a filtering operator – it will only return objects that can be type cast to a specific type, while Cast will fail if it cannot cast all the objects in a sequence to a specific type.
object[] data = { "Foo", 1, "Bar" };

// will return a sequence of 2 strings 
var query1 = data.OfType<string>();

// will create an exception when executed 
var query2 = data.Cast<string>();
The last four conversion operators are ToArray, ToDictionary, ToList, and ToLookup. These are all greedy operators that will execute a query immediately and construct an in-memory data structure. For more on ToList and greediness, see Lazy LINQ and Enumerable Objects.
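The snapshot behavior of the greedy operators can be sketched like this, using a made-up list of names that is modified after the queries are defined:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Scott", "Poonam" };

// Deferred: evaluated each time it is enumerated.
var lazyLengths = names.Select(n => n.Length);

// Greedy: each call runs the query now and stores the results.
List<int> lengthList = names.Select(n => n.Length).ToList();
Dictionary<string, int> lengthByName = names.ToDictionary(n => n, n => n.Length);

names.Add("Andy");

Console.WriteLine(lazyLengths.Count());                 // 3
Console.WriteLine(lengthList.Count);                    // 2
Console.WriteLine(lengthByName.ContainsKey("Andy"));    // False
```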

Concatenation

The Concat operator concatenates two sequences and uses deferred execution. Concat is similar to the Union operator, but it will not remove duplicates.
string[] firstNames = { "Scott", "James", "Allen", "Greg" };
string[] lastNames = { "James", "Allen", "Scott", "Smith" };

var concatNames = firstNames.Concat(lastNames).OrderBy(s => s);
var unionNames = firstNames.Union(lastNames).OrderBy(s => s);
The first query will produce the sequence: “Allen”, “Allen”, “Greg”, “James”, “James”, “Scott”, “Scott”, “Smith”.
The second query will produce the sequence: “Allen”, “Greg”, “James”, “Scott”, “Smith”.

Aggregation Operators

No query technology is complete without aggregation, and LINQ includes the usual Average, Count, LongCount (for big results), Max, Min, and Sum. For example, here are some aggregation operations in a query that produces statistics about the running processes on a machine:
Process[] runningProcesses = Process.GetProcesses();

var summary = new {
    ProcessCount = runningProcesses.Count(),
    WorkerProcessCount = runningProcesses.Count(p => p.ProcessName == "w3wp"),
    TotalThreads = runningProcesses.Sum(p => p.Threads.Count),
    MinThreads = runningProcesses.Min(p => p.Threads.Count),
    MaxThreads = runningProcesses.Max(p => p.Threads.Count),
    AvgThreads = runningProcesses.Average(p => p.Threads.Count)
};
The most interesting operator in LINQ is the Aggregate operator, which can perform just about any type of aggregation you need (you could also implement mapping, filtering, grouping, and projection with aggregation if you wish). I’ll pass you to my blog entry “Custom Aggregations In LINQ” for more information.
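As a quick taste of Aggregate, here is a small sketch with invented sample data. It shows the seeded overload, which takes a starting value, and the unseeded overload, which uses the first element as the starting value:

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5 };

// Seeded overload: start at 1 and multiply each element in.
int product = numbers.Aggregate(1, (acc, n) => acc * n);

string[] words = { "legalized", "lying", "Advertising" };

// Unseeded overload: the first element is the initial accumulator.
string longest = words.Aggregate(
    (best, next) => next.Length > best.Length ? next : best);

Console.WriteLine(product); // 120
Console.WriteLine(longest); // Advertising
```

Note that the unseeded overload throws an InvalidOperationException on an empty sequence, so the seeded form is the safer choice when the input may be empty.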

Summary

I hope you’ve enjoyed this tour of the standard LINQ operators. Knowing about all the options available in the LINQ tool belt will empower you to write better LINQ queries with less code. Don’t forget you can also implement custom operators in LINQ if none of these built-in operators fit your solution – all you need to do is write extension methods for IEnumerable<T> or IQueryable<T> (although be careful extending IQueryable<T>, as remote LINQ providers will not understand your custom operator).
