VSLive! San Francisco Preview: LINQ to SQL Q&A with Perry Birch
There’s a lot that LINQ to SQL (formerly called Dlinq) can offer developers, and in this Q&A preview of his session, Birch outlines what’s new, shares some of his favorite LINQ to SQL features and more.
What do you think are the top benefits for using LINQ to SQL?BIRCH: There are two primary benefits that I enjoy when using LINQ to SQL, the first one being the ease of transition for people moving from standard T-SQL query syntax to this new method of data retrieval. Both LINQ to SQL and T-SQL have very similar syntax and command results. It is also easy to create the mapping files that are used.
The second benefit is that LINQ to SQL derives from LINQ, which means that the techniques and syntax used to query for data becomes very similar in nature to querying for objects, XML and any other LINQ-enabled data source.
What’s your favorite LINQ to SQL feature/tool?
B: It really depends on the type of project I’m working on. For small and prototype projects, I really love the domain-specific language designer and code generation found in SQL Metal, simply because it is a pretty trick. However, when I’m looking at building a bigger, tiered application, I tend to shy away from all the attribute markup that is engrained into the class definitions; then, my second favorite feature (and the real reason to use LINQ to SQL) is the T-SQL generation. For anyone who is a db mapping enthusiast like me and hasn’t yet looked into these, I highly recommend digging into the Binary Execution Trees which “LINQ to N” engines use to map to the various data stores.
When you first started using LINQ to SQL, was there anything that surprised you?
B: I would say the biggest “hmm…” moments for me were related to disconnecting and reconnecting objects to the Data Context. I typically try to wrap the data calls into one or more interface contract(s) and pipe the data to the caller using serialized data via services. The problem is in the deserialization and reconnection of the object to the Data Context for create and update operations. I’ll be using one workaround in my presentation at VSLive! 2008 at the end of March so I don’t want to give too much away…Seriously, though, there are a few valid approaches to getting around this issue depending on your requirements/code style that are pretty easy to find if you search for them.
For users of LINQ to SQL’s predecessor ObjectSpaces, what are the biggest differences that they should be aware of?
B: The best way to clarify the differences between Object Spaces and LINQ to SQL is by showing the code. For example, to add a given object to an object spaces mapper you would use syntax like the following:
public void addAnimal(int ID, string Species, string Age, string Weight)
{
Animal animal = null;
IObjectSpace objectSpace;
objectSpace = ObjectSpaceFactory.CreateObjectSpace(mapFile);
animal = (Animal) objectSpace.CreateObject(typeof(Animal), ID, Age,
Weight);
objectSpace.Update(animal);
}
There are a couple key elements to point out with this syntax. First, notice that the result type is not explicitly cast and requires a (Animal) boxing operation and a typeof(Animal) to specify the expected return type. Second, notice the list of parameters being passed into the create call.
An example of similar functionality using LINQ to SQL looks something like this:
// This is now just a helper method
public void addAnimal(string Species, string Age, string Weight)
{
Animal animal = new Animal();
animal.Species = Species;
animal.Age = Age;
animal.Weight = Weight;
// Call to the actual add method
addAnimal(animal);
}
public void addAnimal(Animal animal)
{
ZooDataContext dc = new ZooDataContext();
dc.Animals.Add(animal);
dc.SubmitChanges();
}
It looks like more code but the rest of the app would be using the Animal POCO (plain ol’ CLR objects), so most calls would now be using objects and one could simply use the new object instantiation method to create the object and forego the helper method entirely:
Animal animal = new Animal { Species = “Feline”, Age = 5, Weight = 100 };
addAnimal(animal);
Another and even more critical distinction comes when querying for objects. In ObjectSpaces the query filter has to be provided to the engine. This requires strict convention adherence and/or domain knowledge in order to create the queries. You may see something akin to the following:
string findFilter = “ID=”+ID;
animal = (Animal)objectSpace.GetObject(typeof(Animal), findFilter);
The developer writing this code has to know that the key field in the DB is named ID and any, more complicated filters require even more pre-knowlege of the database schema. In LINQ you would be using intellisense and type safe criteria to specify the filter:
var animal = dc.Animals.Single(a => a.ID == ID).SingleOrDefault();
Coming from a heavy SQL background and having created many cross-domain filtering methods, I really appreciate the flexibility, safety and simplicity of the new syntax.
Do you have a favorite LINQ to SQL tip you could share with our readers?
B: The only thing I can really say is to jump in and start playing with this stuff. If you want a quick “up to speed” kind of experience, a good friend of mine, Daniel Egan, recently recorded a presentation where he goes into some pretty good depth on all the stuff behind LINQ to SQL and even shows some good intro code off. The link to that presentation can be found here.
What, if any, downsides do you feel LINQ to SQL has?
B: Right now, the two biggest downsides to LINQ to SQL is the lack of support for complex type mapping which, in most cases, precludes it from being very useful for legacy databases. Also, the current implementation of LINQ to SQL is, as the name implies, limited to mapping to a SQL Server Database, which means that all of the Oracle guys out there are missing out. The upside is that the LINQ semantics allow the creation of LINQ to Oracle.
Besides LINQ to SQL, what else in LINQ would you encourage developers to learn more about right away?
B: The first step is to jump in and start playing with the new language features. Besides saving a lot of typing, when used properly, they can help you to more clearly express your code’s intent.
If you haven’t done so already, start using more delegates and play around with the anonymous methods. Once you have those two concepts down, you can start refactoring your code down to lambda expressions, then things start getting really exciting (and confusing for awhile; don’t get intimidated).
A habit I picked up when writing lambdas is thinking to myself as I write out the expression. For example:
var myList = new List(); <– anonymous type!
… Populate the list with a bunch of values …
var average = myList.Average(c => c.IntegerProperty);
While writing the “c => c.IntegerProperty” I’m thinking something along the lines of:
“c is a ComplexObject and I want the IntegerProperty”
or
“c such that I get the average of all ComplexType’s IntegerProperty”
C#’s Query Syntax for LINQ
Take advantage of the new keywords associated with C# 3.0’s query syntax. Learn how these keywords map to methods defined using the query operands, and how you can define your own custom implementation for the query keywords.
Technology Toolbox: C# C# 3.0 includes many strong new features, but one of the most interesting is the inclusion of its new query keywords, that you use to create Query Expressions. Query Expressions are most commonly associated with Language Integrated Query (LINQ). They’re the core syntax that you’ll use when you create LINQ queries, whether you use LINQ to Objects, LINQ to SQL, LINQ to XML, or some other custom LINQ provider. I’ll discuss the query keywords, how those keywords map to methods defined using the query operands, and how you can define your own custom implementation for the query keywords.
One of the design goals of the C# language (and most computer programming languages) is to keep the set of keywords small. Fewer keywords means there’s more vocabulary available for your types, members, and other user-defined identifiers. It also simplifies the job of understanding the language. The C# team worked to minimize the necessary grammar for LINQ and the associated query syntax. They ended up adding eight contextual keywords to the C# language specifically for query: from, where, select, group, into, orderby, join, and let.The C# team also saw type inference as a major goal of the new release. The C# language is still strongly typed: every identifier has a known type, at least to the compiler. However, a developer must declare the type explicitly only when the compiler cannot make its own determination of the variable’s type. In many cases, the compiler determines the proper type for you.
Every query expression begins with the “from” keyword. From introduces a query expression by declaring the datasource and range variable. This snippet defines a simple query:
int[] someNumbers = { 2, 3, 7, 11, 13, 17, 19 };
var query = from number in someNumbers
someNumbers is the datasource, and number is the range variable. The datasource must implement IEnumerable<T>, or IEnumerable. In this example, someNumbers implements IEnumerable<int>. The compiler gives the range variable a type to match the type enumerated by the datasource. In the example above, number will be an integer. In general, the range variable is of type T, whenever the datasource implements IEnumerable<T>. In the case where the range variable implements IEnumerable, the range variable is a System.Object. You never need to declare the type of the range variable; the compiler infers it.
Do More with “From”
The “from” clause can do much more than iterate a single sequence. Often, an outer sequence will contain an inner sequence that contains the data you’re interested in. You can nest from clauses to create the sequence of elements from the inner sequences (see Listing 1). Using multiple from clauses when the second datasource isn’t a member of the first range variable will do a cross-join. This simple expression produces 25 pairs of values, ranging from {1,2} through {9,2} on to {1,10}, and, finally, through {9,10}:
int[] evenNumbers = { 2, 4, 6, 8, 10 };
int[] odddNumbers = { 1, 3, 5, 7, 9 };
var pairs = from even in evenNumbers
from odd in odddNumbers
select new { even, odd };
foreach (var pair in pairs)
Console.WriteLine(
" odd = {0}, even = {1}",
pair.odd, pair.even);
This from clause generates every permutation of a value from the first sequence value, and then pairs it with a value from the second sequence. Of course, you’re not limited to two sources. You could do an N-way cross-join for any number N. Just remember that the size of the result set will grow geometrically as you add more source sequences.
The next new keyword is select. Select determines what type of object or objects are returned by your query. Select is powerful enough to create whatever types you want, including anonymous types, as the output of a query. Select projects the values you want into the output of a query. You’ve already seen two examples of the select statement. Listing 1 showed a simple select that returns the same type as the input query. The immediately preceding code snippet illustrating the cross-join shows an example of a select that transforms the input type into another type. In that case, it creates a new type containing a number from the first sequence and a number from the second sequence. The compiler does quite a bit of work for you in that example. The compiler infers the shape of the items in the output sequence, and it creates an anonymous type matching that description. The output sequence contains values of that anonymous type. It sounds more difficult than it really is. In that example, the compiler creates a type that contains two integers: odd and even. Those integers are accessed through public read-only properties. The compiler writes a type similar to the type you would have created by hand.
In general, you can use the select keyword to transform the values in the query into any type you decide you want for the output from the query. You’ll see this feature at work in several of this article’s examples. For instance, you’ll see that you can compute values as part of the output, combine multiple inputs into a single output stream, generate new types, or almost anything else. Select is quite powerful because its argument is an arbitrary lambda expression that can return anything computed from the input sequence.
Filter the Input Sequence
Another keyword you’ll use frequently is where. Where filters the input sequence and passes only those values that match the condition specified by the where predicate. You can place where clauses almost anywhere in your query expression. The only restrictions are that it must not be first, nor can it be last. For example, assume you want to filter the input sequence so it returns only those values greater than five:
var query = from number in someNumbers where number > 5 select number;
Where clauses can also be more complicated. For example, you can combine multiple conditions into a single where clause:
var query = from number in someNumbers where number > 5 && number * number < 125 select number;
An alternative approach is to express the same construct with multiple where clauses:
var query = from number in someNumbers where number > 5 where number * number < 125 select number;
Where works the same way even when there are multiple sources, and you want to provide filters on those elements. All these query expressions produce the same result:
var pairs1 = from even in evenNumbers
where even > 4
from odd in odddNumbers
where odd < 5
select new { even, odd };
var pairs2 = from even in evenNumbers
from odd in odddNumbers
where even > 4
where odd < 5
select new { even, odd };
var pairs2 = from even in evenNumbers
from odd in odddNumbers
where even > 4 && odd < 5
select new { even, odd };
In general, you should consider placing where clauses as early as possible in your queries. Later clauses in your query operate on a smaller set of data.
Another new query keyword is orderby. Orderby creates an ordering of the output sequence. You can order the output elements on any expression that’s part of the input sequence, even if that element isn’t part of the output sequence, or by an element that’s computed from part of the input sequence.
This query sorts the list of names by the length of the name and then by alphabetical order:
var folks = from person in myPeeps
orderby person.Last.Length + person.First.Length
descending,
person.Last,
person.First
select string.Format("{0}, {1}",
person.Last, person.First);
The first orderby arranges the sequence of elements from longest to shortest. In the case of ties, the last name will be used to order the people. If there are still any ties, the first name will be used. Note that I’ve also transformed the person object into a string in the select statement.
As I mentioned earlier, the select statement can create any output type you wish. In most cases, I like to put the orderby clauses as late in the query expression as possible. Orderby needs the full sequence to do its work, so it creates a natural bottleneck on the sequence processing.
Subdivide the Results
Occasionally, you’ll use the group clause to subdivide the results of the query into logical groupings. You could modify the previous query by grouping the names by the length of the first name:
var folks = from person in myPeeps
group person by person.First.Length into names
orderby names.Key
select names;
There are a number of new ideas here, so let’s look at them closely. This group expression creates subdivisions based on the length of the first name of each person. It’s common to want to perform further processing on each group. Therefore, you’ll often see an “into” following the group clause. The into keyword assigns a name to each grouping. In the above example, that grouping is called “names.” Next, you see the familiar orderby clause to order the groups based on the key assigned to each group. The Key is the value of the “by” portion of the group clause. In this example, the Key is an integer, which stores the length of the first names in the group.
The result sequence of the query uses a different type when you add the group clause. The results for a group clause are stored in an IEnumerable<IGrouping<TKey, TElement>> sequence. That’s a sequence of groups. For each element, the Key element stores the key, and the IGrouping implements IEnumerable<TElement> so that you can iterate the values in each group.
You use the join clause to perform equijoins. An equijoin creates a single sequence from two different input sequences based on an equality test of some property in both sequences. There are three different variations of join. Join is most easily illustrated in the context of LINQ to SQL, so these examples are taken from the Sample Queries sample application that ships with Visual Studio 2008. They perform joins on different tables in the Northwind database.
This query performs an inner join between the categories and products tables to produce a single sequence with the products and categories listed:
var q =
from c in db.Categories
join p in db.Products on c.ID equals
p.CategoryID
select new
{productName = p.ProductName,
categoryName = c.Name};
This code snippet creates a single sequence that contains the product name and category name. If a category doesn’t have any matching products, that category doesn’t appear in the output.
If you use the into keyword with your join clause, you create a group join. Similar to the groupby clause, a group join creates a list of lists for each group. In that same vein, you could create a list of all categories where each element contains the list of all products in that category:
var q =
from c in db.Categories
join p in db.Products on c.ID equals
p.CategoryID into productGroup
select new
{categoryName = c.Name,
Products =productGroup};
The only noticeable difference is the addition of the into keyword to create a set of groups. However, when you use the group join, any of the elements on the left side source won’t appear in the output sequence unless there are matching elements in the right input sequence. In the preceding example, that means any categories with no products won’t appear in the output sequence. To fix that, you need the left outer join. That requires using the DefaultIfEmpty method. This variation returns all categories, even those with no products:
var q =
from c in db.Categories
join p in db.Products on c.ID equals
p.CategoryID into productGroup
from aproduct in
productGroup.DefaultIfEmpty()
select new
{categoryName = c.Name,
Product =aproduct};
The left outer join produces a single, flat sequence, like the inner join. The element will contain the category and the product. In the approach used here, the product will be null if there are no products associated with a given category.
Only one keyword remains–let. Let creates a new range variable in the query expression. You can use this for a couple different purposes. First, if the new range variable is an enumerable type, it can be queried. Alternatively, it can store the result of some computation to avoid recomputing it.
Let’s look at this keyword in action.
Suppose you want to modify the pair of numbers sample to keep only those pairs of numbers where the distance from the origin is less than 10. You could modify the query this way:
var pairs = from even in evenNumbers
from odd in odddNumbers
where Math.Sqrt(
even * even + odd * odd) < 10
select new { even, odd };
That’s fine, but now assume you also want to include the distance from the origin in the output sequence. You could write it this way, but you’d be computing the distance twice for every value:
var pairs = from even in evenNumbers
from odd in odddNumbers
where Math.Sqrt(even * even + odd * odd) < 10
select new { even, odd,
distance = Math.Sqrt(even * even + odd * odd)};
This is a great place to use the let clause. You can rewrite the clause this way so that you have to compute the distance only once:
var pairs = from even in evenNumbers
from odd in odddNumbers
let distance = Math.Sqrt(even * even + odd * odd)
where distance < 10
select new { even, odd, distance };
The second use of let is less common. If your interim calculation produces a sequence, you could use that range variable in a from clause and process each element in the sequence.
Map Keywords to Methods
That’s all the new queries that support query expressions. Most of these new keywords compile to method calls. Where compiles to a call to the Where() method. Select compiles to a call to the Select() method. The first orderby compiles to a call to the OrderBy() method. Subsequent calls compile to ThenBy(). If the orderby clauses include the descending keyword, the methods called are OrderByDescending() and ThenByDescending(). Group clauses translate into the GroupBy() method calls. Join usually compiles to a call to Join(); if it’s followed by “into,” it compiles to a GroupJoin() method.
The .NET 3.5 base class library contains versions of each of these methods in two different classes: System.Linq.Queryable and System.Linq.Enumerable. The versions in Enumerable are defined so that the standard query operators can be used in any sequence. The Queryable version is used when the source supports IQueryable.
This might not sound so important, but there’s a good reason for you to understand that these keywords map to specific methods: They’re methods, so you can define your own version for your own types if you can create a better version than the default. The default versions are defined as extension methods, which means that any version you write will probably be a better match.
The standard query operators are general methods. Each of them takes expression trees, so you’re signing up for quite a bit of work when you define your own versions of these. You’ll more likely define your own versions of other extension methods defined in Queryable or Enumerable. Obvious examples are Take and Skip. You might use Skip to implement any container implementing IList<T>:
public static IEnumerable<T>
Skip<T>(IList<T> sequence,
int count)
{
for (int i = count;
i < sequence.Count; i++)
yield return sequence[i];
}
Note that you can avoid any number of MoveNext calls by using the indexing operator.
Similarly, you could add your own versions of any of the methods defined in System.Linq.Enumerable, where you had a more efficient way of computing the answer.
Earlier in this article, I mentioned that these new keywords are contextual keywords. That’s important for backward compatibility with existing programs. These new keywords are only reserved words when they appear in the proper location. These variable declarations are all legal C# 3.0:
int where = 5; string select = "This is a string"; int var = 7;
That’s because the keywords where, select, and var aren’t appearing in the proper context for the compiler to view them in their special new way.
This might also seem like language geek trivia, but it’s important for most developers. The C# team worked hard to ensure that these new additional features wouldn’t break existing programs. By making these new keywords contextual keywords, they were able to introduce new syntax, while minimizing the chance of breaking any existing C# code. None of this new syntax will break your existing code.