LINQ-to-SQL and T-SQL

LINQ-to-SQL, an API for working with data in a SQL Server database, fully supports the LINQ syntax for running queries.


November 06, 2008
URL:http://www.drdobbs.com/database/linq-to-sql-and-t-sql/212001058

Dino is a freelance programmer specializing in Windows. He can be contacted at [email protected].


Try asking for a definition of LINQ-to-SQL. Is it an API for building .NET-based data access layers? Is it a poor man's Object/Relational Mapper (O/RM) tool? How does it relate to LINQ and querying? Is it heralding a new trend that may end up replacing T-SQL entirely?

In the end, LINQ-to-SQL can be correctly seen both as a data access layer technology and as a simple—but effective—O/RM tool. Confusion about the role of LINQ-to-SQL arises from the fact that LINQ-to-SQL is much richer than a plain database API (like ADO.NET), but not as functionally rich as a regular O/RM tool should be.

First It Was Just LINQ...

LINQ, short for "Language Integrated Query," is only about queries. However, queries require collections of data to run against. In .NET 3.5, LINQ provides a query object model through which you can formulate queries dynamically and then run them against a collection of data. In addition, C# and Visual Basic .NET have been extended with context keywords that map directly to some elements of the underlying query object model. But the LINQ-related features in C# and Visual Basic .NET are merely syntactic sugar; the juice is all in the classes of the .NET Framework.
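For instance, here is a minimal LINQ-to-Objects query against an in-memory array (the names are illustrative); the query keywords compile down to calls on the underlying query object model:


using System;
using System.Linq;

class LinqToObjectsDemo
{
    static void Main()
    {
        // An ordinary array is a natively enumerable LINQ data source.
        string[] countries = { "USA", "Italy", "Germany", "USA", "Spain" };

        // The query syntax translates into Where/Select calls
        // on the LINQ query object model.
        var matches = from c in countries
                      where c == "USA"
                      select c;

        foreach (string c in matches)
            Console.WriteLine(c);
    }
}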

The LINQ query object model is designed to run queries against enumerable collections of data, or against any data source that looks like an enumerable collection of data. An array is an in-memory collection of data; in .NET, the array class implements IEnumerable and, as such, is a natively enumerable type. An XML document, by contrast, is a disk file. Once loaded in memory, it can be exposed as an enumerable collection of nodes, but some extra work is required to parse the document into that collection and map the query object model onto it. To enable LINQ to query XML data, an extra object model is required: LINQ-to-XML. This XML-specific object model exposes the content of a document as enumerable collections of nodes that plug into the LINQ query machinery, and it also provides facilities to edit the content of XML documents. In the end, LINQ-to-XML is an API for working with XML documents that also fully supports LINQ for queries. LINQ-to-SQL follows the same model.
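As a quick sketch of the LINQ-to-XML model (the file name and element names below are made up for illustration):


using System;
using System.Linq;
using System.Xml.Linq;

class LinqToXmlDemo
{
    static void Main()
    {
        // Load the disk file into memory; the document is then exposed
        // as enumerable collections of XElement nodes.
        XDocument doc = XDocument.Load("customers.xml");

        // A LINQ query over the node collection.
        var usCompanies = from c in doc.Descendants("Customer")
                          where (string)c.Element("Country") == "USA"
                          select (string)c.Element("CompanyName");

        foreach (string name in usCompanies)
            Console.WriteLine(name);
    }
}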

LINQ-to-SQL is an API for working with data in a SQL Server database and fully supports the LINQ syntax for running queries. The LINQ-to-SQL API consists of an object model that represents a subset of the tables in the specified database. In LINQ-to-SQL, you provide a connection string and Visual Studio 2008 automatically builds the object model for the tables you specify. For example, if you pick the Customers table, the Visual Studio 2008 wizard provides a Customer class whose properties match the columns of the table. The final result is a table-centric object model. The LINQ-to-SQL object model supports the IQueryable interface, which makes it pluggable into the LINQ query model. Consequently, a LINQ-based query that targets a LINQ-to-SQL collection of data is transformed into a T-SQL query and executed. Results are then loaded into in-memory object collections and returned to the application. The LINQ-to-SQL object model also features facilities to write data back to the underlying tables, which makes it a tool for creating a data access layer. At the same time, the ability to transparently persist objects to relational tables (as well as some inherent capabilities of the object model) makes LINQ-to-SQL look a lot like an O/RM tool.

Who's LINQ-to-SQL For?

Is LINQ-to-SQL for .NET developers doing database stuff, or SQL Server experts who spend their time writing, maintaining, and fine-tuning stored procedures and indexes? Is LINQ-to-SQL a tool that diminishes the role of handcrafted T-SQL code? Are T-SQL skills a thing of the past? Should we just accept the T-SQL code that LINQ-to-SQL generates under the hood, with no possibility of intervention?

LINQ-to-SQL is not here to kill T-SQL and related skills; LINQ-to-SQL is here to help. But it is a tool that heralds radical changes in the development of multitier systems. The more we move towards an object-oriented design of the business logic, the more we need powerful tools to smooth the resulting object/relational impedance mismatch. LINQ-to-SQL may not be the perfect tool for the purpose, but its introduction should be seen in this perspective.

The real question is whether you're okay with a table-based representation of data. Are you fine working with recordset-like data structures? Is it okay for you to implement CRUD and query services with a mix of stored procedures and ADO.NET-based business components? If the answer is "yes," then you probably don't need LINQ-to-SQL.

But if your answer to the previous question is "No, I would really like to employ objects to represent my problem's domain," then LINQ-to-SQL is the first step in a new direction. LINQ-to-SQL uses T-SQL under the hood, but leaves room for customization. As a SQL Server expert, you should be looking at LINQ-to-SQL internals to see where you can intervene to improve things. LINQ-to-SQL may become your best friend and a higher-level tool to work with.

The Object Model

LINQ-to-SQL is all about its object model. Through the object model you define the data you'll work with. The object model consists of classes that may include both data and behavior: persistence, logic, and validation. The Visual Studio 2008 wizard is the easiest starting point for building such an object model; ultimately, however, it is just a collection of classes that can also be written manually. Suppose you intend to connect to the standard Northwind database and work with the Customers and Orders tables. You end up with a Customer class as in Listing One.

The LINQ-to-SQL object model is essentially a table-centric object model with a varying level of abstraction. In the default source code for the various classes, the abstraction over the physical structure of the tables is practically nil. As you can see in Listing One, Customer is a faithful representation of a record in the Customers table. However, the class is decorated with a few attributes and is marked as partial.

Customer is a plain class that exposes a few properties and implements a couple of interfaces; more precisely, it is a plain class that operates as a table-record class. The properties, and the class itself, are mapped to the columns of a particular table through attributes. The attributes carry metadata that bind a particular property to a particular column, and the whole class to a table record. Each Column attribute specifies the name and type of the column, plus additional information such as primary key, nullability, and constraints:

 
[Column(Storage="_CustomerID", DbType="NChar(5) NOT NULL", 
        CanBeNull=false, IsPrimaryKey=true)]


The Column attribute dictates that the property has to be serialized back to the database table when an instance of the class is made persistent. Likewise, the property has to be initialized with persistent data when an instance of the class is built from a data query.

[Table(Name="dbo.Customers")]
public partial class Customer : INotifyPropertyChanging, INotifyPropertyChanged
{
   [Column(Storage="_CustomerID", DbType="NChar(5) NOT NULL", 
           CanBeNull=false, IsPrimaryKey=true)]
   public string CustomerID
   {
      get { return this._CustomerID; }
      set
      {
         if ((this._CustomerID != value))
         {
            this.OnCustomerIDChanging(value);
            this.SendPropertyChanging();
            this._CustomerID = value;
            this.SendPropertyChanged("CustomerID");
            this.OnCustomerIDChanged();
         }
      }
   }
   // ... (remaining properties and private fields omitted)
   [Association(Name="Customer_Order", Storage="_Orders", OtherKey="CustomerID")]
   public EntitySet<Order> Orders
   {
      get { return this._Orders; }
      set { this._Orders.Assign(value); }
   }
}
Listing One

No class in the LINQ-to-SQL object model exposes Save or Load methods for I/O operations. To read data, you just write a LINQ query:


NorthwindDataContext context = new NorthwindDataContext();
var data = from customer in context.Customers
           where customer.Country == "USA"
           select customer;


The data context object is the entry point to the LINQ-to-SQL engine. The class is automatically generated by the Visual Studio 2008 wizard as a specialization of the system-provided DataContext class. The data context is the means through which developers run queries and updates against the database.
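In practice, you would typically create the data context with an explicit connection string (the wizard-generated class also offers a parameterless constructor that reads it from the configuration file) and dispose of it when done. A minimal sketch; the connection string below is a placeholder:


// The connection string is a placeholder; adjust server and security settings.
string connString =
    @"Data Source=.\SQLEXPRESS;Initial Catalog=Northwind;Integrated Security=True";

// DataContext implements IDisposable, so a using block is the natural pattern.
using (NorthwindDataContext context = new NorthwindDataContext(connString))
{
    var data = from customer in context.Customers
               where customer.Country == "USA"
               select customer;

    foreach (Customer c in data)
        Console.WriteLine(c.CompanyName);
}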

On the surface, the preceding query targets an in-memory object model and has no relationship to the T-SQL language and SQL Server. Some T-SQL code, however, is generated under the hood and executed against the database connection associated with the data context. The LINQ code above evaluates to T-SQL equivalent to the following:


SELECT * FROM Customers WHERE Country = 'USA'

Likewise, more sophisticated T-SQL is run when a program executes the following LINQ code:


var data = from customer in context.Customers
           where customer.Orders.Count > 10
           select customer;


There's always a lot of T-SQL code behind any LINQ query that targets a LINQ-to-SQL collection. Here's the T-SQL code for the preceding query:


SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle],
       [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country],
       [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
WHERE ((
    SELECT COUNT(*)
    FROM [dbo].[Orders] AS [t1]
    WHERE [t1].[CustomerID] = [t0].[CustomerID]
    )) > @p0


The LINQ-to-SQL engine, buried in the folds of the DataContext object, translates the LINQ query into a T-SQL statement. The statement is not run immediately, however; it executes only when the result set is actually accessed for reading, enumerated, or copied into an in-memory collection. The effective T-SQL code can be tracked either with the SQL Profiler tool or with the Visual Studio 2008 debugger; see Figure 1.


Figure 1: The T-SQL code being run as detected by the Visual Studio 2008 debugger.
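Deferred execution can also be observed from code. The DataContext class exposes a Log property that echoes the generated T-SQL to any TextWriter; a minimal sketch:


NorthwindDataContext context = new NorthwindDataContext();

// Echo the generated T-SQL to the console.
context.Log = Console.Out;

// No T-SQL runs here; the query is only defined.
var data = from customer in context.Customers
           where customer.Country == "USA"
           select customer;

// The SELECT statement is generated and executed only now,
// when the result set is enumerated.
foreach (Customer c in data)
    Console.WriteLine(c.CompanyName);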

For most developers, writing queries with the LINQ syntax is easier and faster than hand-coding T-SQL, not least because queries are checked at compile time. Moreover, the approach behind LINQ-to-SQL moves engineers towards an object-oriented view of the data that makes complex scenarios inherently simpler to manage.

The primary obstacle that has kept engineers from widely adopting object-oriented methodologies to design the data model of their applications has been the lack of effective (and, above all, fast and reliable) mechanisms for persisting the object model down to a relational database. Object-oriented database-management systems, for a variety of reasons, have yet to capture the hearts of architects and project managers. As a result, the object/relational mismatch has been with us for years. LINQ-to-SQL attempts to provide a simplified solution that, compared to a full O/RM tool, is lightweight, available out of the box, fully integrated in Visual Studio 2008, and limited to SQL Server.

Through the DataContext class, the LINQ-to-SQL object model also supports a number of interesting features typical of a full-fledged O/RM tool, including lazy loading, identity and transaction management, and optimistic concurrency. And, of course, the LINQ-to-SQL object model fully supports updates. Updates are not conducted through a direct database operation, but instead pass through the DataContext class. You enter changes into the object model and, when you're done, you tell the data context to persist them to the data store, as in Listing Two.

NorthwindDataContext db = new NorthwindDataContext();

// Look up the products to order.
Product p1 = db.Products.Single(p => p.ProductName == "X");
Product p2 = db.Products.Single(p => p.ProductName == "Y");

// Create a new order.
Order order1 = new Order();
order1.OrderDate = DateTime.Now;
order1.RequiredDate = order1.OrderDate.AddDays(7);

// Add two line items to the order.
OrderDetail item1 = new OrderDetail();
item1.ProductID = p1.ProductID;
item1.Quantity = 4;
order1.OrderDetails.Add(item1);

OrderDetail item2 = new OrderDetail();
item2.ProductID = p2.ProductID;
item2.Quantity = 1;
order1.OrderDetails.Add(item2);

// Attach the order to an existing customer and persist all changes.
Customer cust = db.Customers.Single(c => c.CustomerID == "ALFKI");
cust.Orders.Add(order1);

db.SubmitChanges();
Listing Two
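SubmitChanges also enforces optimistic concurrency: if the underlying rows have changed since they were read, it raises a ChangeConflictException that you can catch and resolve. A minimal sketch:


// Requires the System.Data.Linq namespace for ConflictMode,
// ChangeConflictException, and RefreshMode.
NorthwindDataContext db = new NorthwindDataContext();
Customer cust = db.Customers.Single(c => c.CustomerID == "ALFKI");
cust.ContactName = "New Contact";

try
{
    db.SubmitChanges(ConflictMode.ContinueOnConflict);
}
catch (ChangeConflictException)
{
    // Another user updated the same rows in the meantime.
    // KeepChanges keeps the value set above and refreshes the other
    // members from the database; other RefreshMode values instead
    // discard or overwrite the local changes.
    db.ChangeConflicts.ResolveAll(RefreshMode.KeepChanges);
    db.SubmitChanges();
}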

LINQ-to-SQL and SQL Server Development

LINQ-to-SQL has the power to autonomously generate good T-SQL statements. Does this mean that LINQ-to-SQL can ultimately replace SQL Server experts in any organization? What about the quality of the T-SQL code being generated? Is there any room left for manual improvement and fine-tuning?

LINQ-to-SQL and O/RM tools in general are not designed to stand in opposition to SQL Server experts. They are simply tools in the hands of database experts. However, an O/RM is a high-level tool that other professionals can use just as effectively, even when they lack advanced SQL Server skills.

LINQ-to-SQL is designed to produce the best possible T-SQL code in the widest possible range of situations, and you don't have to exercise specific SQL Server skills to get that code written. This doesn't mean that LINQ-to-SQL will always employ the perfect T-SQL code for any given situation, but it will in a surprisingly large number of them. The role of a SQL Server expert becomes detecting whether there's room for improvement and introducing it.

Fine-Tuning LINQ-to-SQL Code

Any class in the LINQ-to-SQL object model is partial, meaning that its source code need not be contained in a single file but may span two or more distinct source files. By leveraging the partial class (and partial method) mechanism, you can modify the behavior of LINQ-to-SQL and bend it to your needs.

Two main extensibility mechanisms built into LINQ-to-SQL classes relate to fine-tuning T-SQL code: partial methods on the data context that override the default insert, update, and delete logic, and partial methods on the entity classes that hook creation, validation, and property changes.

A partial method in a partial class is an optional method that is declared without an implementation. The following code shows the partial methods defined for the data context class:


partial void OnCreated();
partial void InsertCustomer(Customer instance);
partial void UpdateCustomer(Customer instance);
partial void DeleteCustomer(Customer instance);
partial void InsertOrder(Order instance);
partial void UpdateOrder(Order instance);
partial void DeleteOrder(Order instance);


A call to a partial method may appear anywhere within a partial class. At compile time, if no body is supplied for a partial method, the compiler replaces every call to it with a no-op. In other words, partial methods are injection points in the body of a class for pieces of code that you want to customize. For example, the partial methods listed above let you change the way in which a customer (or an order) is inserted, updated, or deleted. If you implement any of those methods, you can specify your own T-SQL code for update operations. Here's an example:


partial void DeleteCustomer(Customer instance)
{
   // ExecuteCommand turns the {0} placeholder into a typed SQL parameter.
   this.ExecuteCommand("DELETE FROM Customers WHERE CustomerID = {0}",
                       instance.CustomerID);
}


Needless to say, you can use stored procedures here as well.
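For instance, assuming a hypothetical dbo.Customer_Delete stored procedure exists in the database (the name is made up for illustration), the override might simply invoke it:


partial void DeleteCustomer(Customer instance)
{
   // dbo.Customer_Delete is a hypothetical stored procedure that
   // encapsulates the delete logic for a customer.
   this.ExecuteCommand("EXEC dbo.Customer_Delete {0}", instance.CustomerID);
}


Alternatively, the Visual Studio 2008 O/R designer lets you map existing stored procedures directly to an entity's insert, update, and delete operations.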

Another set of extensibility methods lets you validate changes entered into entity classes such as Customer and Order:


partial void OnLoaded();
partial void OnValidate(System.Data.Linq.ChangeAction action);
partial void OnCreated();
partial void OnCustomerIDChanging(string value);
partial void OnCustomerIDChanged();


For each property, you find a pair of OnXxxChanging and OnXxxChanged methods to let you control what happens before and after the assignment of a value to a property.
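A minimal sketch of using one of these hooks for validation in your half of the partial Customer class (the five-character rule mirrors the NChar(5) column definition shown earlier):


using System;

public partial class Customer
{
    // Runs before the new value is assigned to CustomerID
    // (see the property setter in Listing One); throwing rejects the change.
    partial void OnCustomerIDChanging(string value)
    {
        if (string.IsNullOrEmpty(value) || value.Length != 5)
            throw new ArgumentException("CustomerID must be exactly five characters long.");
    }
}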

Finally, what if you need to run a complex query and find that the T-SQL autogenerated by LINQ-to-SQL is not the best possible? How can you replace the query? You just run plain T-SQL using ADO.NET commands and objects. LINQ and LINQ-to-SQL are useful and helpful, but they're just high-level tools built on top of ADO.NET.
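If you still want entity objects back from your hand-tuned statement, the DataContext class also exposes an ExecuteQuery<T> method that materializes the returned rows into entities; a minimal sketch:


NorthwindDataContext db = new NorthwindDataContext();

// Hand-written T-SQL; {0} becomes a proper SQL parameter.
// The selected columns must match properties of the Customer class.
var usCustomers = db.ExecuteQuery<Customer>(
    "SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address, " +
    "City, Region, PostalCode, Country, Phone, Fax " +
    "FROM dbo.Customers WHERE Country = {0}", "USA");

foreach (Customer c in usCustomers)
    Console.WriteLine(c.CompanyName);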

Conclusion

LINQ-to-SQL can definitely let you do your database work without knowing much about T-SQL and SQL Server; it is an excellent shortcut for, say, developers with an Oracle background. At the same time, LINQ-to-SQL is a technology open enough to let SQL Server experts express their skills and do things the way they want. LINQ is the query language, and LINQ-to-SQL is an auto-generated object model based on SQL Server tables that can be queried using LINQ. As a whole, LINQ-to-SQL looks like a simple, SQL Server-specific O/RM tool capable of offering CRUD and query services with support for transactions and concurrency. To answer the initial question: are LINQ-to-SQL and T-SQL friends or enemies? They're certainly not enemies. So are they friends? LINQ-to-SQL certainly has an intimate knowledge of T-SQL, and although the T-SQL it produces may not be optimal in every situation, such situations are relatively few. You should use LINQ-to-SQL extensively before concluding that it doesn't work for you.

The Entity Framework and .NET 3.5 SP1

With the .NET Framework 3.5 SP1, Microsoft officially introduced Version 1.0 of the ADO.NET Entity Framework (EF) along with a similar tool, LINQ-to-Entities. Both LINQ-to-SQL and the Entity Framework are O/RM tools, though with different flavors and scopes. The Entity Framework is Microsoft's fully fledged O/RM tool that, in terms of features, competes with commercial and open-source products such as NHibernate and LLBLGen Pro. LINQ-to-Entities is just the query language that maps the LINQ syntax onto an Entity Framework entity model. EF is the big brother of LINQ-to-SQL. If you find that EF works for you, you might just opt for it and skip LINQ-to-SQL. Even in that case, though, the considerations made here about T-SQL remain largely valid.

—D.E.
