Dr. Dobb's | Comparing LINQ-to-XML with XPath

Comparing LINQ-to-XML with XPath

August 07, 2008
URL:http://www.drdobbs.com/web-development/comparing-linq-to-xml-with-xpath/209904294

Paul Kimmel is an application architect and author of more than a dozen books, including LINQ Unleashed for C#, on which this article is based. Courtesy Pearson Education, InformIT. All rights reserved.

A few years ago, I wrote a Blackjack game. The game uses statistics from a book on expert play and coaches the player in the statistically best play based on the player's and dealer's hands. (The game and source are available from my website at www.softconcepts.com.) The game was good enough that a programmer from Harrah's in Biloxi asked to use the source, and my understanding is that it is a pillow favor provided at the casino. For this article, game statistics from a round of play were saved as an XML file, and that file is used for the demos. (You can download the game and save your own statistics or use the XML provided in Listing One.)

<?xml version="1.0" encoding="utf-8"?>
<Blackjack>
  <Player Name="Player 1">
    <Statistics>
      <AverageAmountLost>-28.125</AverageAmountLost>
      <AverageAmountWon>30.681818181818183</AverageAmountWon>
      <Blackjacks>1</Blackjacks>
      <Losses>8</Losses>
      <NetAverageWinLoss>5.9210526315789478</NetAverageWinLoss>
      <NetWinLoss>112.5</NetWinLoss>
      <PercentageOfBlackJacks>0.041666666666666664</PercentageOfBlackJacks>
      <PercentageOfLosses>33.333333333333329</PercentageOfLosses>
      <PercentageOfPushes>16.666666666666664</PercentageOfPushes>
      <PercentageOfWins>45.833333333333329</PercentageOfWins>
      <Pushes>4</Pushes>
      <Surrenders>1</Surrenders>
      <TotalAmountLost>-225</TotalAmountLost>
      <TotalAmountWon>337.5</TotalAmountWon>
      <Wins>11</Wins>
    </Statistics>
  </Player>
</Blackjack>

Listing One: Statistics from a Round of Play in the Blackjack Game Saved to an XML File

Examples of how to use the cards.dll are all over the web, including in some of my articles, such as Programming for Fun and Profit -- Using the Card.dll.

In each of the subsections that follow, you are shown code that uses LINQ-to-XML to query nodes, followed by an equivalent XPath query that accomplishes the same goal. (You don't need both; in practice, use one or the other.)

Using Namespaces

XML documents support namespaces. For example, if you add the following namespace to the root element of the XML file in Listing One (right after the <?xml?> declaration; both are shown), you need to include the namespace in both your LINQ-to-XML and your XPath queries.

<?xml version="1.0" encoding="utf-8"?>
<jack:Blackjack xmlns:jack="http://www.blackjack.com">

The XML in Listing Two shows the proper placement of the namespace jack added to the XML from Listing One. The code in Listing Three incorporates the namespace in the LINQ-to-XML to obtain the net amount won (or lost) from the XML file in Listing Two. The second half of the listing uses the XPathSelectElement method and an XPath query to obtain the same value.

<?xml version="1.0" encoding="utf-8"?>
<jack:Blackjack xmlns:jack="http://www.blackjack.com">
  <jack:Player Name="Player 1">
    <jack:Statistics>
      <jack:AverageAmountLost>-28.125</jack:AverageAmountLost>
      <jack:AverageAmountWon>30.681818181818183</jack:AverageAmountWon>
      <jack:Blackjacks>1</jack:Blackjacks>
      <jack:Losses>8</jack:Losses>
      <jack:NetAverageWinLoss>5.9210526315789478</jack:NetAverageWinLoss>
      <jack:NetWinLoss>112.5</jack:NetWinLoss>
      <jack:PercentageOfBlackJacks>0.041666666666666664</jack:PercentageOfBlackJacks>
      <jack:PercentageOfLosses>33.333333333333329</jack:PercentageOfLosses>
      <jack:PercentageOfPushes>16.666666666666664</jack:PercentageOfPushes>
      <jack:PercentageOfWins>45.833333333333329</jack:PercentageOfWins>
      <jack:Pushes>4</jack:Pushes>
      <jack:Surrenders>1</jack:Surrenders>
      <jack:TotalAmountLost>-225</jack:TotalAmountLost>
      <jack:TotalAmountWon>337.5</jack:TotalAmountWon>
      <jack:Wins>11</jack:Wins>
    </jack:Statistics>
  </jack:Player>
</jack:Blackjack>

Listing Two: The XML from Listing One with the Namespace jack Added

using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

  private static void UseNamespace()
  {
    const string filename = "..\\..\\CurrentStatsWithNamespace.xml";
    XDocument doc = XDocument.Load(filename);
    XNamespace jack = "http://www.blackjack.com";

    // LINQ-to-XML: qualify every element name with the namespace
    XElement winLoss1 = doc.Element(jack + "Blackjack")
      .Element(jack + "Player")
      .Element(jack + "Statistics")
      .Element(jack + "NetWinLoss");
    Console.WriteLine(winLoss1);
    Console.ReadLine();

    // XPath: map the jack prefix to the namespace URI in a namespace manager
    XmlReader reader = XmlReader.Create(filename);
    XElement root = XElement.Load(reader);
    XmlNameTable table = reader.NameTable;
    XmlNamespaceManager manager = new XmlNamespaceManager(table);
    manager.AddNamespace("jack", "http://www.blackjack.com");
    XElement winLoss2 =
      doc.XPathSelectElement(
        "./jack:Blackjack/jack:Player/jack:Statistics/jack:NetWinLoss",
        manager);
    Console.WriteLine(winLoss2);
    Console.ReadLine();
  }

Listing Three: The UseNamespace Method Uses LINQ-to-XML and a Namespace to Obtain a Value, and an Equivalent XPath Query to Obtain the Same Value

In the example, an XmlReader was created from the XML file. The root XElement was obtained from the reader, followed by the NameTable. The NameTable is an instance of the System.Xml.NameTable class, and it contains the atomized names of the elements and attributes of the XML document. If a name appears multiple times in an XML document, it is stored only once in a NameTable, as a Common Language Runtime (CLR) object. Such storage permits object reference comparisons on these element and attribute names rather than much more expensive string comparisons. (This is managed for you.)

Next, the table is used to create an XmlNamespaceManager, and the desired XML namespace string is added to the manager under the prefix jack. Finally, the XmlNamespaceManager is passed as an argument to the XPathSelectElement method. The XPath query is "./jack:Blackjack/jack:Player/jack:Statistics/jack:NetWinLoss". The prefix "jack:" on each step demonstrates how to incorporate the namespace in the XPath query.
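The atomization described above can be observed directly. The following is only a minimal sketch, not part of the Blackjack sample: the class and method names are hypothetical, and it builds its own NameTable rather than taking one from a reader. It adds two separately allocated copies of the same name and checks that the table hands back a single shared string object.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class AtomizationSketch
{
    // Returns true when two separately built copies of the same name
    // come back from the NameTable as one shared string object.
    public static bool SameAtom(string name)
    {
        NameTable table = new NameTable();

        // Build two distinct string instances with identical characters.
        string first = new string(name.ToCharArray());
        string second = new string(name.ToCharArray());

        // Add returns the atomized string stored in the table.
        string atom1 = table.Add(first);
        string atom2 = table.Add(second);

        // A reference comparison is enough once names are atomized.
        return object.ReferenceEquals(atom1, atom2);
    }

    public static void Main()
    {
        Console.WriteLine(SameAtom("NetWinLoss"));
    }
}
```

Because both calls to Add return the same object, code that works with atomized names can compare references instead of comparing strings character by character.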

Our examples use the XPath extension methods that LINQ-to-XML adds to XDocument and XElement, such as XPathSelectElement; they are defined in the System.Xml.XPath namespace, which is why Listing Three imports it. The System.Xml.XPath namespace also provides its own classes, such as XPathDocument and XPathNavigator, and you would use different classes and behaviors if you were to use that approach. As an exercise, if you are interested, you can experiment by implementing the equivalent behaviors using those classes.
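As one possible starting point for that exercise, here is a sketch that reads the same NetWinLoss value with XPathDocument and XPathNavigator instead of XElement. It is an illustration under assumptions, not the book's solution: the class name is hypothetical, and the XML is a trimmed inline stand-in for Listing Two so that the sketch runs without the file.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper class, for illustration only.
public static class NavigatorSketch
{
    // Trimmed, inline stand-in for the namespaced XML in Listing Two.
    private const string Xml =
        "<jack:Blackjack xmlns:jack=\"http://www.blackjack.com\">" +
        "<jack:Player Name=\"Player 1\"><jack:Statistics>" +
        "<jack:NetWinLoss>112.5</jack:NetWinLoss>" +
        "</jack:Statistics></jack:Player></jack:Blackjack>";

    public static string ReadNetWinLoss()
    {
        XPathDocument document = new XPathDocument(new StringReader(Xml));
        XPathNavigator navigator = document.CreateNavigator();

        // The prefix used in the query is mapped to the namespace URI here.
        XmlNamespaceManager manager =
            new XmlNamespaceManager(navigator.NameTable);
        manager.AddNamespace("jack", "http://www.blackjack.com");

        XPathNavigator node = navigator.SelectSingleNode(
            "/jack:Blackjack/jack:Player/jack:Statistics/jack:NetWinLoss",
            manager);

        // SelectSingleNode returns null when nothing matches.
        return node == null ? null : node.Value;
    }

    public static void Main()
    {
        Console.WriteLine(ReadNetWinLoss());
    }
}
```

The namespace handling is the same as in Listing Three; only the object that evaluates the query changes, from an XDocument to a navigator over a read-only XPathDocument.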

Finding Children

Another thing you might want to do is find the value of child elements. To trim the code for this example, you can use the XML in Listing One without the namespace. The LINQ-to-XML version chains XElement.Element calls together to request children, and the XPath query uses a value that looks a lot like a file path (see Listing Four).

private static void FindChild()
{
   const string filename = "..\\..\\CurrentStats.xml";
   XElement xml = XElement.Load(filename);
   // LINQ-to-XML: chain Element calls down to the child
   XElement child1 = xml.Element("Player")
      .Element("Statistics").Element("AverageAmountLost");
   Console.WriteLine(child1);
   Console.ReadLine();
   // XPath expression using System.Xml.Linq capabilities
   XElement child2 = xml.XPathSelectElement("Player/Statistics/AverageAmountLost");
   Console.WriteLine(child2);
   Console.ReadLine();
}

Listing Four: Selecting a Child Element with LINQ to XML and Then an XPath Query

The second half of Listing Four uses an XPath query, Player/Statistics/AverageAmountLost. Because both halves operate on the same XElement objects, you can easily blend XPath queries and chained XElement calls in the same code block.
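To illustrate that kind of blending, the sketch below mixes the two styles in both directions. The class and method names are hypothetical, and the XML is a trimmed inline copy of Listing One so that the code does not depend on CurrentStats.xml.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class, for illustration only.
public static class BlendSketch
{
    // Trimmed, inline stand-in for the XML in Listing One.
    private const string Xml =
        "<Blackjack><Player Name=\"Player 1\"><Statistics>" +
        "<AverageAmountLost>-28.125</AverageAmountLost>" +
        "<AverageAmountWon>30.681818181818183</AverageAmountWon>" +
        "</Statistics></Player></Blackjack>";

    // XPath first, then a chained XElement call on the result.
    public static string AverageLost()
    {
        XElement xml = XElement.Parse(Xml);
        XElement stats = xml.XPathSelectElement("Player/Statistics");
        return stats.Element("AverageAmountLost").Value;
    }

    // Chained XElement call first, then XPath from that element.
    public static string AverageWon()
    {
        XElement xml = XElement.Parse(Xml);
        XElement player = xml.Element("Player");
        return player.XPathSelectElement("Statistics/AverageAmountWon").Value;
    }

    public static void Main()
    {
        Console.WriteLine(AverageLost());
        Console.WriteLine(AverageWon());
    }
}
```

Each XPath query is evaluated relative to the element it is called on, which is why the second query starts at Statistics rather than at Player.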

Finding Siblings

Methods like XElement.ElementsAfterSelf request sibling elements. The first half of Listing Five requests the next sibling element, and the second half uses an XPath query to perform the same task.

private static void FindSibling()
{
  const string filename = "..\\..\\CurrentStats.xml";
  XElement xml = XElement.Load(filename);
  // LINQ-to-XML: First() requires a using directive for System.Linq
  XElement child1 = xml.Element("Player")
    .Element("Statistics").Element("AverageAmountWon");
  XElement sibling1 = child1.ElementsAfterSelf().First();
  Console.WriteLine(sibling1);
  Console.ReadLine();
  // XPath: XPathSelectElement returns the first matching sibling
  XElement child2 = xml.XPathSelectElement("Player/Statistics/AverageAmountWon");
  XElement sibling2 = child2.XPathSelectElement("following-sibling::*");
  Console.WriteLine(sibling2);
  Console.ReadLine();
}

Listing Five: Both LINQ-to-XML and XPath Support Requesting Siblings, Children, and Parents

The XPath query following-sibling::* illustrates where I think XPath becomes less intuitive. The path statement Player/Statistics/AverageAmountWon looks like a path; following-sibling::* begs for a trip to the help documentation. However, because XPath is a W3C (World Wide Web Consortium) open standard, it is unlikely to change for us. (Note: XPath and XSLT are W3C open standards determined by a consortium. LINQ-to-XML is a proprietary part of the .NET Framework. This difference alone might discourage "open standards" wonks from using LINQ-to-XML, but with me, something that can be gleaned by intuition gets higher marks than open standards for standards' sake.)
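For readers who want to compare the two styles on more than the first sibling, the sketch below lists every following sibling both ways and also asks for the parent with the XPath step "..". It is a sketch under assumptions: the class and method names are hypothetical, and the XML is a trimmed inline copy of Listing One with only four statistics.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class, for illustration only.
public static class SiblingSketch
{
    // Trimmed, inline stand-in for the XML in Listing One.
    private const string Xml =
        "<Blackjack><Player Name=\"Player 1\"><Statistics>" +
        "<AverageAmountLost>-28.125</AverageAmountLost>" +
        "<AverageAmountWon>30.681818181818183</AverageAmountWon>" +
        "<Blackjacks>1</Blackjacks>" +
        "<Losses>8</Losses>" +
        "</Statistics></Player></Blackjack>";

    private static XElement AverageAmountWon()
    {
        return XElement.Parse(Xml)
            .Element("Player").Element("Statistics")
            .Element("AverageAmountWon");
    }

    // LINQ-to-XML: names of all siblings after the element.
    public static string FollowingWithLinq()
    {
        return string.Join(",", AverageAmountWon().ElementsAfterSelf()
            .Select(e => e.Name.LocalName).ToArray());
    }

    // XPath: the same siblings through the following-sibling axis.
    public static string FollowingWithXPath()
    {
        return string.Join(",", AverageAmountWon()
            .XPathSelectElements("following-sibling::*")
            .Select(e => e.Name.LocalName).ToArray());
    }

    // XPath: ".." selects the parent, like the XElement.Parent property.
    public static string ParentWithXPath()
    {
        return AverageAmountWon().XPathSelectElement("..").Name.LocalName;
    }

    public static void Main()
    {
        Console.WriteLine(FollowingWithLinq());
        Console.WriteLine(FollowingWithXPath());
        Console.WriteLine(ParentWithXPath());
    }
}
```

With this trimmed document, both sibling methods return the names Blackjacks and Losses, and the parent query returns Statistics.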

Filtering Elements

LINQ queries with where clauses can filter XML documents. Listing Six demonstrates how to query the Player element with LINQ-to-XML and a LINQ query, and the second half of the code shows the equivalent behavior using an XPath query. Again, the LINQ query seems more intuitive than the XPath query Player[@Name='Player 1'].

private static void FilterOnAttribute()
{
  const string filename = "..\\..\\CurrentStats.xml";
  XElement xml = XElement.Load(filename);
  // LINQ query: the string cast yields null, rather than throwing,
  // for a Player element that has no Name attribute
  XElement player1 =
    (from elem in xml.Elements("Player")
    where (string)elem.Attribute("Name") == "Player 1"
    select elem).First();
  Console.WriteLine(player1);
  // XPath query: the bracketed predicate tests the Name attribute
  XElement player2 = xml.XPathSelectElement("Player[@Name='Player 1']");
  Console.WriteLine(player2);
  Console.ReadLine();
}

Listing Six: Filtering with LINQ and XPath

In the XPath query Player[@Name='Player 1'], Player is the node and the bracketed part is a predicate that tests the Name attribute against a value. The correct statement looks like an index operation, but perhaps from a C# programmer's point of view Player.Name = 'Player 1' would be more intuitive. This supports my argument that if you are comfortable with C#, then using LINQ-to-XML and method calls might be significantly easier to pick up than XPath queries.
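The bracketed part of an XPath step is not limited to attributes; it can also test child values. The sketch below filters players by their number of wins, once with a LINQ where clause and once with an XPath predicate. Everything here is assumed for illustration: the class and method names are hypothetical, and the second player and its Wins value do not appear in the article's data.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class, for illustration only.
public static class FilterSketch
{
    // "Player 2" and its statistics are made up for this sketch.
    private const string Xml =
        "<Blackjack>" +
        "<Player Name=\"Player 1\"><Statistics><Wins>11</Wins></Statistics></Player>" +
        "<Player Name=\"Player 2\"><Statistics><Wins>7</Wins></Statistics></Player>" +
        "</Blackjack>";

    // LINQ query: cast the Wins element to int and compare in C#.
    public static string[] WinnersWithLinq(int minimumWins)
    {
        XElement xml = XElement.Parse(Xml);
        return (from player in xml.Elements("Player")
                where (int)player.Element("Statistics").Element("Wins") > minimumWins
                select (string)player.Attribute("Name")).ToArray();
    }

    // XPath query: the predicate compares the child value as a number.
    public static string[] WinnersWithXPath(int minimumWins)
    {
        XElement xml = XElement.Parse(Xml);
        string query = "Player[Statistics/Wins > " + minimumWins + "]";
        return xml.XPathSelectElements(query)
            .Select(player => (string)player.Attribute("Name"))
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", WinnersWithLinq(10)));
        Console.WriteLine(string.Join(",", WinnersWithXPath(10)));
    }
}
```

With a threshold of 10, both methods return only Player 1. The LINQ version gets its comparison from C# itself, whereas the XPath version builds the comparison into the query string.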

Even without an exhaustive comparison of XPath and LINQ-to-XML, you get the idea. LINQ-to-XML consists of XDocument and XElement method calls, and XPath consists of queries defined by that standard; the XPath query syntax is distinct from C# code. At some level, you will be able to do more with less typing if you use XPath, just as you can do some very advanced comparisons with less typing by using regular expressions. Which technology to use depends on your experience and the experience of the members of your team.

Transforming XML Data Using Functional Construction

Functional construction is quite literally a way to create an XML tree in a single statement by chaining function calls together where subordinate calls are arguments to the calling method. (This same nesting of method calls is also often used in the CodeDOM namespace to create code graphs.)

The XML document in Listing One was created with functional construction. It is quite straightforward: the XML tree was created by chaining XElement (and XAttribute) objects together to form the shape of the XML document, then calling the XElement.Save method (see Listing Seven).

private void SerializeGameStatistics(BlackJack game)
{
  try
  {
    Statistics stats = game.Players[0].Statistics;
    // Serialize the first player's statistics to XML; the nesting of the
    // constructor calls mirrors the nesting of the elements
    XElement xml =
      new XElement("Blackjack",
        new XElement("Player",
          new XAttribute("Name", game.Players[0].Name),
          new XElement("Statistics",
            new XElement("AverageAmountLost", stats.AverageAmountLost),
            new XElement("AverageAmountWon", stats.AverageAmountWon),
            new XElement("Blackjacks", stats.BlackJacks),
            new XElement("Losses", stats.Losses),
            new XElement("NetAverageWinLoss", stats.NetAverageWinLoss),
            new XElement("NetWinLoss", stats.NetWinLoss),
            new XElement("PercentageOfBlackJacks",
              stats.PercentageOfBlackJacks),
            new XElement("PercentageOfLosses", stats.PercentageOfLosses),
            new XElement("PercentageOfPushes", stats.PercentageOfPushes),
            new XElement("PercentageOfWins", stats.PercentageOfWins),
            new XElement("Pushes", stats.Pushes),
            new XElement("Surrenders", stats.Surrenders),
            new XElement("TotalAmountLost", stats.TotalAmountLost),
            new XElement("TotalAmountWon", stats.TotalAmountWon),
            new XElement("Wins", stats.Wins))));
    // Write the file next to the executable
    xml.Save(
      Path.Combine(
        Path.GetDirectoryName(Application.ExecutablePath),
        "CurrentStats.xml"));
  }
  catch
  {
    // Any exception raised while building or saving the XML is swallowed
  }
}

Listing Seven: Using Functional Construction, or Chains of System.Xml.Linq Objects to Shape the Desired Form of the XML Output

The method in Listing Seven was added to the Blackjack sample application. The data is derived from objects in that game, but the orchestration of the XElement objects could really be applied to any objects.
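As a sketch of how the same orchestration might apply to other objects, the code below builds the Blackjack tree from a list of name/value pairs instead of the game's Statistics class. The class name, the Build method, and the use of KeyValuePair are assumptions made for illustration; they are not part of the Blackjack sample. The point is that a LINQ query can sit inside the constructor chain and produce the child elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class ConstructionSketch
{
    // Builds the same element shape as Listing Seven from arbitrary pairs.
    public static XElement Build(
        string playerName,
        IEnumerable<KeyValuePair<string, double>> stats)
    {
        return new XElement("Blackjack",
            new XElement("Player",
                new XAttribute("Name", playerName),
                new XElement("Statistics",
                    // Each pair becomes one child element of Statistics.
                    from pair in stats
                    select new XElement(pair.Key, pair.Value))));
    }

    public static void Main()
    {
        List<KeyValuePair<string, double>> stats =
            new List<KeyValuePair<string, double>>();
        stats.Add(new KeyValuePair<string, double>("NetWinLoss", 112.5));
        stats.Add(new KeyValuePair<string, double>("Wins", 11));

        Console.WriteLine(Build("Player 1", stats));
    }
}
```

Because the XElement constructor accepts a sequence as content and adds each item as a child, the query replaces the long run of hand-written new XElement calls when the data is already in a collection.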

Summary

XPath is an open standard for querying XML documents (and, through XSLT, for transforming them); however, it is a completely different technology from C# programming, which you already know. With LINQ-to-XML, you can now do some of the things provided by XPath in a way you already know how to do them.

Rough Cuts is a Safari Books Online interactive publishing service that provides you with first access to pre-published manuscripts on hot technology topics, enabling you to stay on the cutting edge and remain competitive. This version may not be final; the published book will be available in August 2008.