Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Parsing XML Files in .NET Using C#


Parsing XML with XmlDocument

The second of five ways to parse an XML file is to use the XmlDocument class. The example code is shown in Listing Four.

Listing Four: Parsing XML using XmlDocument


using System;
using System.Xml;
using CommonLib;

namespace Run
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      CommonLib.Suite s = new CommonLib.Suite();

      XmlDocument xd = new XmlDocument();
      xd.Load("..\\..\\..\\..\\testCases.xml");
      
      XmlNodeList nodelist = xd.SelectNodes("/suite/testcase"); // get all <testcase> nodes

      foreach (XmlNode node in nodelist) // for each <testcase> node
      {
        CommonLib.TestCase tc = new CommonLib.TestCase();
        
        tc.id = node.Attributes.GetNamedItem("id").Value;
        tc.kind = node.Attributes.GetNamedItem("kind").Value;

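        XmlNode n = node.SelectSingleNode("inputs"); // get the one <inputs> node
        tc.arg1 = n.ChildNodes.Item(0).InnerText; // first child of <inputs> is <arg1>
        tc.arg2 = n.ChildNodes.Item(1).InnerText; // second child of <inputs> is <arg2>

        tc.expected = node.ChildNodes.Item(1).InnerText; // second child of <testcase> is <expected>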

        s.items.Add(tc);
      } // foreach <testcase> node
      
      s.Display();

    } // Main()
  } // class Class1

} // ns Run

XmlDocument objects are based on the notion of XML nodes and child nodes. Instead of navigating sequentially through a file, we select sets of nodes with the SelectNodes() method or individual nodes with the SelectSingleNode() method. Notice that because XML attributes are not child nodes of an element, we must get their data through the node's Attributes collection, here with the Attributes.GetNamedItem() method.
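To make the attribute point concrete, here is a minimal sketch in modern C# (top-level statements, C# 9 or later). It assumes a small in-memory document shaped like the <testcase> elements Listing Four reads; the ids, kinds, and values are hypothetical, and LoadXml() is used instead of Load() so it runs without testCases.xml. Strictly speaking an attribute is still an XmlNode (its NodeType is Attribute), but it never appears in ChildNodes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical sample shaped like a <testcase> from testCases.xml; values are made up.
string xml =
    "<suite>" +
    "<testcase id=\"001\" kind=\"bvt\">" +
    "<inputs><arg1>4</arg1><arg2>7</arg2></inputs>" +
    "<expected>11.00</expected>" +
    "</testcase>" +
    "</suite>";

XmlDocument xd = new XmlDocument();
xd.LoadXml(xml);
XmlNode testcase = xd.SelectSingleNode("/suite/testcase");

// ChildNodes holds only the element content: <inputs> and <expected>, no attributes.
var childNames = new List<string>();
foreach (XmlNode child in testcase.ChildNodes)
    childNames.Add(child.Name);

// Attributes are reached through the separate Attributes collection.
string idViaGetNamedItem = testcase.Attributes.GetNamedItem("id").Value;
string idViaIndexer = testcase.Attributes["id"].Value; // XmlAttributeCollection indexer

// An attribute is an XmlNode object, but of node type Attribute.
XmlNodeType idNodeType = testcase.Attributes["id"].NodeType;

Console.WriteLine($"{string.Join(",", childNames)} {idViaGetNamedItem} {idViaIndexer} {idNodeType}");
```

The indexer form and GetNamedItem() return the same attribute; which one to use is mostly a matter of taste.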

After loading the XmlDocument, we fetch all the test case nodes at once with:

XmlNodeList nodelist = xd.SelectNodes("/suite/testcase");

Then we iterate through this list of nodes and fetch each <inputs> node with:

XmlNode n = node.SelectSingleNode("inputs");

and then extract the arg1 (and similarly arg2) value using:

tc.arg1 = n.ChildNodes.Item(0).InnerText;

In this statement, n is the <inputs> node; ChildNodes.Item(0) is the first child node of <inputs>, i.e., <arg1>; and InnerText is the text between <arg1> and </arg1>.
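Positional access such as ChildNodes.Item(0) works as long as <arg1> really is the first child. The sketch below, which assumes a hypothetical <inputs> element with an XML comment in front of <arg1>, shows how that assumption can break and how a relative XPath passed to SelectSingleNode() finds the element by name instead. This name-based lookup is an alternative, not what Listing Four does.

```csharp
using System;
using System.Xml;

// Hypothetical <inputs> element with a comment before <arg1>; the values are made up.
string xml = "<inputs><!-- first operand --><arg1>4</arg1><arg2>7</arg2></inputs>";

XmlDocument xd = new XmlDocument();
xd.LoadXml(xml);
XmlNode n = xd.DocumentElement; // the <inputs> node

// Positional access: Item(0) is now the comment node, not <arg1>.
XmlNode first = n.ChildNodes.Item(0);

// Name-based access: the relative path "arg1" finds the element wherever it sits.
string arg1 = n.SelectSingleNode("arg1").InnerText;
string arg2 = n.SelectSingleNode("arg2").InnerText;

Console.WriteLine($"{first.NodeType} {arg1} {arg2}");
```

Comments are kept by XmlDocument when loading, so any tool or person that annotates the file can silently shift the positions the original listing relies on.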

The output from running this program is shown in Figure 3. Notice it is identical to the output from running the XmlTextReader technique and, in fact, all the other techniques presented in this article.

Figure 3 Output from the XmlDocument technique


The XmlDocument class implements the W3C Document Object Model (DOM) and has a different feel from many of the .NET Framework classes you are familiar with. Because it loads the entire document into memory as a tree of nodes, the XmlDocument class is appropriate if you need to extract data in a nonsequential manner, or if you are already using XmlDocument objects and want to keep a consistent look and feel in your application's code.
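Nonsequential access means you can jump to any part of the in-memory tree with an XPath expression. A minimal sketch, assuming a hypothetical three-case suite (the ids, kinds, and values are illustrative only), written with top-level statements in modern C#:

```csharp
using System;
using System.Xml;

// Hypothetical suite; ids, kinds, and values are illustrative only.
string xml =
    "<suite>" +
    "<testcase id=\"001\" kind=\"bvt\"><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
    "<testcase id=\"002\" kind=\"drt\"><inputs><arg1>9</arg1><arg2>6</arg2></inputs><expected>15.00</expected></testcase>" +
    "<testcase id=\"003\" kind=\"bvt\"><inputs><arg1>5</arg1><arg2>8</arg2></inputs><expected>13.00</expected></testcase>" +
    "</suite>";

XmlDocument xd = new XmlDocument();
xd.LoadXml(xml);

// Go straight to one test case by attribute value, without visiting the others.
string expected002 = xd.SelectSingleNode("/suite/testcase[@id='002']/expected").InnerText;

// Select only the subset of test cases of a given kind.
XmlNodeList bvt = xd.SelectNodes("/suite/testcase[@kind='bvt']");

// Read the last test case first; the whole tree is in memory, so order is up to us.
string lastId = xd.SelectSingleNode("/suite/testcase[last()]").Attributes["id"].Value;

Console.WriteLine($"{expected002} {bvt.Count} {lastId}");
```

A forward-only reader such as XmlTextReader would have to scan past test cases 001 and 002 to answer the same questions.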

Let me note that in discussions with my colleagues, there was often some confusion about the role of the XmlDataDocument class. It is derived from the XmlDocument class and is intended for use in conjunction with DataSet objects. So, in this example, you could use the XmlDataDocument class but would not gain anything.

Parsing XML with XPathDocument

The third technique to parse an XML file is to use the XPathDocument class. The example code is shown in Listing Five.

Listing Five: Parsing XML using XPathDocument


using System;
using System.Xml.XPath;
using CommonLib;

namespace Run
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      CommonLib.Suite s = new CommonLib.Suite();

      XPathDocument xpd = new XPathDocument("..\\..\\..\\..\\testCases.xml");
      XPathNavigator xpn = xpd.CreateNavigator();
      XPathNodeIterator xpi = xpn.Select("/suite/testcase");
      
      while (xpi.MoveNext()) // each testcase node
      {
        CommonLib.TestCase tc = new CommonLib.TestCase();
        tc.id = xpi.Current.GetAttribute("id", string.Empty); // unprefixed attributes have no namespace
        tc.kind = xpi.Current.GetAttribute("kind", string.Empty);

        XPathNodeIterator tcChild = xpi.Current.SelectChildren(XPathNodeType.Element);
        while (tcChild.MoveNext()) // each part (<inputs> and <expected>) of <testcase>
        {
          if (tcChild.Current.Name == "inputs")
          {
            XPathNodeIterator tcSubChild = tcChild.Current.SelectChildren(XPathNodeType.Element);
            while (tcSubChild.MoveNext()) // each part (<arg1>, <arg2>) of <inputs>
            {
              if (tcSubChild.Current.Name == "arg1")
                tc.arg1 = tcSubChild.Current.Value;
              else if (tcSubChild.Current.Name == "arg2")
                tc.arg2 = tcSubChild.Current.Value;
            }
          }
          else if (tcChild.Current.Name == "expected")
            tc.expected = tcChild.Current.Value;
        }
        s.items.Add(tc);

      } // each testcase node
      
      s.Display();
      
    } // Main()
  } // class Class1

} // ns Run

Using an XPathDocument object to parse XML has a hybrid feel that is part procedural (as in XmlTextReader) and part declarative (as in XmlDocument). You can select parts of the document with the Select() method of an XPathNavigator object and also move through the document with the MoveNext() method of an XPathNodeIterator object.

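After loading the XPathDocument object, we create an XPathNavigator and call its Select() method, which returns an XPathNodeIterator over all the <testcase> nodes. The iterator starts out positioned before the first node, so MoveNext() must be called before Current can be used: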

XPathNavigator xpn = xpd.CreateNavigator();
XPathNodeIterator xpi = xpn.Select("/suite/testcase");

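Because XPathDocument does not maintain "node identity" (it exposes the document through cursor objects rather than through individual node objects, as XmlDocument does), we must iterate through each <testcase> node with this loop: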

while (xpi.MoveNext())

Similarly, we have to iterate through the children with:

while (tcChild.MoveNext())
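Because the iterator is a cursor, the navigator returned by Current is not guaranteed to stay on a node once MoveNext() is called again. A minimal sketch, assuming a hypothetical two-case suite, of keeping a reference to each <testcase> by cloning Current inside the loop:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical two-case suite; the values are illustrative only.
string xml =
    "<suite>" +
    "<testcase id=\"001\" kind=\"bvt\"><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
    "<testcase id=\"002\" kind=\"drt\"><inputs><arg1>9</arg1><arg2>6</arg2></inputs><expected>15.00</expected></testcase>" +
    "</suite>";

XPathDocument xpd = new XPathDocument(new StringReader(xml));
XPathNavigator xpn = xpd.CreateNavigator();
XPathNodeIterator xpi = xpn.Select("/suite/testcase");

// MoveNext() must be called before Current is read.
var saved = new List<XPathNavigator>();
while (xpi.MoveNext())
{
    // Current may be repositioned by the next MoveNext(), so clone it
    // to keep an independent navigator parked on this <testcase>.
    saved.Add(xpi.Current.Clone());
}

// The clones remain usable after the loop has finished.
var ids = new List<string>();
var expecteds = new List<string>();
foreach (XPathNavigator tc in saved)
{
    ids.Add(tc.GetAttribute("id", string.Empty));
    expecteds.Add(tc.SelectSingleNode("expected").Value);
}

Console.WriteLine(string.Join(",", ids) + " " + string.Join(",", expecteds));
```

Listing Five never stores Current beyond one loop pass, so it does not need Clone(); the sketch only matters if you want to hold on to nodes.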

The XPathDocument class is a read-only store optimized for XPath queries, so using it is particularly appropriate when the XML file to parse is deeply nested or has a complex structure. You might also consider using XPathDocument if other parts of your application use that class, so that your code keeps a consistent look and feel.
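To illustrate what "optimized for XPath queries" buys you, the sketch below lets XPath do the work that the nested while loops of Listing Five do by hand: XPath functions are evaluated directly with Evaluate(), and relative paths passed to SelectSingleNode() pull each field out of a <testcase>. It assumes a hypothetical two-case suite and is one possible alternative, not the article's listing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical two-case suite; values are illustrative only.
string xml =
    "<suite>" +
    "<testcase id=\"001\" kind=\"bvt\"><inputs><arg1>4</arg1><arg2>7</arg2></inputs><expected>11.00</expected></testcase>" +
    "<testcase id=\"002\" kind=\"drt\"><inputs><arg1>9</arg1><arg2>6</arg2></inputs><expected>15.00</expected></testcase>" +
    "</suite>";

XPathNavigator xpn = new XPathDocument(new StringReader(xml)).CreateNavigator();

// XPath count() and sum() are evaluated directly and come back as double.
double caseCount = (double)xpn.Evaluate("count(/suite/testcase)");
double arg1Total = (double)xpn.Evaluate("sum(/suite/testcase/inputs/arg1)");

// Relative paths replace the nested child loops: one query per field.
var rows = new List<string>();
XPathNodeIterator xpi = xpn.Select("/suite/testcase");
while (xpi.MoveNext())
{
    XPathNavigator tc = xpi.Current;
    // SelectSingleNode() returns a new navigator and leaves tc where it is.
    string arg1 = tc.SelectSingleNode("inputs/arg1").Value;
    string arg2 = tc.SelectSingleNode("inputs/arg2").Value;
    string expected = tc.SelectSingleNode("expected").Value;
    rows.Add($"{tc.GetAttribute("id", string.Empty)} {arg1} {arg2} {expected}");
}

Console.WriteLine($"{caseCount} {arg1Total}");
rows.ForEach(Console.WriteLine);
```

The trade-off is that each field lookup is a separate query; for a flat file like testCases.xml the difference is negligible, while for deeply nested data the paths stay short where hand-written loops would nest further.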

Parsing XML with XmlSerializer

The fourth technique we will use to parse an XML file is the XmlSerializer class. The example code is shown in Listing Six.

Listing Six: Parsing XML using XmlSerializer

using System;
using System.Xml.Serialization;
using System.IO;
using SerializerLib; // defines a Suite class compatible with testCases.xml

namespace Run
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
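      XmlSerializer xs = new XmlSerializer(typeof(Suite));
      using (StreamReader sr = new StreamReader("..\\..\\..\\..\\testCases.xml"))
      {
        // Deserialize() returns object, so cast the result to the Suite type
        SerializerLib.Suite s = (SerializerLib.Suite)xs.Deserialize(sr);
        s.Display();
      } // sr is closed here, even if an exception is thrown
    } // Main()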
  } // class Class1
} // ns Run

Using the XmlSerializer class is significantly different from using any of the other classes because the in-memory data store is a different class (SerializerLib.Suite) from the CommonLib.Suite we used in all the other examples. In fact, observe that pulling the XML data into memory takes a single statement:

SerializerLib.Suite s = (SerializerLib.Suite)xs.Deserialize(sr);

I created a class library named "SerializerLib" to hold the definition for a Suite class that corresponds to the testCases.xml file so that the XmlSerializer object can store the XML data into it. The trick, of course, is to set up this Suite class.

Creating the Suite class is done with the help of the xsd.exe command-line tool. You will find it in your Program Files\Microsoft Visual Studio .NET\FrameworkSDK\bin folder. I used xsd.exe to generate a Suite class and then modified it slightly by changing some names and adding a Display() method.

The screen shot in Figure 4 shows how I generated the file testCases.cs, which contains a Suite definition that you can use directly or modify as I did. Listings Seven and Eight show the classes generated by XSD and my modified classes in the SerializerLib library.

Figure 4 Generating testCases.cs definitions using XSD


Listing Seven: XSD-generated suite definition


// This source code was auto-generated by xsd, Version=1.0.3705.288.
// 
using System.Xml.Serialization;

[System.Xml.Serialization.XmlRootAttribute("suite", Namespace="", IsNullable=false)]
public class suite {
    [System.Xml.Serialization.XmlElementAttribute("testcase")]
    public suiteTestcase[] Items;
}

public class suiteTestcase {
    public string expected;
    [System.Xml.Serialization.XmlElementAttribute("inputs")]
    public suiteTestcaseInputs[] inputs;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string id;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string kind;
}

public class suiteTestcaseInputs {
    public string arg1;
    public string arg2;
}

Listing Eight: Modified suite definition

using System;
using System.Xml.Serialization;

namespace SerializerLib
{
  [XmlRootAttribute("suite")]
  public class Suite 
  {
    [XmlElementAttribute("testcase")]
    public TestCase[] items; // changed name from xsd-generated code
    public void Display() // added to xsd-generated code
    {
      foreach (TestCase tc in items)
      {
        Console.Write(tc.id + " " + tc.kind + " "  + tc.inputs.arg1 + " ");
        Console.WriteLine(tc.inputs.arg2 + " " + tc.expected);
      }
    }
  }

  public class TestCase  // changed name from xsd-generated code
  {
    [XmlAttributeAttribute()]
    public string id;
    [XmlAttributeAttribute()]
    public string kind;
    [XmlElementAttribute("inputs")]
    public Inputs inputs; // change from xsd-generated code: no array
    public string expected;
  }

  public class Inputs // changed name from xsd-generated code
  {
    public string arg1;
    public string arg2;
  }
}

Using the XmlSerializer class gives a very elegant solution to the problem of parsing an XML file. Compared with the other four techniques in this article, XmlSerializer operates at the highest level of abstraction, meaning that the algorithmic details are largely hidden from you. But this gives you less control over the parsing and lends an air of magic to the process.

Most of the code I write is test automation, and XmlSerializer is my default technique for parsing XML. XmlSerializer is most appropriate for situations not covered by the other four techniques in this article: when fine-grained control is not required, the application does not use other XmlDocument objects, the XML file is not deeply nested, and the application is not primarily an ADO.NET application (as we will see in our next example).

