Parsing XML Files in .NET Using C#

By James McCaffrey, July 01, 2003

The .NET Framework provides several ways to extract data from an XML file into memory. We'll demonstrate the best uses of five fundamentally different techniques.

Parsing XML with XmlDocument

The second of five ways to parse an XML file is to use the XmlDocument class. The example code is shown in Listing Four.

Listing Four: Parsing XML using XmlDocument


using System;
using System.Xml;
using CommonLib;

namespace Run
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      CommonLib.Suite s = new CommonLib.Suite();

      XmlDocument xd = new XmlDocument();
      xd.Load("..\\..\\..\\..\\testCases.xml");
      
      XmlNodeList nodelist = xd.SelectNodes("/suite/testcase"); // get all <testcase> nodes

      foreach (XmlNode node in nodelist) // for each <testcase> node
      {
        CommonLib.TestCase tc = new CommonLib.TestCase();
        
        tc.id = node.Attributes.GetNamedItem("id").Value;
        tc.kind = node.Attributes.GetNamedItem("kind").Value;

        XmlNode n = node.SelectSingleNode("inputs"); // get the one <inputs> node
        tc.arg1 = n.ChildNodes.Item(0).InnerText;
        tc.arg2 = n.ChildNodes.Item(1).InnerText;

        tc.expected = node.ChildNodes.Item(1).InnerText;

        s.items.Add(tc);
      } // foreach <testcase> node
      
      s.Display();

    } // Main()
  } // class Class1

} // ns Run

XmlDocument objects are based on the notion of XML nodes and child nodes. Instead of sequentially navigating through a file, we select sets of nodes with the SelectNodes() method or individual nodes with the SelectSingleNode() method. Notice that because XML attributes are not child nodes of their element, we must get their data with the Attributes.GetNamedItem() method applied to a node.

After loading the XmlDocument, we fetch all the test case nodes at once with:

XmlNodeList nodelist = xd.SelectNodes("/suite/testcase");

Then we iterate through this list of nodes and fetch each <inputs> node with:

XmlNode n = node.SelectSingleNode("inputs");

and then extract the arg1 (and similarly arg2) value using:

tc.arg1 = n.ChildNodes.Item(0).InnerText;

In this statement, n is the <inputs> node; ChildNodes.Item(0) is the first child element of <inputs>, i.e., <arg1>; and InnerText is the text between <arg1> and </arg1>.
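Positional access such as ChildNodes.Item(0) depends on the order of the child elements. As a minimal sketch of an alternative, the same values can be fetched by element name with relative XPath expressions. The class name XmlDocumentByName and the sample XML are hypothetical; the XML shape is an assumption inferred from the listing, not the article's actual testCases.xml.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration; not part of the article's CommonLib.
public static class XmlDocumentByName
{
    // Made-up sample data in the shape the listing appears to expect.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='0001' kind='deterministic'>" +
        "<inputs><arg1>red</arg1><arg2>blue</arg2></inputs>" +
        "<expected>purple</expected>" +
        "</testcase>" +
        "</suite>";

    // Returns "id kind arg1 arg2 expected" for the first <testcase> element.
    public static string DescribeFirstTestCase(string xml)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml); // LoadXml parses a string; Load reads a file or stream

        XmlNode node = xd.SelectSingleNode("/suite/testcase");

        // Attributes are reached through the Attributes collection.
        string id = node.Attributes.GetNamedItem("id").Value;
        string kind = node.Attributes.GetNamedItem("kind").Value;

        // Relative XPath expressions select children by name, not by position.
        string arg1 = node.SelectSingleNode("inputs/arg1").InnerText;
        string arg2 = node.SelectSingleNode("inputs/arg2").InnerText;
        string expected = node.SelectSingleNode("expected").InnerText;

        return id + " " + kind + " " + arg1 + " " + arg2 + " " + expected;
    }
}
```

Calling XmlDocumentByName.DescribeFirstTestCase(XmlDocumentByName.SampleXml) returns "0001 deterministic red blue purple". Note that SelectSingleNode() returns null when nothing matches, so production code would check for that before reading InnerText.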

The output from running this program is shown in Figure 3. Notice it is identical to the output from running the XmlTextReader technique and, in fact, all the other techniques presented in this article.

Figure 3 Output from the XmlDocument technique

The XmlDocument class is modeled on the W3C XML Document Object Model and has a different feel to it than many .NET Framework classes that you are familiar with. Using the XmlDocument class is appropriate if you need to extract data in a nonsequential manner, or if you are already using XmlDocument objects and want to maintain a consistent look and feel to your application's code.
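To illustrate what nonsequential extraction can look like, here is a small sketch that jumps straight to one test case by its id attribute using an XPath predicate. The class and method names and the sample data are hypothetical, and the XML shape is assumed from Listing Four.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration; not part of the article's code.
public static class XmlDocumentLookup
{
    // Made-up sample data with two test cases.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='0001' kind='deterministic'>" +
        "<inputs><arg1>red</arg1><arg2>blue</arg2></inputs>" +
        "<expected>purple</expected>" +
        "</testcase>" +
        "<testcase id='0002' kind='deterministic'>" +
        "<inputs><arg1>yellow</arg1><arg2>blue</arg2></inputs>" +
        "<expected>green</expected>" +
        "</testcase>" +
        "</suite>";

    // Returns the <expected> text of the test case with the given id,
    // or null if no such test case exists.
    public static string FindExpected(string xml, string id)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);

        // The predicate [@id='...'] filters on the attribute value.
        // This simple concatenation assumes id contains no quote characters.
        XmlNode node = xd.SelectSingleNode(
            "/suite/testcase[@id='" + id + "']/expected");

        return node == null ? null : node.InnerText;
    }
}
```

With the sample data, XmlDocumentLookup.FindExpected(XmlDocumentLookup.SampleXml, "0002") returns "green" without visiting the first test case in application code.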

Let me note that in discussions with my colleagues, there was often some confusion about the role of the XmlDataDocument class. It is derived from the XmlDocument class and is intended for use in conjunction with DataSet objects. So, in this example, you could use the XmlDataDocument class but would not gain anything.

Parsing XML with XPathDocument

The third technique to parse an XML file is to use the XPathDocument class. The example code is shown in Listing Five.

Listing Five: Parsing XML using XPathDocument


using System;
using System.Xml.XPath;
using CommonLib;

namespace Run
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      CommonLib.Suite s = new CommonLib.Suite();

      XPathDocument xpd = new XPathDocument("..\\..\\..\\..\\testCases.xml");
      XPathNavigator xpn = xpd.CreateNavigator();
      XPathNodeIterator xpi = xpn.Select("/suite/testcase");
      
      while (xpi.MoveNext()) // each testcase node
      {
        CommonLib.TestCase tc = new CommonLib.TestCase();
        tc.id = xpi.Current.GetAttribute("id", xpn.NamespaceURI);
        tc.kind = xpi.Current.GetAttribute("kind", xpn.NamespaceURI);

        XPathNodeIterator tcChild = xpi.Current.SelectChildren(XPathNodeType.Element);
        while (tcChild.MoveNext()) // each part (<inputs> and <expected>) of <testcase>
        {
          if (tcChild.Current.Name == "inputs")
          {
            XPathNodeIterator tcSubChild = tcChild.Current.SelectChildren(XPathNodeType.Element);
            while (tcSubChild.MoveNext()) // each part (<arg1>, <arg2>) of <inputs>
            {
              if (tcSubChild.Current.Name == "arg1")
                tc.arg1 = tcSubChild.Current.Value;
              else if (tcSubChild.Current.Name  == "arg2")
                tc.arg2 = tcSubChild.Current.Value;
            }
          }
          else if (tcChild.Current.Name == "expected")
            tc.expected = tcChild.Current.Value;
        }
        s.items.Add(tc);

      } // each testcase node
      
      s.Display();
      
    } // Main()
  } // class Class1

} // ns Run

Using an XPathDocument object to parse XML has a hybrid feel that is part procedural (as in XmlTextReader) and part functional (as in XmlDocument). You can select parts of the document using the Select() method of an XPathNavigator object and also move through the document using the MoveNext() method of an XPathNodeIterator object.

After loading the XPathDocument object, we get what is in essence a cursor over the <testcase> nodes, positioned just before the first one, into an XPathNodeIterator object with:

XPathNavigator xpn = xpd.CreateNavigator();
XPathNodeIterator xpi = xpn.Select("/suite/testcase");

Because XPathDocument does not maintain "node identity," we must iterate through each <testcase> node with this loop:

while (xpi.MoveNext())

Similarly, we have to iterate through the children with:

while (tcChild.MoveNext())

The XPathDocument class is optimized for XPath data model queries. So using it is particularly appropriate when the XML file to parse is deeply nested or has a complex structure. You might also consider using XPathDocument if other parts of your application code use that class so that you maintain a consistent coding look and feel.
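As a rough sketch of the query-oriented style, the navigator can also evaluate XPath expressions directly rather than walking children by hand. The example below uses XPathNavigator.Evaluate() for a count and Select() with a predicate for a single value. The class name and the sample data are hypothetical, and the XML shape is assumed from Listing Five.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper for illustration; not part of the article's code.
public static class XPathQuerySketch
{
    // Made-up sample data with two test cases.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='0001' kind='deterministic'>" +
        "<inputs><arg1>red</arg1><arg2>blue</arg2></inputs>" +
        "<expected>purple</expected>" +
        "</testcase>" +
        "<testcase id='0002' kind='deterministic'>" +
        "<inputs><arg1>yellow</arg1><arg2>blue</arg2></inputs>" +
        "<expected>green</expected>" +
        "</testcase>" +
        "</suite>";

    // Counts <testcase> elements with the XPath count() function.
    public static double CountTestCases(string xml)
    {
        XPathDocument xpd = new XPathDocument(new StringReader(xml));
        XPathNavigator xpn = xpd.CreateNavigator();
        // Evaluate returns a boxed double for numeric XPath results.
        return (double)xpn.Evaluate("count(/suite/testcase)");
    }

    // Returns the <arg1> text for the given test case id, or null if absent.
    public static string Arg1Of(string xml, string id)
    {
        XPathDocument xpd = new XPathDocument(new StringReader(xml));
        XPathNavigator xpn = xpd.CreateNavigator();
        // This simple concatenation assumes id contains no quote characters.
        XPathNodeIterator it = xpn.Select(
            "/suite/testcase[@id='" + id + "']/inputs/arg1");
        // The iterator starts before the first match, so MoveNext comes first.
        return it.MoveNext() ? it.Current.Value : null;
    }
}
```

With the sample data, CountTestCases() returns 2 and Arg1Of(SampleXml, "0002") returns "yellow". Pushing the structure into the XPath expression in this way replaces the nested while loops of Listing Five, at the cost of less explicit control over each step.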

Parsing XML with XmlSerializer

The fourth technique we will use to parse an XML file is the XmlSerializer object. The example code is shown in Listing Six.

Listing Six: Parsing XML using XmlSerializer

using System;
using System.Xml.Serialization;
using System.IO;
using SerializerLib; // defines a Suite class compatible with testCases.xml

namespace Run
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      XmlSerializer xs = new XmlSerializer(typeof(Suite));
      StreamReader sr = new StreamReader("..\\..\\..\\..\\testCases.xml");
      SerializerLib.Suite s = (SerializerLib.Suite)xs.Deserialize(sr);
      sr.Close();
      s.Display();
    } 
  } // class Class1
} // ns Run

Using the XmlSerializer class is significantly different from using any of the other classes because the in-memory data store is different from the CommonLib.Suite we used for all other examples. In fact, observe that pulling the XML data into memory is accomplished in a single statement:

SerializerLib.Suite s = (SerializerLib.Suite)xs.Deserialize(sr);

I created a class library named "SerializerLib" to hold the definition for a Suite class that corresponds to the testCases.xml file so that the XmlSerializer object can store the XML data into it. The trick, of course, is to set up this Suite class.

Creating the Suite class is done with the help of the xsd.exe command-line tool. You will find it in your Program Files\Microsoft Visual Studio .NET\FrameworkSDK\bin folder. I used xsd.exe to generate a Suite class and then modified it slightly by changing some names and adding a Display() method.

The screen shot in Figure 4 shows how I generated the file testCases.cs, which contains a Suite definition that you can use directly or modify as I did. Listings Seven and Eight show the classes generated by XSD and my modified classes in the SerializerLib library.

Figure 4 Generating testCases.cs definitions using XSD

Listing Seven: XSD-generated suite definition


// This source code was auto-generated by xsd, Version=1.0.3705.288.
// 
using System.Xml.Serialization;

[System.Xml.Serialization.XmlRootAttribute("suite", Namespace="", IsNullable=false)]
public class suite {
    [System.Xml.Serialization.XmlElementAttribute("testcase")]
    public suiteTestcase[] Items;
}

public class suiteTestcase {
    public string expected;
    [System.Xml.Serialization.XmlElementAttribute("inputs")]
    public suiteTestcaseInputs[] inputs;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string id;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string kind;
}

public class suiteTestcaseInputs {
    public string arg1;
    public string arg2;
}

Listing Eight: Modified suite definition

using System;
using System.Xml.Serialization;

namespace SerializerLib
{
  [XmlRootAttribute("suite")]
  public class Suite 
  {
    [XmlElementAttribute("testcase")]
    public TestCase[] items; // changed name from xsd-generated code
    public void Display() // added to xsd-generated code
    {
      foreach (TestCase tc in items)
      {
        Console.Write(tc.id + " " + tc.kind + " "  + tc.inputs.arg1 + " ");
        Console.WriteLine(tc.inputs.arg2 + " " + tc.expected);
      }
    }
  }

  public class TestCase  // changed name from xsd-generated code
  {
    [XmlAttributeAttribute()]
    public string id;
    [XmlAttributeAttribute()]
    public string kind;
    [XmlElementAttribute("inputs")]
    public Inputs inputs; // change from xsd-generated code: no array
    public string expected;
  }

  public class Inputs // changed name from xsd-generated code
  {
    public string arg1;
    public string arg2;
  }
}

Using the XmlSerializer class gives a very elegant solution to the problem of parsing an XML file. Compared with the other four techniques in this article, XmlSerializer operates at the highest level of abstraction, meaning that the algorithmic details are largely hidden from you. But this gives you less control over the parsing and lends an air of magic to the process.
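To make the mapping less magical, the following self-contained sketch deserializes an in-memory XML string into classes modeled on Listing Eight. The Sketch-prefixed class names and the sample data are hypothetical, and reading from a StringReader instead of a file is a substitution made only so that the example stands alone.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes modeled on Listing Eight; the names are illustrative.
[XmlRoot("suite")]
public class SketchSuite
{
    [XmlElement("testcase")] // each <testcase> element becomes an array entry
    public SketchTestCase[] items;
}

public class SketchTestCase
{
    [XmlAttribute] public string id;   // mapped from the id="..." attribute
    [XmlAttribute] public string kind; // mapped from the kind="..." attribute
    public SketchInputs inputs;        // mapped from the <inputs> element
    public string expected;            // mapped from the <expected> element
}

public class SketchInputs
{
    public string arg1;
    public string arg2;
}

public static class SerializerSketch
{
    // Made-up sample data in the assumed shape of testCases.xml.
    public const string SampleXml =
        "<suite>" +
        "<testcase id='0001' kind='deterministic'>" +
        "<inputs><arg1>red</arg1><arg2>blue</arg2></inputs>" +
        "<expected>purple</expected>" +
        "</testcase>" +
        "</suite>";

    // Deserializes the XML text into a SketchSuite object graph.
    public static SketchSuite Load(string xml)
    {
        XmlSerializer xs = new XmlSerializer(typeof(SketchSuite));
        using (StringReader sr = new StringReader(xml))
        {
            return (SketchSuite)xs.Deserialize(sr);
        }
    }
}
```

After SketchSuite s = SerializerSketch.Load(SerializerSketch.SampleXml), the value s.items[0].inputs.arg1 is "red" and s.items[0].expected is "purple". By default, public field names are matched to element names, and the attributes in square brackets override or refine that mapping.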

Most of the code I write is test automation, and using XmlSerializer is my default technique for parsing XML. XmlSerializer is most appropriate for situations not covered by the other four techniques in this article: fine-grained control is not required, the application program does not use other XmlDocument objects, the XML file is not deeply nested, and the application is not primarily an ADO.NET application (as we will see in our next example).
