.NET

Parsing XML Files in .NET Using C#

By James McCaffrey, July 01, 2003

The .NET Framework provides several ways to extract data from an XML file into memory. We'll demonstrate the best uses of five fundamentally different techniques.

Download the code for this issue

Parsing XML files is an unglamorous task that can be time consuming and tricky. In the days before .NET, programmers were forced to read XML as a text file line by line and then use string functions and possibly regular expressions. This is a time-consuming and error-prone process, and just not very much fun.

While I was writing .NET test automation that had test case data stored in XML files, I discovered that the .NET Framework provides powerful new ways of parsing XML. But in conversations with colleagues, I also discovered that there are a variety of opinions on which way of parsing XML files is the best.

I set out to determine how many different ways there are to parse XML using .NET and to understand the pros and cons of each technique. After some experimentation, I learned that there are five fundamentally different ways to parse XML, and that the "best" method depends both on the particular development situation you are in and on the style of programming you prefer.

In the sections that follow, I will demonstrate how to parse a testCases.xml file using five different techniques. Each technique is based on a different .NET Framework class and its associated methods:

XmlTextReader
XmlDocument
XPathDocument
XmlSerializer
DataSet

After I explain each technique so you can modify my examples to suit your needs, I will give you guidance on which technique should be used in which situation. Knowing these five methods for parsing XML files will be a valuable addition to your .NET skill set. I'm assuming that you're familiar with C#, VS.NET, the creation and use of class libraries, and have a working knowledge of XML files.

The XML File to Parse and the Goal

Let's examine the testCases.xml file that we will use for all five parsing examples. The file contents are shown in Listing One.

Listing One: XML file to parse


<?xml version="1.0" encoding="utf-8" ?> 
<suite>

  <testcase id="001" kind="bvt">
    <inputs>
      <arg1>4</arg1>
      <arg2>7</arg2>
    </inputs>
    <expected>11.00</expected>
  </testcase>

  <testcase id="002" kind="drt">
    <inputs>
      <arg1>9</arg1>
      <arg2>6</arg2>
    </inputs>
    <expected>15.00</expected>
  </testcase>

  <testcase id="003" kind="bvt">
    <inputs>
      <arg1>5</arg1>
      <arg2>8</arg2>
    </inputs>
    <expected>13.00</expected>
  </testcase>
<
/suite>

Note that each of the three test cases has five data items: id, kind, arg1, arg2, and expected. Some of the data is stored as XML attributes (id and kind), and arg1 and arg2 are stored as XML elements two levels deep relative to the root node (suite). Extracting attribute data and dealing with nested elements are key tasks regardless of which parsing strategy we use.

The goal is to parse our XML test cases file and extract the data into memory in a form that we can use easily. The memory structure we will use for four of the five parsing methods is shown in Listing Two. (The method that employs an XmlSerializer object requires a slightly different memory structure and will be presented later.)

Listing Two: CommonLib.dll definitions


using System;
using System.Collections;

namespace CommonLib
{
  public class TestCase
  {
    public string id;
    public string kind;
    public string arg1;
    public string arg2;
    public string expected;
  }  

  public class Suite
  {
    public ArrayList items = new ArrayList();
    public void Display()
    {
      foreach (TestCase tc in items)
      {
        Console.Write(tc.id + " " + tc.kind + " " + tc.arg1 + " ");
        Console.WriteLine(tc.arg2 + " " + tc.expected);
      }
    }
  } // class Suite
} // ns

Because four of the five techniques will use these definitions, for convenience we can put the code in a .NET class library named "CommonLib." A TestCase object will hold the five data parts of each test case, and a Suite object will hold a collection of TestCase objects and provide a way to display it.

Once the XML data is parsed and stored, the result can be represented as shown in >Figure 1. The data can now be easily accessed and manipulated.

Figure 1 XML data stored in memory

Parsing XML with XmlTextReader

Of the five ways to parse an XML file, the most traditional technique is to use the XmlTextReader class. The example code is shown in Listing Three.

Listing Three: Parsing XML using XmlTextReader


using System;
using System.Xml;
using CommonLib;

namespace Run
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      CommonLib.Suite s = new CommonLib.Suite();
            
      XmlTextReader xtr = new XmlTextReader("..\\..\\..\\..\\testCases.xml");
      xtr.WhitespaceHandling = WhitespaceHandling.None;
      xtr.Read(); // read the XML declaration node, advance to <suite> tag

      while (!xtr.EOF) //load loop
      {
        if (xtr.Name == "suite" && !xtr.IsStartElement()) break;

        while (xtr.Name != "testcase" || !xtr.IsStartElement() ) 
          xtr.Read(); // advance to <testcase> tag

        CommonLib.TestCase tc = new CommonLib.TestCase();
        tc.id = xtr.GetAttribute("id");
        tc.kind = xtr.GetAttribute("kind");
        xtr.Read(); // advance to <inputs> tag
        xtr.Read(); // advance to <arg1> tag
        tc.arg1 = xtr.ReadElementString("arg1"); // consumes the </arg1> tag
        tc.arg2 = xtr.ReadElementString("arg2"); // consumes the </arg2> tag
        xtr.Read(); // advance to <expected> tag
        tc.expected = xtr.ReadElementString("expected"); // consumes the </expected> tag
        // we are now at an </testcase> tag
        s.items.Add(tc);
        xtr.Read(); // and now either at <testcase> tag or </suite> tag
      } // load loop

      xtr.Close();
      s.Display(); // show the suite of TestCases

    } // Main()
  } // class Class1
 } // ns Run

After creating a new C# Console Application Project in Visual Studio .NET, we add a Project Reference to the CommonLib.dll file that contains definitions for TestCase and Suite classes. We start by creating a Suite object to hold the XML data and an XmlTextReader object to parse the XML file.

The key to understanding this technique is to understand the Read() and ReadElementString() methods of XmlTextReader. To an XmlTextReader object, an XML file is a sequence of nodes. For example,

<?xml version="1.0" ?>
<foo>
  <bar>99</bar>
</foo>

has 6 nodes: the XML declaration, <foo>, <bar>, 99, </bar>, and </foo>.

The Read() method advances one node at a time. Unlike many Read() methods in other classes, the System.XmlTextReader.Read() does not return significant data. The ReadElementString() method, on the other hand, returns the data between the begin and end tags of its argument, and advances to the next node after the end tag. Because XML attributes are not nodes, we have to extract attribute data using the GetAttribute() method.

Figure 2 shows the output of running this program. You can see that we have successfully parsed the data from testCases.xml into memory.

Figure 2 Output from the XmlTextReader technique

The statement xtr.WhitespaceHandling = WhitespaceHandling.None; is important because without it you would have to Read() over newline characters and blank lines.

The main loop control structure that I used is not elegant but is more readable than the alternatives:

while (!xtr.EOF) //load loop
      {
        if (xtr.Name == "suite" && !xtr.IsStartElement()) break;

It exits when we are at EOF or an </suite> tag.

When marching through the XML file, you can either Read() your way one node at a time or get a bit more sophisticated with code like the following:

while (xtr.Name != "testcase" || !xtr.IsStartElement() ) 
          xtr.Read();          // advance to <testcase> tag

The choice of technique you use is purely a matter of style.

Parsing an XML file with XmlTextReader has a traditional, pre-.NET feel. You walk sequentially through the file using Read(), and extract data with ReadElementString() and GetAttribute(). Using XmlTextReader is straightforward and effective and is appropriate when the structure of your XML file is relatively simple and consistent. Compared to other techniques we will see in this article, XmlTextReader operates at a lower level of abstraction, meaning it is up to you as a programmer to keep track of where you are in the XML file and Read() correctly.

1 2 3 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

.NET

Parsing XML Files in .NET Using C#

The XML File to Parse and the Goal

Listing One: XML file to parse

Listing Two: CommonLib.dll definitions

Figure 1 XML data stored in memory

Parsing XML with XmlTextReader

Listing Three: Parsing XML using XmlTextReader

Figure 2 Output from the XmlTextReader technique

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

.NET Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

.NET

Parsing XML Files in .NET Using C#

The XML File to Parse and the Goal

Listing One: XML file to parse

Listing Two: CommonLib.dll definitions

Figure 1 XML data stored in memory

Parsing XML with XmlTextReader

Listing Three: Parsing XML using XmlTextReader

Figure 2 Output from the XmlTextReader technique

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

.NET Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content