Parsing XML files is an unglamorous task that can be time-consuming and tricky. In the days before .NET, programmers had to read an XML file as text, line by line, and pick it apart with string functions and possibly regular expressions. That process is slow, error-prone, and not much fun.
While I was writing .NET test automation that had test case data stored in XML files, I discovered that the .NET Framework provides powerful new ways of parsing XML. But in conversations with colleagues, I also discovered that there are a variety of opinions on which way of parsing XML files is the best.
I set out to determine how many different ways there are to parse XML using .NET and to understand the pros and cons of each technique. After some experimentation, I learned that there are five fundamentally different ways to parse XML, and that the "best" method depends both on the particular development situation you are in and on the style of programming you prefer.
In the sections that follow, I will demonstrate how to parse a testCases.xml file using five different techniques. Each technique is based on a different .NET Framework class and its associated methods:
- XmlTextReader
- XmlDocument
- XPathDocument
- XmlSerializer
- DataSet
I will explain each technique in enough detail that you can modify my examples to suit your needs, and then give you guidance on which technique to use in which situation. Knowing these five methods for parsing XML files will be a valuable addition to your .NET skill set. I'm assuming that you're familiar with C#, Visual Studio .NET, and the creation and use of class libraries, and that you have a working knowledge of XML files.
The XML File to Parse and the Goal
Let's examine the testCases.xml file that we will use for all five parsing examples. The file contents are shown in Listing One.
Listing One: XML file to parse
    <?xml version="1.0" encoding="utf-8" ?>
    <suite>
      <testcase id="001" kind="bvt">
        <inputs>
          <arg1>4</arg1>
          <arg2>7</arg2>
        </inputs>
        <expected>11.00</expected>
      </testcase>
      <testcase id="002" kind="drt">
        <inputs>
          <arg1>9</arg1>
          <arg2>6</arg2>
        </inputs>
        <expected>15.00</expected>
      </testcase>
      <testcase id="003" kind="bvt">
        <inputs>
          <arg1>5</arg1>
          <arg2>8</arg2>
        </inputs>
        <expected>13.00</expected>
      </testcase>
    </suite>
Note that each of the three test cases has five data items: id, kind, arg1, arg2, and expected. Some of the data is stored as XML attributes (id and kind), while arg1 and arg2 are stored as XML elements nested inside an inputs element, three levels below the root node (suite). Extracting attribute data and dealing with nested elements are key tasks regardless of which parsing strategy we use.
The goal is to parse our XML test cases file and extract the data into memory in a form that we can use easily. The memory structure we will use for four of the five parsing methods is shown in Listing Two. (The method that employs an XmlSerializer object requires a slightly different memory structure and will be presented later.)
Listing Two: CommonLib.dll definitions
    using System;
    using System.Collections;

    namespace CommonLib
    {
        public class TestCase
        {
            public string id;
            public string kind;
            public string arg1;
            public string arg2;
            public string expected;
        }

        public class Suite
        {
            public ArrayList items = new ArrayList();

            public void Display()
            {
                foreach (TestCase tc in items)
                {
                    Console.Write(tc.id + " " + tc.kind + " " + tc.arg1 + " ");
                    Console.WriteLine(tc.arg2 + " " + tc.expected);
                }
            }
        } // class Suite
    } // ns
Because four of the five techniques will use these definitions, for convenience we can put the code in a .NET class library named "CommonLib." A TestCase object will hold the five data parts of each test case, and a Suite object will hold a collection of TestCase objects and provide a way to display it.
Once the XML data is parsed and stored, the result can be represented as shown in Figure 1. The data can now be easily accessed and manipulated.
Figure 1 XML data stored in memory
Parsing XML with XmlTextReader
Of the five ways to parse an XML file, the most traditional technique is to use the XmlTextReader class. The example code is shown in Listing Three.
Listing Three: Parsing XML using XmlTextReader
    using System;
    using System.Xml;
    using CommonLib;

    namespace Run
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                CommonLib.Suite s = new CommonLib.Suite();
                XmlTextReader xtr = new XmlTextReader("..\\..\\..\\..\\testCases.xml");
                xtr.WhitespaceHandling = WhitespaceHandling.None;
                xtr.Read(); // move to the first node, the XML declaration

                while (!xtr.EOF) // load loop
                {
                    if (xtr.Name == "suite" && !xtr.IsStartElement()) break;

                    while (xtr.Name != "testcase" || !xtr.IsStartElement())
                        xtr.Read(); // advance to <testcase> tag

                    CommonLib.TestCase tc = new CommonLib.TestCase();
                    tc.id = xtr.GetAttribute("id");
                    tc.kind = xtr.GetAttribute("kind");
                    xtr.Read(); // advance to <inputs> tag
                    xtr.Read(); // advance to <arg1> tag
                    tc.arg1 = xtr.ReadElementString("arg1"); // consumes the </arg1> tag
                    tc.arg2 = xtr.ReadElementString("arg2"); // consumes the </arg2> tag
                    xtr.Read(); // advance to <expected> tag
                    tc.expected = xtr.ReadElementString("expected"); // consumes the </expected> tag
                    // we are now at a </testcase> tag
                    s.items.Add(tc);
                    xtr.Read(); // and now either at <testcase> tag or </suite> tag
                } // load loop

                xtr.Close();
                s.Display(); // show the suite of TestCases
            } // Main()
        } // class Class1
    } // ns Run
After creating a new C# Console Application Project in Visual Studio .NET, we add a Project Reference to the CommonLib.dll file that contains the definitions of the TestCase and Suite classes. We start by creating a Suite object to hold the XML data and an XmlTextReader object to parse the XML file.
The key to understanding this technique is to understand the Read() and ReadElementString() methods of XmlTextReader. To an XmlTextReader object, an XML file is a sequence of nodes. For example,

    <?xml version="1.0" ?>
    <foo>
      <bar>99</bar>
    </foo>

has 6 nodes: the XML declaration, <foo>, <bar>, 99, </bar>, and </foo>.
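To see that node sequence for yourself, a small sketch along the following lines can help. It is not part of the article's original listings: the NodeWalk class and its ListNodes method are names I made up for illustration, and the XML is fed from a string through a StringReader instead of a file. Whitespace nodes are suppressed so that only the six nodes described above are reported.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;

// Hypothetical helper (not from the article): lists the nodes a reader visits.
public static class NodeWalk
{
    // Returns one "NodeType label" string per node, in document order.
    public static ArrayList ListNodes(string xml)
    {
        ArrayList nodes = new ArrayList();
        XmlTextReader xtr = new XmlTextReader(new StringReader(xml));
        try
        {
            xtr.WhitespaceHandling = WhitespaceHandling.None; // skip whitespace-only nodes
            while (xtr.Read()) // Read() returns false once there are no more nodes
            {
                // Text nodes have an empty Name, so show their Value instead.
                string label = (xtr.NodeType == XmlNodeType.Text) ? xtr.Value : xtr.Name;
                nodes.Add(xtr.NodeType + " " + label);
            }
        }
        finally
        {
            xtr.Close();
        }
        return nodes;
    }

    public static void Main()
    {
        string xml = "<?xml version=\"1.0\" ?> <foo> <bar>99</bar> </foo>";
        foreach (string node in ListNodes(xml))
            Console.WriteLine(node);
    }
}
```

Run against the foo/bar example, the sketch should print six lines, starting with the XmlDeclaration node and ending with the EndElement node for foo.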
The Read() method advances one node at a time. Unlike many Read() methods in other classes, System.Xml.XmlTextReader.Read() does not return the data it reads; it returns only a bool that indicates whether another node was read. The ReadElementString() method, on the other hand, returns the text between the begin and end tags of the element named by its argument, and advances to the next node after the end tag. Because Read() never stops on an attribute, we have to extract attribute data from the current element using the GetAttribute() method.
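The three methods can be tried in isolation on a cut-down test case. The following is only a sketch under a few assumptions: the ReadMethodsDemo class and its Parse method are hypothetical names, the XML fragment is a shortened version of one testcase element from Listing One, and the input comes from a string, not from testCases.xml.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class (not from the article).
public static class ReadMethodsDemo
{
    // Returns { result of first Read(), id, kind, expected, final reader position }.
    public static string[] Parse(string xml)
    {
        XmlTextReader xtr = new XmlTextReader(new StringReader(xml));
        try
        {
            xtr.WhitespaceHandling = WhitespaceHandling.None;

            bool moved = xtr.Read();                 // now on <testcase>; Read() only reports success
            string id = xtr.GetAttribute("id");      // attributes come from the current element
            string kind = xtr.GetAttribute("kind");

            xtr.Read();                              // now on <expected>
            string expected = xtr.ReadElementString("expected"); // returns the text, moves past </expected>

            // Record where the reader ended up after ReadElementString().
            string position = xtr.NodeType + " " + xtr.Name;
            return new string[] { moved.ToString(), id, kind, expected, position };
        }
        finally
        {
            xtr.Close();
        }
    }

    public static void Main()
    {
        string xml = "<testcase id=\"001\" kind=\"bvt\"><expected>11.00</expected></testcase>";
        foreach (string part in Parse(xml))
            Console.WriteLine(part);
    }
}
```

The last value returned describes the node the reader sits on after ReadElementString() has run, which in this fragment is the closing testcase tag. Keeping track of that position is exactly the bookkeeping that Listing Three does by hand.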
Figure 2 shows the output of running this program. You can see that we have successfully parsed the data from testCases.xml into memory.
Figure 2 Output from the XmlTextReader technique
The statement xtr.WhitespaceHandling = WhitespaceHandling.None; is important because without it you would have to Read() over newline characters and blank lines.
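One way to see the effect of this setting is to count the nodes the reader reports with and without it. The sketch below makes some assumptions: WhitespaceDemo and CountNodes are illustrative names of my own, and the sample XML is a made-up fragment that borrows the suite and testcase element names and includes line breaks, indentation, and a blank line.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class (not from the article).
public static class WhitespaceDemo
{
    // Counts every node Read() stops on under the given whitespace setting.
    public static int CountNodes(string xml, WhitespaceHandling handling)
    {
        XmlTextReader xtr = new XmlTextReader(new StringReader(xml));
        try
        {
            xtr.WhitespaceHandling = handling;
            int count = 0;
            while (xtr.Read())
                count++;
            return count;
        }
        finally
        {
            xtr.Close();
        }
    }

    public static void Main()
    {
        // Line breaks, indentation, and one blank line between the elements.
        string xml = "<suite>\n  <testcase id=\"001\" />\n\n  <testcase id=\"002\" />\n</suite>";
        Console.WriteLine("All:  " + CountNodes(xml, WhitespaceHandling.All));
        Console.WriteLine("None: " + CountNodes(xml, WhitespaceHandling.None));
    }
}
```

With WhitespaceHandling.All, which is the default, each run of whitespace between tags shows up as a node of its own, so the first count should be larger than the second. Those extra nodes are what the parsing loop would otherwise have to Read() past.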
The main loop control structure that I used is not elegant but is more readable than the alternatives:

    while (!xtr.EOF) // load loop
    {
        if (xtr.Name == "suite" && !xtr.IsStartElement()) break;

It exits when we are at EOF or at a </suite> tag.
When marching through the XML file, you can either Read() your way one node at a time or get a bit more sophisticated with code like the following:

    while (xtr.Name != "testcase" || !xtr.IsStartElement())
        xtr.Read(); // advance to <testcase> tag

The choice of technique you use is purely a matter of style.
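If you prefer the skip-ahead style, the loop can be wrapped in a small helper that also stops when the reader runs out of nodes, so that a missing tag cannot cause an endless loop. This is a sketch and not code from the article's listings: ReaderHelpers and AdvanceTo are hypothetical names, and the sample XML (including the note element) is invented for the demonstration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class (not from the article).
public static class ReaderHelpers
{
    // Moves the reader forward until it sits on a start tag with the given name.
    // Returns false if the end of the input is reached first.
    public static bool AdvanceTo(XmlTextReader xtr, string name)
    {
        while (!(xtr.NodeType == XmlNodeType.Element && xtr.Name == name))
        {
            if (!xtr.Read())
                return false; // no more nodes: the tag was not found
        }
        return true;
    }

    public static void Main()
    {
        string xml = "<suite><note/><testcase id=\"001\"/></suite>";
        XmlTextReader xtr = new XmlTextReader(new StringReader(xml));
        try
        {
            if (AdvanceTo(xtr, "testcase"))
                Console.WriteLine(xtr.GetAttribute("id"));
            Console.WriteLine(AdvanceTo(xtr, "missing"));
        }
        finally
        {
            xtr.Close();
        }
    }
}
```

One design point to keep in mind: if the reader is already on a matching start tag, this helper returns immediately without moving, so the caller has to Read() past the current element before searching for the next one with the same name.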
Parsing an XML file with XmlTextReader has a traditional, pre-.NET feel. You walk sequentially through the file using Read(), and extract data with ReadElementString() and GetAttribute(). Using XmlTextReader is straightforward and effective and is appropriate when the structure of your XML file is relatively simple and consistent. Compared to other techniques we will see in this article, XmlTextReader operates at a lower level of abstraction, meaning it is up to you as a programmer to keep track of where you are in the XML file and Read() correctly.