.NET

Parsing XML Files in .NET Using C#

By James McCaffrey, July 01, 2003

The .NET Framework provides several ways to extract data from an XML file into memory. We'll demonstrate the best uses of five fundamentally different techniques.

Parsing XML with DataSet

The fifth and final technique for parsing an XML file into memory uses the DataSet class. The example code is shown in Listing Nine.

Listing Nine: Parsing XML using DataSet

using System;
using System.Xml;
using System.Data;
using CommonLib; // Suite class definition
using InfoLib; // DisplayInfo() method 

namespace Run
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      DataSet ds = new DataSet();
      ds.ReadXml("..\\..\\..\\..\\testCases.xml");
    
      InfoLib.DataSetInfo.DisplayInfo(ds); // show table, column, relation names

      CommonLib.Suite s = new CommonLib.Suite();
      foreach (DataRow row in ds.Tables["testcase"].Rows)
      {
        CommonLib.TestCase tc = new CommonLib.TestCase();
        tc.id = row["id"].ToString();
        tc.kind = row["kind"].ToString();
        tc.expected = row["expected"].ToString();

        DataRow[] children = row.GetChildRows("testcase_inputs"); // relation name

        tc.arg1 = (children[0]["arg1"]).ToString(); // there is only 1 row in children
        tc.arg2 = (children[0]["arg2"]).ToString();
        
        s.items.Add(tc);
      }

      s.Display();
 
    } // Main()

  } // class Class1
} // ns

We start by reading the XML file directly into a System.Data.DataSet object using the ReadXml() method. A DataSet object can be thought of as an in-memory relational database. The XML data ends up in two tables, "testcase" and "inputs," which are linked by a relation named "testcase_inputs." The key to using the DataSet technique is knowing how the XML data gets mapped into the DataSet object's tables.
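To see this inference without the article's testCases.xml file or the CommonLib and InfoLib assemblies, here is a minimal self-contained sketch. It assumes a small inline document shaped like the article's test cases (the values are made up) and reads it through a StringReader instead of from disk. InferenceDemo, SampleXml, and Load() are hypothetical names used only for illustration.

```csharp
using System;
using System.Data;
using System.IO;

// Minimal sketch: let ReadXml() infer tables and relations from an inline
// document. SampleXml is an assumed stand-in for testCases.xml; values are made up.
public static class InferenceDemo
{
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'>" +
        "<inputs><arg1>4</arg1><arg2>7</arg2></inputs>" +
        "<expected>11.00</expected>" +
        "</testcase>" +
        "<testcase id='002' kind='drt'>" +
        "<inputs><arg1>9</arg1><arg2>6</arg2></inputs>" +
        "<expected>15.00</expected>" +
        "</testcase>" +
        "</suite>";

    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        // The DataSet has no schema yet, so ReadXml() infers one from the data.
        ds.ReadXml(new StringReader(xml));
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Load(SampleXml);

        // Lists each inferred table ("testcase" and "inputs") with its row count.
        foreach (DataTable dt in ds.Tables)
            Console.WriteLine("table " + dt.TableName + ": " + dt.Rows.Count + " rows");

        // The nested <inputs> elements produce a parent-child relation.
        foreach (DataRelation rel in ds.Relations)
            Console.WriteLine("relation " + rel.RelationName);
    }
}
```

Passing a file path to ReadXml() instead of the StringReader gives the same call used in Listing Nine.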

Although we could define the DataSet's schema ourselves so that its structure is completely known in advance, it is much quicker to let the ReadXml() method infer the structure and then examine the result. I wrote a helper method, DisplayInfo(), that accepts a DataSet as an argument and displays the table, column, and relation names we need in order to extract the data from the DataSet's tables.

To keep the main parse program uncluttered, I put DisplayInfo() into a class library named "InfoLib." The code is shown in Listing Ten. The output from running the parse program is shown in Figure 5.

Listing Ten: Code to display DataSet information


using System;
using System.Data;

namespace InfoLib
{
  public class DataSetInfo
  {
    public static void DisplayInfo(DataSet ds) // names of tables, columns, relations in ds
    {
      foreach (DataTable dt in ds.Tables)
      {
        Console.WriteLine("\n===============================================");
        Console.WriteLine("Table = " + dt.TableName + "\n");
        foreach (DataColumn dc in dt.Columns)
        {
          Console.Write("{0,-14}", dc.ColumnName);
        }
        Console.WriteLine("\n-----------------------------------------------");

        foreach (DataRow dr in dt.Rows)
        {
          foreach (object data in dr.ItemArray)
          {
            Console.Write("{0,-14}", data.ToString());

          }
          Console.WriteLine();
        }
        Console.WriteLine("===============================================");
      } // foreach DataTable

      Console.WriteLine("\n\nRelations:"); // print the heading once, not once per relation
      foreach (DataRelation rel in ds.Relations)
      {
        Console.WriteLine(rel.RelationName);
      }
      Console.WriteLine();

    } // DisplayInfo()
  } // class DataSetInfo
} // ns InfoLib
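ReadXml() records what it inferred as an XML Schema, so another quick way to examine the result, alongside a helper like DisplayInfo(), is simply to print that schema with GetXmlSchema(). The sketch below assumes a hypothetical one-test-case document; in the printed XSD, the table and column names appear as element and attribute declarations. SchemaPeek, Sample, and InferredSchema() are illustrative names.

```csharp
using System;
using System.Data;
using System.IO;

// Minimal sketch: print the XML Schema that ReadXml() inferred.
// Sample is a hypothetical document shaped like the article's test cases.
public static class SchemaPeek
{
    public const string Sample =
        "<suite>" +
        "<testcase id='001' kind='bvt'>" +
        "<inputs><arg1>4</arg1><arg2>7</arg2></inputs>" +
        "<expected>11.00</expected>" +
        "</testcase>" +
        "</suite>";

    public static string InferredSchema(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        // GetXmlSchema() returns the DataSet's current schema as XSD text.
        return ds.GetXmlSchema();
    }

    public static void Main()
    {
        Console.WriteLine(InferredSchema(Sample));
    }
}
```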

Figure 5 Output from the DataSet technique

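The first table, "testcase," holds the data that is one level deep from the XML root: id, kind, and expected. The second table, "inputs," holds the data that is two levels deep: arg1 and arg2. In general, each level of nesting below the root produces another table (more precisely, ReadXml() infers a table for each distinct element below the root that has attributes, has child elements, or repeats), so an XML file n levels deep yields roughly n tables.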
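To illustrate the one-table-per-level rule, the following sketch nests one level deeper than the article's file. The <arg> elements and the count and name attributes are hypothetical, chosen only so that each level clearly qualifies as a table; with this input we would expect three tables (testcase, inputs, arg) and two relations (testcase_inputs, inputs_arg). NestingDemo and Load() are illustrative names.

```csharp
using System;
using System.Data;
using System.IO;

// Minimal sketch: one more level of nesting gives one more table.
// The <arg> elements and the count/name attributes are hypothetical.
public static class NestingDemo
{
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001'>" +
        "<inputs count='2'>" +
        "<arg name='x'>4</arg>" +
        "<arg name='y'>7</arg>" +
        "</inputs>" +
        "</testcase>" +
        "</suite>";

    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Load(SampleXml);
        // The root <suite> is not a table; testcase, inputs, and arg are.
        foreach (DataTable dt in ds.Tables)
            Console.WriteLine("table " + dt.TableName);
        foreach (DataRelation rel in ds.Relations)
            Console.WriteLine("relation " + rel.RelationName);
    }
}
```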

Extracting the data from the parent "testcase" table is easy: we iterate through each row of the table and access each value by column name. To get the data from the child "inputs" table, we get an array of rows by calling the GetChildRows() method with the relation name:

DataRow[] children = row.GetChildRows("testcase_inputs");  // relation name

Because each <testcase> node has only one <inputs> child node, the children array will only have one row.

The trickiest aspect of this technique is extracting the child data:

tc.arg1 = (children[0]["arg1"]).ToString();  // there is only 1 row in children
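Indexing children[0] assumes every test case has exactly one <inputs> child; if one has none, the index throws an exception. A slightly more defensive variation loops over whatever GetChildRows() returns. This is a sketch, not the article's code: the sample document (where the second test case deliberately lacks <inputs>) and the names ChildRows, Load(), and Describe() are hypothetical; the table, column, and relation names follow the article.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

// Minimal sketch: read child "inputs" rows without assuming exactly one per
// test case. The sample XML and the helper names are hypothetical.
public static class ChildRows
{
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001'><inputs><arg1>4</arg1><arg2>7</arg2></inputs></testcase>" +
        "<testcase id='002'/>" + // no <inputs> child at all
        "</suite>";

    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        return ds;
    }

    // Returns one "id: arg1,arg2" line per child row, or "id: (none)".
    public static List<string> Describe(DataSet ds)
    {
        var lines = new List<string>();
        foreach (DataRow row in ds.Tables["testcase"].Rows)
        {
            DataRow[] children = row.GetChildRows("testcase_inputs");
            if (children.Length == 0)
            {
                lines.Add(row["id"] + ": (none)");
                continue;
            }
            foreach (DataRow child in children)
                lines.Add(row["id"] + ": " + child["arg1"] + "," + child["arg2"]);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe(Load(SampleXml)))
            Console.WriteLine(line);
    }
}
```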

Using the DataSet class to parse an XML file has a very relational-database feel. Compared with the other techniques in this article, it operates at a middle level of abstraction: the ReadXml() method hides a lot of details, but you must still navigate the resulting relational tables.

Using DataSet to parse XML files is particularly appropriate when your application already uses ADO.NET classes, because it keeps a consistent look and feel across your code. A DataSet object carries high overhead, however, so it is not a good choice when performance is an issue. And because each level of an XML file generates another table, DataSet is also a poor choice for deeply nested XML files.

Further Discussion

There are several related issues not yet covered: namespaces, generalization, error handling, validation, filtering, and performance. In the context of parsing XML data files, XML namespaces are a mechanism to prevent name clashes. Each of the techniques we've used can deal with namespaces. The MSDN Library will give you all the information you need to handle XML files with namespaces.
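One namespace detail that commonly trips people up is XPath: in an XPath 1.0 expression, an unprefixed element name means "no namespace," so a query like /suite/testcase silently matches nothing once the document declares a default namespace. The sketch below shows the usual fix with XmlDocument and XmlNamespaceManager. The namespace URI urn:example:tests, the prefix t, and the names NamespaceDemo, Ids(), and CountWithoutPrefix() are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Minimal sketch: select namespaced elements with XPath.
// The namespace URI and the sample document are hypothetical.
public static class NamespaceDemo
{
    public const string SampleXml =
        "<suite xmlns='urn:example:tests'>" +
        "<testcase id='001'/><testcase id='002'/>" +
        "</suite>";

    public static List<string> Ids(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Map a prefix to the document's namespace and use it in the query.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("t", "urn:example:tests");

        var ids = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/t:suite/t:testcase", nsmgr))
            ids.Add(node.Attributes["id"].Value); // unprefixed attributes have no namespace
        return ids;
    }

    // Without the prefix mapping, the same path finds no nodes.
    public static int CountWithoutPrefix(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("/suite/testcase").Count;
    }

    public static void Main()
    {
        Console.WriteLine("with prefix:    " + string.Join(", ", Ids(SampleXml)));
        Console.WriteLine("without prefix: " + CountWithoutPrefix(SampleXml) + " nodes");
    }
}
```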

The techniques we have seen were not written to be particularly general. If you have a different XML structure, you will have to write different code. There is always a trade-off between writing code for a specific situation and making the code more generalized.

The code in this article does not have any error handling. Parsing XML files is quite error prone, and in a production scenario you would need to add many try-catch blocks to create a robust parser.
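As a starting point, the ReadXml() call from Listing Nine can be wrapped so that a document that is not well-formed is reported instead of crashing the program. This sketch catches only XmlException (raised for malformed XML); a production parser would likely need more handlers. SafeLoad and TryLoad() are hypothetical names, and the input strings are made up.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

// Minimal sketch: report malformed XML instead of letting ReadXml() crash the program.
// TryLoad() is a hypothetical helper name.
public static class SafeLoad
{
    public static DataSet TryLoad(string xml, out string error)
    {
        error = null;
        var ds = new DataSet();
        try
        {
            ds.ReadXml(new StringReader(xml));
            return ds;
        }
        catch (XmlException ex)
        {
            // The document is not well-formed; LineNumber/LinePosition locate the problem.
            error = "line " + ex.LineNumber + ", position " + ex.LinePosition + ": " + ex.Message;
            return null;
        }
    }

    public static void Main()
    {
        string error;
        DataSet ds = TryLoad("<suite><testcase id='1'></suite>", out error); // mismatched tags
        Console.WriteLine(ds == null ? "rejected: " + error : "loaded");
    }
}
```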

Additionally, I didn't address XML validation with schema files, but once again, in a production environment you would need to generate XML schema files and validate your XML data files against them before attempting to parse. It is possible to add validation to your parsing code, but I recommend validating before parsing.
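Validating before parsing can be done by reading the whole document once through a validating reader. The sketch below uses XmlReaderSettings with ValidationType.Schema, an API that arrived with .NET 2.0 (after this article was written); .NET 1.x offered XmlValidatingReader instead. The schema is a hypothetical, simplified one for the test-case format, and the names Validator, Xsd, ValidSample, InvalidSample, and Validate() are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Minimal sketch: validate XML against an XSD before parsing it.
// The schema and both sample documents are hypothetical.
public static class Validator
{
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='suite'><xs:complexType><xs:sequence>" +
        "<xs:element name='testcase' maxOccurs='unbounded'><xs:complexType><xs:sequence>" +
        "<xs:element name='inputs'><xs:complexType><xs:sequence>" +
        "<xs:element name='arg1' type='xs:decimal'/>" +
        "<xs:element name='arg2' type='xs:decimal'/>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "<xs:element name='expected' type='xs:decimal'/>" +
        "</xs:sequence>" +
        "<xs:attribute name='id' type='xs:string' use='required'/>" +
        "<xs:attribute name='kind' type='xs:string'/>" +
        "</xs:complexType></xs:element>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "</xs:schema>";

    public const string ValidSample =
        "<suite><testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs>" +
        "<expected>11.00</expected></testcase></suite>";

    public const string InvalidSample = // "eleven" is not an xs:decimal
        "<suite><testcase id='001' kind='bvt'><inputs><arg1>4</arg1><arg2>7</arg2></inputs>" +
        "<expected>eleven</expected></testcase></suite>";

    // Returns the validation messages; an empty list means the document is valid.
    public static List<string> Validate(string xml, string xsd)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        using (XmlReader xsdReader = XmlReader.Create(new StringReader(xsd)))
            settings.Schemas.Add(null, xsdReader); // null: use the schema's own target namespace
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading the whole document drives validation
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine("valid sample errors:   " + Validate(ValidSample, Xsd).Count);
        Console.WriteLine("invalid sample errors: " + Validate(InvalidSample, Xsd).Count);
    }
}
```

Only once Validate() returns an empty list would the document be handed to one of the parsing techniques.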

In every example, we have read all the XML data into memory. In many cases, you will want to filter and just read in some data. All the techniques in this article can be modified to provide front-end filtering. The XPathDocument class has especially nice filtering capabilities by way of XPath syntax.
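As a small illustration of XPath filtering with XPathDocument, this sketch pulls out only the ids of test cases of a given kind. Note that XPathDocument still loads the whole document; the predicate limits what we extract from it. The sample document, the kind values, and the names XPathFilter and IdsOfKind() are hypothetical, and the kind argument is assumed to be a trusted simple value because it is spliced directly into the XPath string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Minimal sketch: filter test cases with an XPath predicate.
// The sample XML and kind values are hypothetical.
public static class XPathFilter
{
    public const string SampleXml =
        "<suite>" +
        "<testcase id='001' kind='bvt'/>" +
        "<testcase id='002' kind='drt'/>" +
        "<testcase id='003' kind='bvt'/>" +
        "</suite>";

    public static List<string> IdsOfKind(string xml, string kind)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        // The [@kind='...'] predicate does the filtering; /@id selects the id attributes.
        XPathNodeIterator it = nav.Select("/suite/testcase[@kind='" + kind + "']/@id");

        var ids = new List<string>();
        while (it.MoveNext())
            ids.Add(it.Current.Value);
        return ids;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", IdsOfKind(SampleXml, "bvt")));
    }
}
```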

If performance is an issue (usually when you are parsing many small XML files), you will have to run some timing measurements to determine whether your chosen technique is fast enough. Performance is too tricky a subject for broad general statements, and the only way to know whether your performance is acceptable is to try your code. As a guideline, however, XmlTextReader has the best performance characteristics.
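A rough timing harness along these lines can compare two techniques on the same input. This is only a sketch under stated assumptions: the generated document, its size (500 test cases), and the run count (200) are arbitrary; the elapsed times will vary by machine and runtime; and each "parse" just counts <testcase> elements rather than building the article's Suite objects. Timing, MakeXml(), CountWithDataSet(), CountWithReader(), and TimeMs() are hypothetical names.

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;

// Minimal sketch: time DataSet vs. XmlTextReader on the same generated document.
// Document size and run count are arbitrary assumptions; results vary by machine.
public static class Timing
{
    public static string MakeXml(int count)
    {
        var sb = new StringBuilder("<suite>");
        for (int i = 0; i < count; i++)
            sb.Append("<testcase id='").Append(i).Append("' kind='bvt'>")
              .Append("<inputs><arg1>4</arg1><arg2>7</arg2></inputs>")
              .Append("<expected>11.00</expected></testcase>");
        return sb.Append("</suite>").ToString();
    }

    public static int CountWithDataSet(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        return ds.Tables["testcase"].Rows.Count;
    }

    public static int CountWithReader(string xml)
    {
        int n = 0;
        using (var r = new XmlTextReader(new StringReader(xml)))
        {
            while (r.Read())
                if (r.NodeType == XmlNodeType.Element && r.Name == "testcase")
                    n++;
        }
        return n;
    }

    static long TimeMs(Func<string, int> parse, string xml, int runs)
    {
        parse(xml); // warm-up call so JIT compilation is not included in the timing
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
            parse(xml);
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string xml = MakeXml(500);
        const int runs = 200;
        Console.WriteLine("DataSet:       " + TimeMs(CountWithDataSet, xml, runs) + " ms");
        Console.WriteLine("XmlTextReader: " + TimeMs(CountWithReader, xml, runs) + " ms");
    }
}
```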

A Key Skill

XML data files are a key component of Microsoft's .NET developer environment. The ability to parse data from XML files into memory is a key skill in a .NET setting. Each of the five techniques, based on the XmlTextReader, XmlDocument, XPathDocument, XmlSerializer, and DataSet classes, is significantly different in terms of coding mechanics, coding mind set, and scenarios for usage. The .NET Framework gives you great flexibility in parsing XML data files and makes this essential task much easier and less error prone than using non-.NET techniques.

References

XML in .NET Overview, http://msdn.microsoft.com/msdnmag/issues/01/01/xml/xml.asp

Consume XML C# app, http://msdn.microsoft.com/library/en-us/vcedit/html/vcwlkVisualCApplicationsConsumingXMLData.asp

XML Schema, http://msdn.microsoft.com/msdnmag/issues/02/04/xml/xml0204.asp

XML Namespaces, http://msdn.microsoft.com/msdnmag/issues/01/07/xml/default.aspx

Dr. James McCaffrey works for Volt Information Sciences Inc. where he manages technical training for software engineers working at Microsoft's Redmond, WA campus. He has worked on several Microsoft products, including Internet Explorer and MSN Search.
