Dr. Dobb's is part of the Informa Tech Division of Informa PLC




XML Serialization




In the .NET Framework, object serialization is a technology with two faces. Run-time object serialization saves a living object to a binary stream or to a SOAP document, while XML serialization maps the public members of an object to a particular XML schema. Despite their apparent similarity, run-time object serialization and XML serialization are significantly different technologies, with different implementations and, more importantly, different goals. Run-time serialization is governed by .NET Framework formatter objects; XML serialization takes place under the aegis of the XmlSerializer class. XML serialization is useful for persisting ADO.NET DataSet objects to the DiffGram format along with schema information. (By default, the DataSet object treats data and schema as separate entities.) Another notable use of XML serialization involves web services, which use the XmlSerializer class to encode the return value of a web method call as XML.

Run Time and XML Serialization

Run-time serialization is made available through the classes defined in the System.Runtime.Serialization namespace. These classes provide full type fidelity, meaning that the complete structure of the target object, including its exact type, is serialized. As a result, at deserialization time (deserialization being the reverse of serialization), the classes can recreate a perfect clone of the original object. Run-time object serialization stores the public, protected, and private fields of a class and automatically handles circular references. A circular reference occurs when a child object references its parent and the parent, in turn, references the child. The .NET serialization classes detect these references and resolve them. Run-time serialization can generate output in multiple formats by plugging in different formatter modules. The two system-provided formatters are the BinaryFormatter and SoapFormatter classes: the former writes the object's state to a binary stream, while the latter writes it to a text stream in SOAP format.

A .NET Framework class makes itself serializable by being marked with the [Serializable] attribute. The formatter uses reflection to query the class and extract the relevant information to serialize. The formatter decides the layout of the serialized data, and it is also responsible for defining what that "relevant information" is. In the default case, the class plays a rather passive role and just lets the formatter do its work. A class can override this standard behavior by implementing the ISerializable interface; through ISerializable, the class author takes strict control over which parts of the living object are actually persisted.

XML serialization resembles the SOAP and binary formatters in that it also persists and restores an object's state. But when you examine how the serializer and the formatters work, you find many significant differences and can only conclude that, despite appearances, they are different animals. XML serialization is handled by the XmlSerializer class, which also lets you control how objects are encoded into the elements of an XML schema. Goals and implementation details aside, the strongest difference between run-time and XML serialization is the level of type fidelity they provide.

As mentioned, run-time object serialization guarantees full type fidelity. For this reason, binary and SOAP serialization are particularly well suited to preserving the state of an object across multiple invocations of an application. For example, .NET Framework Remoting uses run-time serialization to marshal objects by value across AppDomain boundaries.

The primary goal of XML serialization is to let another application, possibly one running on a different platform, consume the stored data. The emphasis is not so much on the object's state as on the public interface of the class. For this reason, XML serialization skips private and protected members as well as read-only properties: it saves only public fields and public read/write properties, and it does not handle circular references. Members set to null are omitted by default, and the object's .NET type identity is not serialized. All the data is normalized to strings, and converting it back into strongly typed values is left to the component that consumes the data.
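To make these rules concrete, here is a minimal sketch that serializes a hypothetical Account class mixing public, private, read-only, and null members. The names Account and VisibilityDemo are illustrative only; the sketch assumes nothing beyond the standard System.Xml.Serialization types.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type, used only to show which members XmlSerializer writes.
public class Account {
    public string Owner;                          // public field: written
    public string Note;                           // public field left null: omitted
    public string Id { get; set; }                // public read/write property: written
    public int OwnerLength => Owner?.Length ?? 0; // get-only property: skipped
    private string secret = "hidden";             // private field: skipped

    // Methods are never serialized; this one just keeps the private field in use.
    public bool HasSecret() => secret.Length > 0;
}

public static class VisibilityDemo {
    public static string ToXml(Account account) {
        var serializer = new XmlSerializer(typeof(Account));
        using var writer = new StringWriter();
        serializer.Serialize(writer, account);
        return writer.ToString();
    }
}
```

Serializing new Account { Owner = "Joe", Id = "42" } yields <Owner> and <Id> elements only; neither the private field, the computed property, nor the null Note appears.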

Unlike run-time serialization, XML serialization loses the object's identity: class, namespace, and assembly name. In exchange, the XML serializer lets you decide about namespaces, the name of the XML element that will contain a particular property, and even whether a given property should be rendered as an attribute, text, or element. No such flexibility is available with run-time serialization.
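A sketch of that flexibility, using the standard XmlRoot, XmlAttribute, XmlElement, and XmlText attributes. The Contact and Note classes, the element names, and the urn:example:contacts namespace are all made up for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical types: the attributes choose element names, the namespace,
// and whether each member becomes an attribute, an element, or text.
[XmlRoot("contact", Namespace = "urn:example:contacts")]
public class Contact {
    [XmlAttribute("id")]    // rendered as <contact id="...">
    public int Id;

    [XmlElement("family")]  // rendered as <family>...</family>
    public string LastName;

    [XmlElement("note")]    // rendered as <note lang="...">text</note>
    public Note Remark;
}

public class Note {
    [XmlAttribute("lang")]
    public string Language;

    [XmlText]               // becomes the text content of <note>
    public string Text;
}

public static class MappingDemo {
    public static string ToXml(Contact c) {
        var serializer = new XmlSerializer(typeof(Contact));
        using var writer = new StringWriter();
        serializer.Serialize(writer, c);
        return writer.ToString();
    }
}
```

None of these names exist in the .NET type itself as written in XML; the mapping is entirely under the developer's control.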

The XmlSerializer Class

The XmlSerializer class belongs to the System.Xml.Serialization namespace and exposes two key methods: Serialize and Deserialize. When it comes to serializing, the serializer first generates an XML schema that includes only the public fields and properties of the target class. Based on that schema, the serializer generates a C# source file with a made-to-measure XML reader and writer class. The source file is compiled into a temporary assembly, which supplies the implementation of both the Serialize (writer) and Deserialize (reader) methods.

The XmlSerializer class lacks a rich programming interface: it features only a few methods and events. In contrast, it has a long list of overloaded constructors, and the constructor is where many of the key things take place. Table 1 details the parameters supported by the various constructors.
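As a sketch of two of those overloads, the code below builds one serializer that renames the root element (Type, XmlRootAttribute) and one that places every element in a default namespace (Type, String). The root name Employee and the namespace urn:example:people are arbitrary, and SerializerFactory is a hypothetical helper. The .NET Framework documentation notes that only the XmlSerializer(Type) and XmlSerializer(Type, String) overloads reuse their generated assemblies, so serializers built with other overloads should be created once and cached, as the static fields do here.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Person {
    public string FirstName;
    public string LastName;
    public string[] NickNames;
}

public static class SerializerFactory {
    // (Type, XmlRootAttribute) overload: overrides the root element name.
    public static readonly XmlSerializer AsEmployee =
        new XmlSerializer(typeof(Person), new XmlRootAttribute("Employee"));

    // (Type, string) overload: sets a default namespace for all elements.
    public static readonly XmlSerializer Namespaced =
        new XmlSerializer(typeof(Person), "urn:example:people");

    public static string ToXml(XmlSerializer serializer, Person p) {
        using var writer = new StringWriter();
        serializer.Serialize(writer, p);
        return writer.ToString();
    }
}
```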

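Not all classes can be serialized to XML. In the .NET Framework 1.x, the DataTable is a notable example. (Starting with .NET Framework 2.0, DataTable implements IXmlSerializable, so this particular limitation no longer applies.) Try out the following code.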

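using System.Data;
using System.IO;
using System.Xml.Serialization;

// On .NET Framework 1.x this fails: DataTable contains a circular reference
DataTable _table = new DataTable();
StringWriter writer = new StringWriter();
XmlSerializer _serializer = new XmlSerializer(typeof(DataTable));
_serializer.Serialize(writer, _table);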

This code raises an error because the DataTable type contains a circular reference, which the XML serializer cannot process. In particular, the DataTable has a property (DataSet) of type DataSet, and the DataSet type, in turn, has a Tables collection made of DataTable objects. If you think it over for a moment, though, you'll realize that the DataSet has the same circular reference as the DataTable, yet the DataSet is perfectly serializable through XML serialization. What's up? The DataSet is simply handled as a special case in the source code of the XmlSerializer class. (By the way, the source code of all the classes in the System.Xml namespaces is freely available from the MSDN web site as part of the shared-source Rotor project.)

To see the XmlSerializer class in action, let's consider a simple but representative class.

using System;

// [Serializable] matters to the run-time formatters; XmlSerializer ignores it
[Serializable]
public class Person {
    public string FirstName;
    public string LastName;
    public string[] NickNames;
}

Suppose you instantiate the class and serialize it as follows:

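Person p = new Person();
p.FirstName = "Joe";
p.LastName = "Users";

// Serialize the object to the console
XmlSerializer ser = new XmlSerializer(typeof(Person));
ser.Serialize(Console.Out, p);

The resulting XML looks like this (the XML declaration and the xsi/xsd namespace declarations that the serializer adds to the root element are omitted for brevity):

<Person>
  <FirstName>Joe</FirstName>
  <LastName>Users</LastName>
</Person>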

Notice that to be serialized to XML, a class must have a public default (parameterless) constructor. The serializer needs to create instances of the type you specified through the XmlSerializer constructor, and since it can make no assumptions about the other constructors available, it uses the default one. If no such constructor is available, the XML serializer throws an InvalidOperationException.
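A quick way to observe this rule is to try serializing two otherwise identical types, one with and one without a parameterless constructor. Point, MutablePoint, and SerializationCheck are hypothetical names; the sketch assumes, as described above, that XmlSerializer reports the missing constructor with an InvalidOperationException.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type WITHOUT a parameterless constructor.
public class Point {
    public int X;
    public int Y;
    public Point(int x, int y) { X = x; Y = y; }
}

// Same shape, but the compiler supplies a public parameterless constructor.
public class MutablePoint {
    public int X;
    public int Y;
}

public static class SerializationCheck {
    // Returns false when XmlSerializer rejects the runtime type of the value.
    public static bool TrySerialize(object value, out string xml) {
        try {
            var serializer = new XmlSerializer(value.GetType());
            using var writer = new StringWriter();
            serializer.Serialize(writer, value);
            xml = writer.ToString();
            return true;
        }
        catch (InvalidOperationException) {
            xml = string.Empty;
            return false;
        }
    }
}
```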

Class members that evaluate to an array are rendered as a subtree in which each node renders a single array element. For example, if you set p.NickNames to an array containing "WebMan" and "BigOne", the NickNames member of the Person class is serialized as shown here:

<Person>
  <FirstName>Joe</FirstName>
  <LastName>Users</LastName>
  <NickNames>
    <string>WebMan</string>
    <string>BigOne</string>
  </NickNames>
</Person>
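The wrapper and item element names shown above are defaults. If you prefer other names, the standard XmlArray and XmlArrayItem attributes can change them; the names Aliases and Alias below are arbitrary choices, and ArrayNamingDemo is a hypothetical helper.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Person {
    public string FirstName;
    public string LastName;

    // Without these attributes: <NickNames><string>...</string></NickNames>
    [XmlArray("Aliases")]
    [XmlArrayItem("Alias")]
    public string[] NickNames;
}

public static class ArrayNamingDemo {
    public static string ToXml(Person p) {
        var serializer = new XmlSerializer(typeof(Person));
        using var writer = new StringWriter();
        serializer.Serialize(writer, p);
        return writer.ToString();
    }
}
```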

Note that classes that must be serialized to XML cannot use many of the most commonly used collection classes. You cannot use Hashtable or NameValueCollection, for example, but you can use ArrayList. This rule stems from the extra constraints placed on classes that implement the IEnumerable interface.

In particular, a class that implements IEnumerable must implement a public Add method that takes a single parameter. This condition alone filters out dictionaries and hash tables but keeps ArrayList and StringCollection objects on board. In addition, the parameter of Add must be of the same type as the one returned by the Current property of the underlying enumerator object, or of one of its base types.
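Here is a sketch of a custom collection that satisfies these constraints. TagList and TagListDemo are hypothetical. The sketch uses the generic IEnumerable<string> (which postdates this article) so that the enumerator's Current property is typed string and the public Add(string) method matches it.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical collection: implements IEnumerable<string> (not ICollection)
// and exposes a public Add whose parameter matches the enumerator's Current.
public class TagList : IEnumerable<string> {
    private readonly List<string> items = new List<string>();

    public int Count => items.Count;   // collection members are not serialized

    public void Add(string tag) => items.Add(tag);

    public IEnumerator<string> GetEnumerator() => items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class TagListDemo {
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(TagList));

    public static string ToXml(TagList tags) {
        using var writer = new StringWriter();
        Serializer.Serialize(writer, tags);
        return writer.ToString();
    }

    // On the way back in, the serializer calls Add once per item element.
    public static TagList FromXml(string xml) {
        using var reader = new StringReader(xml);
        return (TagList)Serializer.Deserialize(reader);
    }
}
```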

The Deserialization Process

The deserialization process is controlled by the Deserialize method and can read from a variety of sources, including streams, XML readers, and text readers. Deserializing is no more complicated than serializing. All you need is demonstrated here:

StreamReader reader = 
    new StreamReader(fileName);
Person p = (Person) 
    _serializer.Deserialize(reader);
reader.Close();
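For a self-contained round trip, the following sketch writes a Person to a file and reads it back. PersonStore, Save, and Load are hypothetical names; the using declarations close the file even if serialization throws, which the Close call in the snippet above does not guarantee.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Person {
    public string FirstName;
    public string LastName;
    public string[] NickNames;
}

public static class PersonStore {
    // Reuse one serializer: building it is the expensive step.
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Person));

    public static void Save(Person p, string fileName) {
        using var writer = new StreamWriter(fileName);
        Serializer.Serialize(writer, p);
    }

    public static Person Load(string fileName) {
        using var reader = new StreamReader(fileName);
        return (Person)Serializer.Deserialize(reader);
    }
}
```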

During deserialization, the serializer can fire a few events: UnknownElement, UnknownAttribute, and UnknownNode. They signal that unknown and unexpected nodes were found in the XML text being deserialized. The UnknownNode event is more generic than the other two and fires for any unknown node, whatever its type. For unknown element or attribute nodes, the UnknownNode event fires before the more specific event. The following code snippet demonstrates how to register handlers for these events.

XmlSerializer ser = new 
    XmlSerializer(typeof(Person));
ser.UnknownElement += new 
    XmlElementEventHandler(UnkElem); 
ser.UnknownAttribute += new 
    XmlAttributeEventHandler(UnkAttr); 
ser.UnknownNode += new 
    XmlNodeEventHandler(UnkNode);

Each event has its own event handler delegate and a custom event data class. Table 2 details the properties that all the event data classes share.
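The sketch below wires all three events to a simple log so you can see which ones fire for an unexpected attribute and an unexpected element. EventDemo is a hypothetical helper; the properties it reads (NodeType, Name, LineNumber, LinePosition, Element, Attr) belong to the standard event data classes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Person {
    public string FirstName;
    public string LastName;
    public string[] NickNames;
}

public static class EventDemo {
    // Deserializes xml and returns a log of the unknown-content events raised.
    public static List<string> Deserialize(string xml, out Person person) {
        var log = new List<string>();
        var serializer = new XmlSerializer(typeof(Person));

        serializer.UnknownNode += (sender, e) =>
            log.Add($"Node {e.NodeType} {e.Name} at {e.LineNumber}:{e.LinePosition}");
        serializer.UnknownElement += (sender, e) =>
            log.Add($"Element {e.Element.Name}");
        serializer.UnknownAttribute += (sender, e) =>
            log.Add($"Attribute {e.Attr.Name}");

        using var reader = new StringReader(xml);
        person = (Person)serializer.Deserialize(reader);
        return log;
    }
}
```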

The most compelling reason to use deserialization events is that they let you fix incoming data that doesn't perfectly match your target schema. For example, the sample Person class has a LastName member, so the deserializer expects to find a <LastName> element in the XML source. If an expected element is missing, no event is triggered and the member simply keeps its default value. If an unexpected node is found, however, the user is notified.

If you know that the content of one or more unknown elements can be adapted to populate target members, then an event handler is the best place to plug your code in to do the job. For example, suppose that the node <FamilyName> contains the same information as LastName, just expressed with a different element name. The following code shows how to fix things up and have the information fill the LastName property on the target class.

void GotUnknownElement(
    object sender, XmlElementEventArgs e) {
    // Map the unexpected <FamilyName> element onto Person.LastName
    if (e.Element.Name == "FamilyName") {
        Person p = (Person) 
            e.ObjectBeingDeserialized;
        p.LastName = e.Element.InnerText;
    }
}
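A runnable version of the same idea, assuming the handler is registered on a fresh serializer and the input is a string; FamilyNameReader and its Read method are hypothetical names.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Person {
    public string FirstName;
    public string LastName;
    public string[] NickNames;
}

public static class FamilyNameReader {
    public static Person Read(string xml) {
        var serializer = new XmlSerializer(typeof(Person));
        serializer.UnknownElement += GotUnknownElement;
        using var reader = new StringReader(xml);
        return (Person)serializer.Deserialize(reader);
    }

    // Same logic as the handler above: map <FamilyName> onto LastName.
    static void GotUnknownElement(object sender, XmlElementEventArgs e) {
        if (e.Element.Name == "FamilyName") {
            var p = (Person)e.ObjectBeingDeserialized;
            p.LastName = e.Element.InnerText;
        }
    }
}
```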

You can also combine information coming from multiple unknown elements. In this case, though, you must devise an application-specific way to cache crucial information across invocations of the event handler: the handler is invoked once for each unknown node, while the ObjectBeingDeserialized property reflects only the results of the deserialization accumulated so far.
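One possible caching strategy, sketched under the assumption of a hypothetical legacy format that splits the last name across <NamePrefix> and <NameBase> elements: local variables captured by the handler serve as the cache, and the combined value is written once both pieces have arrived, in whatever order. LegacyPersonReader and both element names are invented for this example.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Person {
    public string FirstName;
    public string LastName;
    public string[] NickNames;
}

public static class LegacyPersonReader {
    public static Person Read(string xml) {
        var serializer = new XmlSerializer(typeof(Person));
        string prefix = null, baseName = null;  // cache shared by all handler calls

        serializer.UnknownElement += (sender, e) => {
            if (e.Element.Name == "NamePrefix") prefix = e.Element.InnerText;
            else if (e.Element.Name == "NameBase") baseName = e.Element.InnerText;
            else return;

            // Once both pieces are cached, write the combined value.
            if (prefix != null && baseName != null)
                ((Person)e.ObjectBeingDeserialized).LastName = prefix + " " + baseName;
        };

        using var reader = new StringReader(xml);
        return (Person)serializer.Deserialize(reader);
    }
}
```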

Summary

Run-time object serialization and XML serialization are different technologies. Although they attempt to do roughly the same thing, their goals are radically different. Run-time object serialization is aimed at state maintenance and object persistence. XML serialization is designed to persist the public contents of an object and to discard any internal and hidden members. XML serialization has a lot to do with web services and is the underlying technology that produces XML return values from a web method. There's much more to know about XML serialization and deserialization. In particular, you should investigate the possibilities the XmlSerializer class offers for serializing objects to, and deserializing them from, databases. That's a pretty cool area that I promise to cover in a future article. Stay tuned!


Dino Esposito is Wintellect's ADO.NET and XML expert and is a trainer and consultant based in Rome, Italy. He is a contributing editor to MSDN Magazine, writing the "Cutting Edge" column, and is the author of several books for Microsoft Press, including Building Web Solutions with ASP.NET and Applied XML Programming for .NET. Contact him at [email protected].

