Object serialization is the process that renders the current state of an object to a format that can be persisted to a storage medium (files, streams, memory). Associated with serialization is the deserialization phase, which converts a previously serialized stream of data back to an instance of the original object. Put together, serialization and deserialization provide a way of storing and transporting data. The concept behind object serialization is quite general and not specifically related to a particular platform. Each serious development platform, from MFC to the .NET Framework, provides a built-in mechanism for persisting the state of a class.
The .NET Framework offers two different programming interfaces for serialization of objects: data formatters and the XML serializer. In the July and August 2003 installments of this column, I extensively covered the XML serializer. In this article, I'll focus on data formatters.
Ways to Serialize
A data formatter is the .NET Framework component in charge of serializing living instances of classes to a variety of media including streams, disk files, memory, network shares, and other AppDomains. The .NET Framework comes with two predefined formatter classes: BinaryFormatter and SoapFormatter. The former serializes the state of the object as a binary object; the latter translates the state of the object to a SOAP packet. In spite of this radical difference, the programming interface of the two classes is nearly identical. In this article, I'll focus on the BinaryFormatter class.
The main goal of formatters is preserving type fidelity. All properties of an object are serialized regardless of their private or protected scope. This feature is fundamental to preserving the state of an object across different invocations of an application. The serialization process doesn't persist the code behind a class, and no runnable code is ever persisted or transferred. The serialization process takes a snapshot of the internal state of the object and converts it to a format that can be stored or transported. In doing so, data formatters consider all data members that make up the object: private, protected, and public properties.
In addition, formatters know how to manage circular references between types. A circular reference is a situation that occurs when an object of, say, type T1 has a property of, say, type T2; at the same time, T2 has a property that references T1. For example, let's consider the ADO.NET DataSet type. The DataSet has a Tables property, which is a collection of DataTable objects. But the DataTable class has a DataSet property whose type is just DataSet. So between the DataSet and the DataTable classes, there's a circular reference that must be successfully handled in order to serialize both classes.
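To make the circular-reference case concrete, here is a minimal sketch that round-trips a two-object cycle through the BinaryFormatter. The Department and Manager types are hypothetical and exist only for this illustration; the code assumes a .NET Framework runtime where BinaryFormatter is available.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical types for illustration: each one references the other.
[Serializable]
public class Department
{
    public string Name;
    public Manager Head;        // Department -> Manager
}

[Serializable]
public class Manager
{
    public string LastName;
    public Department Owner;    // Manager -> Department (closes the cycle)
}

public static class CircularDemo
{
    // Serializes the graph to memory, reads it back, and reports whether
    // the rebuilt graph is a distinct copy that still contains the cycle.
    public static bool RoundTripKeepsCycle()
    {
        Department dept = new Department();
        dept.Name = "Sales";
        dept.Head = new Manager();
        dept.Head.LastName = "Fuller";
        dept.Head.Owner = dept;

        BinaryFormatter bin = new BinaryFormatter();
        using (MemoryStream ms = new MemoryStream())
        {
            bin.Serialize(ms, dept);
            ms.Position = 0;    // rewind before reading
            Department copy = (Department) bin.Deserialize(ms);

            return !ReferenceEquals(copy, dept)
                && ReferenceEquals(copy.Head.Owner, copy);
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripKeepsCycle());
    }
}
```

The check compares references rather than field values because what matters here is the shape of the rebuilt graph: the copy's manager must point back to the copy, not to a second Department instance.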
The biggest difference between data formatters and the XML serializer lies just here: the XML serializer doesn't guarantee type fidelity. The XML serializer doesn't handle circular references and persists only the public properties of an object. As a result, XML serialization should not be considered as an alternative type of serialization but instead as an additional, lightweight tool to be used when object data serialization is what you need and not full object state serialization.
So, which serialization mechanism is more appropriate for you? The answer to this question is connected to the answer you give to another question: Why would you want to use serialization? If your goal is persisting the state of an object to a storage medium so that an exact copy of it can be recreated at a later time, then formatters are for you. For example, the .NET Framework employs the binary formatter to persist the state of an ASP.NET session to an external medium such as SQL Server and the ASP.NET state server. Likewise, the .NET Framework resorts to formatters to copy objects to the clipboard in Windows Forms applications. In all these cases, the persisted object must be recreated as a clone later.
XML serialization is primarily used by web services methods to return values and is recommended to programmers for all those cases in which what really matters is the set of public properties of the type. XML serialization (more exactly, its deserialization engine) is useful to parse incoming XML data and map values to classes. I demonstrated this in my August 2003 column.
The way in which data formatters work depends on the characteristics of the class being serialized. To be serializable, a class must be marked with the [Serializable] attribute. This attribute is mandatory. If the attribute is missing and you attempt to serialize an instance of the class, an exception is thrown. By default, data formatters decide which members are to be serialized and how. A class that wants to play an active role during serialization, and decide itself which members are to be serialized and in which format, can do that by implementing the ISerializable interface.
In general, the process of serialization can be automated to a large extent, and so it is in the default case. Suppose you have a class like the following:
[Serializable]
public class Employee
{
    public string LastName;
    public string FirstName;
}
To serialize the state of an instance of the Employee class, you need surprisingly simple code:
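using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

Employee emp = new Employee();
emp.LastName = "Davolio";
emp.FirstName = "Nancy";
FileStream fs = new FileStream("emp.dat", FileMode.Create);
BinaryFormatter bin = new BinaryFormatter();
bin.Serialize(fs, emp);    // the one-line serialization call
fs.Close();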
In this code snippet, the effective serialization call takes only one line and consists of calling the Serialize method of the BinaryFormatter class. The state of that instance of the Employee class is successfully persisted to a file, but nothing in the code or in the class itself tells the formatter how to map properties to bytes. To deserialize from the stream, you use the following, simple code:
// fs must be a stream opened on emp.dat for reading (FileMode.Open)
emp = (Employee) bin.Deserialize(fs);
The serialized stream contains various information including the full name of the assembly that contains the class being serialized, the name of the class, and the name and value of each serializable property. If you serialize with one application and deserialize with another, make sure that the definition of the class is placed in a third assembly that both applications link to. Here's a scenario that is a common source of trouble: Suppose you write a quick console application to test the binary formatter. To keep it as simple as possible, you reasonably define the Employee class in the same file as the main program. As a result, the serialized data inserts the name of the console application assembly as the name of the assembly that contains the class. Not too bad so far.
Suppose now that you write another simple console program to deserialize from the same file created by the serializer applet. To complete the deserialization, you need to reference the original type in the deserializer applet. Once again, to keep it simple you decide to paste the definition of the Employee class to the source code of the deserializer application.
As a result, an invalid cast exception is thrown when you attempt to cast the output of the Deserialize method to the Employee class. The type of the object returned by the deserializer is nominally correct (it is named Employee), but the cast still doesn't work. What's going on?
The deserializer returns an instance of the correct type, the one defined in the serializer applet. The rub lies in the fact that you attempt to cast it to the Employee type defined in the deserializer assembly. The name, the namespace, and the definition of the types are the same, but since they live in different assemblies, they are actually different types, and the cast doesn't work! If serialization and deserialization of a custom type occur in distinct executables, the best thing you can do is to define the custom type in its own assembly and link this assembly to both executables. (If you don't do this, you have to use reflection or VB.NET late-bound code to access the deserialized object.)
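The role of the assembly in type identity is easy to observe. The following sketch simply prints the assembly-qualified name of a locally defined Employee class; TypeIdentityDemo and its Describe helper are names I made up for the illustration.

```csharp
using System;

[Serializable]
public class Employee
{
    public string LastName;
    public string FirstName;
}

public static class TypeIdentityDemo
{
    // Returns the full identity of a type: its name plus the full name
    // of the assembly that defines it.
    public static string Describe(Type t)
    {
        return t.AssemblyQualifiedName;
    }

    public static void Main()
    {
        // The text after the first comma is the defining assembly.
        // Compile this same class into a second executable and that part
        // changes, which is why the two Employee types are not the same.
        Console.WriteLine(Describe(typeof(Employee)));
    }
}
```

Running the same source from two differently named executables shows two different strings, even though the class definitions are identical.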
As you can see, in a large number of situations the serialization process doesn't need any special care to work. The formatter resorts to reflection to determine in an automatic way which properties are to be serialized. In particular, the Serialize method calls into a static method of the FormatterServices class to get the list of the serializable properties. The method is GetSerializableMembers. This method uses reflection to query for all properties of a type and verifies that the [NonSerialized] attribute is not specified.
If for some reason you need to exclude some members from serialization, you can mark them with the [NonSerialized] attribute:
[Serializable]
public class Employee
{
    [NonSerialized] public int ID;
    public string LastName;
    public string FirstName;
}
Class members that contain security-sensitive data should never be serialized, to avoid exposing critical data to all the code that holds serialization rights. In addition, members whose value can easily be recalculated at runtime (e.g., thread ID, current date) should not be serialized.
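You can watch the member discovery step yourself by calling FormatterServices.GetSerializableMembers directly. The sketch below is only an illustration: the private _notes field and the MembersDemo helper are additions of mine, included to show that private fields are picked up while [NonSerialized] ones are skipped.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
public class Employee
{
    [NonSerialized] public int ID;    // excluded from serialization
    public string LastName;
    private string _notes = "";      // hypothetical private field, still included

    public string Notes { get { return _notes; } }
}

public static class MembersDemo
{
    // Returns the names of the members the formatter would persist,
    // sorted so that the result does not depend on reflection order.
    public static string[] SerializableMemberNames(Type t)
    {
        MemberInfo[] members = FormatterServices.GetSerializableMembers(t);
        string[] names = new string[members.Length];
        for (int i = 0; i < members.Length; i++)
        {
            names[i] = members[i].Name;
        }
        Array.Sort(names, StringComparer.Ordinal);
        return names;
    }

    public static void Main()
    {
        foreach (string name in SerializableMemberNames(typeof(Employee)))
        {
            Console.WriteLine(name);
        }
    }
}
```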
Step by Step
When the Serialize method is called, the formatter proceeds step by step and performs a number of checks along the way. The first stop is to verify whether the particular instance of the formatter has a surrogate selector. A surrogate selector tells the formatter which surrogate, if any, handles a given type; the surrogate is the class that takes over the serialization and deserialization of another object. By writing a surrogate for a given type, you actually override the way in which it gets processed. The good thing about surrogates is that the surrogate code is not part of the class being serialized. Hence, the surrogate mechanism allows you to serialize classes not originally designed for serialization, including classes for which you don't have the source code. It also allows you to deserialize a certain type to another type, for example, a newer version of the same class.
A surrogate selector class is a class that implements the ISurrogateSelector interface. You register a surrogate for the Employee type using the following code:
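// emp is Employee
SurrogateSelector ss = new SurrogateSelector();
MySurrogate sur = new MySurrogate();
// Associate the surrogate with the Employee type for all streaming contexts
ss.AddSurrogate(emp.GetType(),
    new StreamingContext(StreamingContextStates.All), sur);
formatter.SurrogateSelector = ss;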
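The MySurrogate class referenced above has to implement the ISerializationSurrogate interface. What follows is one possible sketch of it, not a definitive implementation: the entry names passed to AddValue and the SurrogateDemo helper are my own choices, and the Employee class is deliberately left without the [Serializable] attribute to show that the surrogate does all the work.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Deliberately NOT marked [Serializable]: the surrogate handles it.
public class Employee
{
    public string LastName;
    public string FirstName;
}

public class MySurrogate : ISerializationSurrogate
{
    // Called during serialization: copy the state into the info bag.
    public void GetObjectData(object obj, SerializationInfo info,
        StreamingContext context)
    {
        Employee emp = (Employee) obj;
        info.AddValue("LastName", emp.LastName);
        info.AddValue("FirstName", emp.FirstName);
    }

    // Called during deserialization: obj is an uninitialized instance
    // that the surrogate is expected to populate.
    public object SetObjectData(object obj, SerializationInfo info,
        StreamingContext context, ISurrogateSelector selector)
    {
        Employee emp = (Employee) obj;
        emp.LastName = info.GetString("LastName");
        emp.FirstName = info.GetString("FirstName");
        return emp;
    }
}

public static class SurrogateDemo
{
    // Hypothetical helper: serializes to memory and reads the copy back.
    public static Employee RoundTrip(Employee source)
    {
        SurrogateSelector ss = new SurrogateSelector();
        ss.AddSurrogate(typeof(Employee),
            new StreamingContext(StreamingContextStates.All),
            new MySurrogate());

        BinaryFormatter formatter = new BinaryFormatter();
        formatter.SurrogateSelector = ss;
        using (MemoryStream ms = new MemoryStream())
        {
            formatter.Serialize(ms, source);
            ms.Position = 0;
            return (Employee) formatter.Deserialize(ms);
        }
    }

    public static void Main()
    {
        Employee emp = new Employee();
        emp.LastName = "Davolio";
        emp.FirstName = "Nancy";
        Employee copy = RoundTrip(emp);
        Console.WriteLine(copy.FirstName + " " + copy.LastName);
    }
}
```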
The second step performed by the serializer is seeing whether the object is marked with the Serializable attribute. If not, a SerializationException is thrown. Finally, the formatter checks whether the object implements the ISerializable interface. If not, the default serialization policy is used and all fields not marked as NonSerialized are persisted to the output stream.
During the deserialization process, the formatter first checks for a surrogate selector that handles that type. If a surrogate exists, its SetObjectData method is invoked to populate the object. Otherwise, the formatter allocates enough memory to hold an instance of the target type and copies the values of all serialized members. At this point, if the target type implements the IDeserializationCallback interface, its OnDeserialization method is called to give the class a chance to execute some code upon the completion of the deserialization process. For example, a class that has a few members marked as NonSerialized should take advantage of the IDeserializationCallback interface to complete the initialization process.
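Here is a small sketch of the callback in action. The _fullName field, the FullName property, and the CallbackDemo helper are hypothetical additions made for the example; the point is only that a value skipped by the formatter gets rebuilt once deserialization completes.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Employee : IDeserializationCallback
{
    public string LastName;
    public string FirstName;

    // Derived value: cheap to recompute, so it is not persisted.
    [NonSerialized] private string _fullName;

    public Employee(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
        _fullName = firstName + " " + lastName;
    }

    public string FullName
    {
        get { return _fullName; }
    }

    // Invoked by the formatter after the object has been rebuilt.
    // No constructor runs during deserialization, so the derived
    // field must be restored here.
    void IDeserializationCallback.OnDeserialization(object sender)
    {
        _fullName = FirstName + " " + LastName;
    }
}

public static class CallbackDemo
{
    public static Employee RoundTrip(Employee source)
    {
        BinaryFormatter bin = new BinaryFormatter();
        using (MemoryStream ms = new MemoryStream())
        {
            bin.Serialize(ms, source);
            ms.Position = 0;
            return (Employee) bin.Deserialize(ms);
        }
    }

    public static void Main()
    {
        Employee copy = RoundTrip(new Employee("Nancy", "Davolio"));
        Console.WriteLine(copy.FullName);
    }
}
```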
You have learned so far that object serialization can take place by default, leaving the formatter responsible for the members to persist, or selectively, when the class explicitly marks some of its members as nonserializable. In both cases, though, there's not much else the class can do to control the process. The overall role of the class is quite passive. The class governs the process only when it implements the ISerializable interface. The interface features only one method, GetObjectData, defined as follows:
void GetObjectData(SerializationInfo info, StreamingContext context);
The role of this method is critical to the health of the serialization process. When the class implements a custom serialization algorithm, the formatter is no longer responsible for the data being serialized. This is a double-edged sword, though. The class has a great chance to optimize the way in which it is serialized, but this strength can easily turn into the weakest feature when a suboptimal serialization format is chosen. Before we look into a concrete (and illustrative) example of this bad practice, let's review the steps of custom serialization.
When the ISerializable interface is detected on the target class, the formatter creates an empty instance of the SerializationInfo class (a sort of collection) and passes it to the GetObjectData method of the interface. The class adds property values to this pseudocollection using the AddValue method, pairing each value with a name.
When the GetObjectData method returns, the formatter flushes the current content of the SerializationInfo object to the underlying stream. In this way, the code of the class is the sole culprit if something eventually goes wrong.
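The following is a minimal sketch of a class that takes control through ISerializable, assuming the Employee class used earlier. The entry names passed to AddValue are arbitrary choices of mine. Note also the special constructor that takes a SerializationInfo and a StreamingContext: the formatter calls it during deserialization, so a class that implements ISerializable needs one to be read back.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Employee : ISerializable
{
    public int ID;
    public string LastName;
    public string FirstName;

    public Employee()
    {
    }

    // Deserialization constructor: reads back what GetObjectData wrote.
    protected Employee(SerializationInfo info, StreamingContext context)
    {
        ID = info.GetInt32("ID");
        LastName = info.GetString("LastName");
        FirstName = info.GetString("FirstName");
    }

    // The class, not the formatter, decides what goes into the stream.
    public virtual void GetObjectData(SerializationInfo info,
        StreamingContext context)
    {
        info.AddValue("ID", ID);
        info.AddValue("LastName", LastName);
        info.AddValue("FirstName", FirstName);
    }
}

public static class CustomDemo
{
    // Hypothetical helper: serializes to memory and reads the copy back.
    public static Employee RoundTrip(Employee source)
    {
        BinaryFormatter bin = new BinaryFormatter();
        using (MemoryStream ms = new MemoryStream())
        {
            bin.Serialize(ms, source);
            ms.Position = 0;
            return (Employee) bin.Deserialize(ms);
        }
    }

    public static void Main()
    {
        Employee emp = new Employee();
        emp.ID = 1;
        emp.LastName = "Davolio";
        emp.FirstName = "Nancy";
        Employee copy = RoundTrip(emp);
        Console.WriteLine(copy.ID + ": " + copy.FirstName + " " + copy.LastName);
    }
}
```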
The DataSet and the DataTable are the only two ADO.NET objects to support serialization. Both classes implement the ISerializable interface, meaning that both classes directly control their serialized content. The problem with this lies in the fact that both classes serialize themselves as a DiffGram, a quite verbose XML format that represents the current state of the object. You should immediately understand the real scope of this choice. Whenever you transfer a DataSet across the tiers of a distributed application (for example, through .NET Remoting), a potentially large XML document gets transferred, even though you asked for binary serialization. Even if you (or the system) use the BinaryFormatter, there's no guarantee that a binary block of data is actually created. Classes that implement the ISerializable interface are responsible for the content; if this content is verbose XML, then there's nothing the formatter can do to alleviate the issue. Microsoft is seriously considering changing the serialization format for these critical ADO.NET objects in the next version of the framework.
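One way to see the effect is to push a small DataSet through the BinaryFormatter and look for XML markup in the resulting bytes. The sketch below assumes the default serialization behavior described above; the table, column, and helper names are invented for illustration.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;

public static class DataSetPayloadDemo
{
    // Serializes a one-row DataSet with the binary formatter and returns
    // the raw bytes decoded as text, so embedded XML can be searched for.
    public static string SerializeToText()
    {
        // Hypothetical sample data.
        DataSet ds = new DataSet("Company");
        DataTable table = ds.Tables.Add("Employees");
        table.Columns.Add("LastName", typeof(string));
        table.Rows.Add(new object[] { "Davolio" });

        BinaryFormatter bin = new BinaryFormatter();
        using (MemoryStream ms = new MemoryStream())
        {
            bin.Serialize(ms, ds);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        string payload = SerializeToText();

        // Report whether DiffGram markup shows up inside the "binary" stream.
        Console.WriteLine(payload.IndexOf("diffgr:diffgram") >= 0);
        Console.WriteLine(payload.Length);
    }
}
```

Comparing the reported length against the few bytes of actual data in the row gives a rough feel for the overhead of the XML representation.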
Dino Esposito is Wintellect's ADO.NET and XML expert and is a trainer and consultant based in Rome, Italy. He is a contributing editor to MSDN Magazine, writing the "Cutting Edge" column, and is the author of several books for Microsoft Press, including Building Web Solutions with ASP.NET and Applied XML Programming for .NET.