In the context of databasesand object databases, in particularreplication is the ability to duplicate an object from one database (call it "database A") into another ("database B"). The duplication is performed in such a way that both object instances can be modified, so that at a later time the duplicated object can be returned to its original database, and any differences between the object in A and the object in B can be resolved. There are variations on this scenario, but that's the basic idea.
Users of mobile devices might recognize this as "synchronization." In such applications, the desktop system maintains a set of "master" databases that are replicated onto the handheld device. In the process of using the handheld, you add phone numbers, delete appointments, create new shopping lists, and so on. And, at some future point, you synchronize the mobile device with the desktop, which records the changes you've made into the master database. In short, replication allows a subset of information to be drawn from one database into another, and permits the user to operate on that subset in a disconnected fashion.
Suppose you want to implement replication. What capabilities do database systems need to provide for error-freeor, at least, as close to error-free as we could reasonably getreconciliation of data modified by disconnected users? (In this article, I assume that the databases employed are object databases. So, when I speak of "data," I speak of "objects.")
Certainly, the first requirement would be some way of identifying an object, even as clones of it are replicated from one database to another. In other words, when you put an object into the database, a unique identifier must be attached to it. That identifier must be bound to the object in such a way that, when the object is replicated into another database, the identifier goes with it. (You cannot count on the object's data content as the sole means of identifying that object.) Ideally, this identifier will be invisible and unmodifiable, except under unusual circumstances. After all, there is no reason to make its presence and value known to anything other than whatever framework handles the replication process.
Furthermore, this identifier must be universally unique. Suppose you begin with database A. You replicate a subset of its data into database B. Then you replicate data from database A into database C (some of the objects in C may also be objects in B). Suppose further that, after replication, object X was created in B, and object Y was created in C, and that both objects are of the same class. We must be guaranteed that the unique identifiers of X and Y are universally unique, even though databases B and C (and, in fact, A as well) are disconnected. If, by some chance, object X and object Y received the same IDs, then it would be difficult to impossible to reconcile both databases back to the master database.
Accurate reconciliation of disconnected databases also demands that some sort of version information be attached to an object when the object is modified. This is necessary so that synchronization can determine which of two replicated objects is the most up to date. Ideally, this would be information along the lines of "this object was modified at date and time xxx." At the least, we need a modification flag, such as the one on the Palm OS, which indicates when a record has been made dirty. Otherwise, the synchronization application will have to examine each and every object's content (that is, each object passing through synchronization), comparing it with the original (the object in the master database), to see if any changes have occurred, and attempt to deduce from those changes which object is older. Such resolution code would be unacceptably complex and slow.
Finally, a synchronization system must provide a mechanism that allows the developer to manage conflict resolution. Under normal circumstances, when an object that has been replicated is being reconciled back to the main database, the replication framework code can examine version information (assuming it exists) and determine whether the master database object or the replicated object is the most up to date. Conflicts occur when the version information is such that the winner is not apparent. The developer must be able to provide code that can resolve the confusion.