Distributed Change at an Item Level
In ownership-based multi-site setups, an item (file or directory) can't be changed at a site that doesn't own the branch. Okay, but why?
Figure 5 shows the evolution of file foo.c at two remote sites.
- Step 1 shows the file after replication: it exists at both sites, on the same branch.
- In step 2, two developers working at different sites decide to modify foo.c.
They both start from the same revision, revision 2, and make a different number of changes to fix different bugs.
The developer at Site A creates revisions 3 and 4, while the one at Site B creates a single new revision, also numbered 3. (Note that revision 3 at Site A is not the same as revision 3 at Site B. To keep track of these situations, distributed systems normally identify revisions by some sort of globally unique identifier.)
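The distinction between a per-site revision number and a globally unique identifier can be sketched in a few lines of Python. This is only an illustration: the `Revision` class, its field names, and the use of random UUIDs are assumptions made for the example, not the way any particular SCM stores revisions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """Hypothetical revision record, for illustration only."""
    local_number: int            # per-site, per-branch counter; can collide across sites
    content: str
    parent_guid: Optional[str]   # link to the parent revision, by GUID
    guid: str = field(default_factory=lambda: uuid.uuid4().hex)


# Revision 2 exists at both sites after replication.
rev2 = Revision(2, "original foo.c", None)

# Each site independently creates its own "revision 3" on top of revision 2.
rev3_site_a = Revision(3, "foo.c with bug fix from Site A", rev2.guid)
rev3_site_b = Revision(3, "foo.c with bug fix from Site B", rev2.guid)

same_number = rev3_site_a.local_number == rev3_site_b.local_number
same_revision = rev3_site_a.guid == rev3_site_b.guid
print(same_number, same_revision)
```

The two revisions share a number and a parent, yet their identifiers differ, which is what lets a replication system tell them apart.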
If only one developer at a time made changes, the replication system could simply copy the new revision from the source to the destination, and no conflict would occur. But in the situation depicted, if the new revision from Site B were copied directly into the main branch at Site A, the changes made there after revision 2 would be lost; the same would happen if replication ran from Site A to Site B.
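A minimal sketch of that distinction follows, assuming a branch can be modeled as a plain list of revision identifiers. The function names and the identifiers (`r1`, `a3`, and so on) are hypothetical; the point is only that a plain copy is safe when one side's history is a prefix of the other's, and unsafe once the two have diverged.

```python
from typing import List


def is_prefix(shorter: List[str], longer: List[str]) -> bool:
    """True if `longer` starts with every revision in `shorter`, in order."""
    return longer[:len(shorter)] == shorter


def replicate(source: List[str], destination: List[str]) -> List[str]:
    """Return the destination branch after a copy-only replication.

    Raises ValueError when both sides have revisions the other lacks,
    because overwriting either side would silently drop work.
    """
    if is_prefix(destination, source):
        return list(source)        # destination is simply behind: copy is safe
    if is_prefix(source, destination):
        return list(destination)   # nothing new to bring over
    raise ValueError("branches diverged; a plain copy would lose revisions")


# Only Site A changed the file: copying is enough.
print(replicate(["r1", "r2", "a3", "a4"], ["r1", "r2"]))

# Both sites changed the file after r2: copying either way loses work.
try:
    replicate(["r1", "r2", "a3", "a4"], ["r1", "r2", "b3"])
except ValueError as err:
    print(err)
```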
To prevent this situation, mastership or ownership-based replication systems don't let users make simultaneous changes on the same revisions on the same branches.
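The ownership rule can be pictured as a simple guard on every commit. The sketch below is an assumption-laden illustration: `OwnedBranch`, `NotOwnerError`, and the site names are invented for the example and do not correspond to any real product's API.

```python
class NotOwnerError(Exception):
    """Raised when a site tries to change a branch it does not own."""


class OwnedBranch:
    """Hypothetical branch that accepts changes only from its owning site."""

    def __init__(self, name: str, owner_site: str) -> None:
        self.name = name
        self.owner_site = owner_site
        self.revisions = []

    def commit(self, site: str, content: str) -> int:
        # The guard that rules out concurrent changes on the same branch.
        if site != self.owner_site:
            raise NotOwnerError(
                f"{site} does not own branch {self.name!r}; owner is {self.owner_site}"
            )
        self.revisions.append(content)
        return len(self.revisions)   # new revision number on this branch


main = OwnedBranch("main", owner_site="Site A")
main.commit("Site A", "fix from Site A")        # accepted

try:
    main.commit("Site B", "fix from Site B")    # rejected: Site B is not the owner
except NotOwnerError as err:
    print(err)
```

Because only one site can ever add revisions to a given branch, the divergence shown in the figure cannot arise, at the cost of blocking the second developer.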
True distributed systems -- those supporting both the full replication and distributed development categories -- implement ways to handle these conflicts. Hence, they support a broader set of development alternatives.
Concurrent Change on Distributed Systems
Let's continue examining an item-based example to understand how distributed systems can manage concurrent changes at different locations and their later reconciliation or merge.
The process I describe primarily focuses on Plastic from Codice Software because, as one of its developers, I'm most familiar with its concepts. But it is important to note that other SCMs supporting concurrent distributed changes implement very similar techniques. They may vary in exact terminology or strategy, but they share the same basic principles.
Figure 6 illustrates how Site B in the previous example can handle a replication coming from Site A when both locations have modified the same file on the same branch.
Revisions 3 and 4 from Site A are pushed into Site B, but instead of being plugged directly into the main branch under revision 2 (the parent of revision 3), they are placed on a new branch, a fetch branch. If revision 3 on Site B didn't exist, the two revisions would be plugged directly after revision 2, and no fetch branch would be created.
Although the pushed revisions have been re-branched (located at a different branch than their original one), they still preserve their original history as they are linked with their corresponding parent (revision 2 in this case).
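The placement rule described above can be sketched as follows. Everything here is a simplification made for illustration: the `Repo` and `Rev` classes, the `fetch-<branch>` naming, and the single-parent model are assumptions, not a description of how Plastic or any other system is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rev:
    guid: str                # globally unique identifier
    parent: Optional[str]    # GUID of the parent revision


class Repo:
    """Hypothetical repository: revisions by GUID, branches as ordered GUID lists."""

    def __init__(self) -> None:
        self.revs: Dict[str, Rev] = {}
        self.branches: Dict[str, List[str]] = {}

    def commit(self, branch: str, rev: Rev) -> None:
        self.revs[rev.guid] = rev
        self.branches.setdefault(branch, []).append(rev.guid)

    def receive(self, branch: str, incoming: List[Rev]) -> str:
        """Store pushed revisions and return the branch they landed on."""
        new = [r for r in incoming if r.guid not in self.revs]
        if not new:
            return branch
        head = self.branches[branch][-1]
        # If the first new revision hangs off the current head, append in place;
        # otherwise re-branch the incoming revisions onto a fetch branch.
        target = branch if new[0].parent == head else f"fetch-{branch}"
        for r in new:
            self.commit(target, r)   # parent links are stored unchanged
        return target


site_b = Repo()
for rev in (Rev("r1", None), Rev("r2", "r1"), Rev("b3", "r2")):
    site_b.commit("main", rev)

pushed = [Rev("a3", "r2"), Rev("a4", "a3")]
print(site_b.receive("main", pushed))   # lands on the fetch branch
print(site_b.revs["a3"].parent)         # still linked to revision r2
```

Had `b3` not existed at Site B, the head of `main` would have been `r2`, the parent of `a3`, and the pushed revisions would have been appended to `main` directly.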
Distributed systems must also correctly preserve the merge tracking information to guarantee replicated content can be correctly merged.
Once the revisions from Site A have been replicated into Site B and placed on their new fetch branch, a regular merge can happen between revision 4, coming from Site A, and revision 3 at Site B. Because revision parenthood and merge tracking are preserved, a three-way merge with a correct common ancestor calculation can take place, ensuring that the merge between the local and replicated revisions is correct.
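To make the two steps concrete, here is a deliberately simplified sketch of a common ancestor lookup followed by a three-way merge. It assumes single-parent history (no earlier merge links) and that every version of the file has the same number of lines, so it is far cruder than what a real merge tool does; the function names and sample file contents are invented for the example.

```python
from typing import Dict, List, Optional


def common_ancestor(parents: Dict[str, Optional[str]], a: str, b: str) -> Optional[str]:
    """Nearest revision reachable from both a and b by following parent links."""
    seen = set()
    node: Optional[str] = a
    while node is not None:
        seen.add(node)
        node = parents.get(node)
    node = b
    while node is not None:
        if node in seen:
            return node
        node = parents.get(node)
    return None


def three_way_merge(base: List[str], ours: List[str], theirs: List[str]) -> List[str]:
    """Line-by-line merge; assumes all three versions have the same line count."""
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:
            merged.append(o)     # both sides agree
        elif o == b:
            merged.append(t)     # only their side changed this line
        elif t == b:
            merged.append(o)     # only our side changed this line
        else:
            raise ValueError(f"conflict: {o!r} vs {t!r}")
    return merged


# Parent links at Site B after the push from Site A.
parents = {"r1": None, "r2": "r1", "b3": "r2", "a3": "r2", "a4": "a3"}
ancestor = common_ancestor(parents, "a4", "b3")

contents = {
    "r2": ["int x = 0;", "int y = 0;", "return x + y;"],
    "b3": ["int x = 0;", "int y = 1;", "return x + y;"],   # Site B fixed line 2
    "a4": ["int x = 5;", "int y = 0;", "return x + y;"],   # Site A fixed line 1
}
merged = three_way_merge(contents[ancestor], contents["b3"], contents["a4"])
print(ancestor, merged)
```

Without the preserved parent links, the lookup could not find revision 2 as the base, and the merge could not tell which side had changed each line.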
In short, by using some sort of fetch branch (or changeset, depending on the system), the distributed branch-and-merge problem is reduced to a local one, which a number of systems already support.
The last step is to fetch from Site B into Site A once the merge is done, bringing B's changes into A. Figure 8 details the process.
Note how the merge link created on Site B is now replicated into A and correctly placed, linking the right revisions, so that further merges and project evolution remain correct. Revision 5 at Site A is exactly the same as revision 4 at Site B. The histories of both sites (repositories) are equivalent, although not identical.
If mastership-based replication is used, then it can be ensured that the replicated repositories are identical. When concurrent changes are permitted, repositories will end up being equivalent -- although not exact -- copies.
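One way to picture the difference between equivalent and identical is to compare the set of revisions each site holds with the way each site numbers and places them. The layout below is a hypothetical reconstruction for illustration only (branch names, GUID strings, and fetch-branch numbering are assumptions, not taken from the figures); it keeps just the one fact stated in the text, that revision 5 at Site A is the same revision as revision 4 at Site B.

```python
from typing import Dict, Tuple

Layout = Dict[Tuple[str, int], str]   # (branch, local number) -> GUID


def equivalent(x: Layout, y: Layout) -> bool:
    """Same revisions overall, whatever their local numbers or branches."""
    return set(x.values()) == set(y.values())


def identical(x: Layout, y: Layout) -> bool:
    """Same revisions at the same branch and number on both sites."""
    return x == y


site_a: Layout = {
    ("main", 1): "g1", ("main", 2): "g2",
    ("main", 3): "gA3", ("main", 4): "gA4",
    ("main", 5): "gMerge",
    ("fetch", 1): "gB3",
}
site_b: Layout = {
    ("main", 1): "g1", ("main", 2): "g2",
    ("main", 3): "gB3", ("main", 4): "gMerge",
    ("fetch", 1): "gA3", ("fetch", 2): "gA4",
}

print(equivalent(site_a, site_b), identical(site_a, site_b))
```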
The techniques to manage revision relinking after replication vary from system to system. Plastic and Mercurial, for instance, track revision history, so revision 3 on branch main in one repository may not be the same as revision 3 on the same branch in another replicated repository, as the previous examples showed. Systems like Git, which don't preserve the exact revision history (it can be mutated, although the right contents are preserved), don't care about this renumbering; they identify revisions only by their internal globally unique identifiers ("hashes" in the case of Git), which are correct but harder to read.
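As a small illustration of hash-based identifiers, the sketch below computes the SHA-1 identifier Git assigns to a file's contents (a "blob" object). It covers only file contents, not full revisions with their parent links, and the function and variable names are made up for the example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 of a Git blob: the header 'blob <size>' + NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same bytes hashed independently at two sites yield the same identifier,
# so no central authority is needed to hand out IDs.
source = b"int main(void) { return 0; }\n"
id_at_site_a = git_blob_id(source)
id_at_site_b = git_blob_id(source)

print(id_at_site_a == id_at_site_b)
print(id_at_site_a)   # 40 hex characters: unambiguous, but not easy to read aloud
```

The trade-off mentioned above is visible in the output: the identifier is the same everywhere, but it carries none of the ordering that a number like "revision 3" conveys.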