Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Distributed Software Development Explained


Distributed Change at an Item Level

In ownership-based multi-site setups, an item (file or directory) can't be changed at a site that doesn't own the branch. Okay, but why?

Figure 5 shows the evolution of file foo.c at two remote sites.

  1. In step 1, the file has been replicated and exists at both sites, on the same branch.
  2. In step 2, two developers working at different sites each decide to modify foo.c.

They both start from the same revision, revision 2, and each create a different number of revisions to fix different bugs.

The developer at Site A creates revisions 3 and 4, while the one at Site B creates a single new revision, which becomes revision 3 at that site. (Note that revision 3 at Site A is not the same as revision 3 at Site B. To keep track of these situations, distributed systems normally identify revisions with some sort of globally unique identifier.)
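To make the numbering problem concrete, here is a minimal sketch, assuming a hypothetical `Revision` model in which every site assigns its own human-readable number but each revision also carries a globally unique identifier (a random UUID here; real systems may derive theirs differently). Site A's revision 3 and Site B's revision 3 share a number and a parent, yet they are different revisions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    """One revision of an item as stored at a site (hypothetical model)."""
    local_number: int              # human-readable, assigned independently per site
    parent_guid: Optional[str]     # globally unique ID of the parent revision
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))  # globally unique ID


# Revisions 1 and 2 exist at both sites after the initial replication.
r1 = Revision(1, None)
r2 = Revision(2, r1.guid)

# Site A creates revisions 3 and 4; Site B independently creates its own revision 3.
a3 = Revision(3, r2.guid)
a4 = Revision(4, a3.guid)
b3 = Revision(3, r2.guid)

print(a3.local_number == b3.local_number)  # True: same local number...
print(a3.guid == b3.guid)                  # False: ...but different revisions
```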

Figure 5: Ownership-based multi-site. File level study.

If only one developer at a time makes a change, the replication system simply copies the new revisions from the source to the destination, and no conflict occurs. In the situation depicted, however, if the new revision from Site B were copied directly onto the main branch at Site A, the changes Site A made after revision 2 (its revisions 3 and 4) would be lost, and Site B's change would be lost in the same way if replication ran from Site A to Site B.
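The difference between the safe case and the lossy case can be sketched as follows, assuming each branch is modeled as a simple linear list of hypothetical revision IDs (oldest first). Copying is only safe when the destination's history is a prefix of the source's; otherwise a plain copy discards the destination's own revisions.

```python
def can_fast_forward(src, dst):
    """True if dst is a prefix of src, so copying src over dst loses nothing."""
    return src[:len(dst)] == dst


def naive_replicate(src, dst):
    """Copy the source branch over the destination. dst is ignored, which is the problem."""
    return list(src)


site_a = ["r1", "r2", "a3", "a4"]   # Site A: its revisions 3 and 4
site_b = ["r1", "r2", "b3"]         # Site B: its own revision 3

print(can_fast_forward(site_a, ["r1", "r2"]))  # True: only one site changed the branch
print(can_fast_forward(site_b, site_a))        # False: both sites changed the branch
print(set(site_a) - set(naive_replicate(site_b, site_a)))  # Site A's a3 and a4 are gone
```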

To prevent this situation, mastership- or ownership-based replication systems don't let users at different sites make concurrent changes to the same item on the same branch: only the site that owns the branch can create new revisions on it.
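The ownership rule can be sketched as a check performed before every check-in. This is a minimal sketch under stated assumptions: the `Site` class, `OwnershipError`, and the branch name `task001` are hypothetical illustrations, not the API of any particular SCM.

```python
class OwnershipError(Exception):
    """Raised when a site tries to change a branch it does not own (hypothetical)."""


class Site:
    """A replica that may only create revisions on branches it owns (hypothetical model)."""

    def __init__(self, name, owned_branches):
        self.name = name
        self.owned = set(owned_branches)
        self.history = {}  # branch name -> list of revision IDs created at this site

    def checkin(self, branch, rev_id):
        # Reject the change up front, so concurrent changes on one branch can't happen.
        if branch not in self.owned:
            raise OwnershipError(f"{self.name} does not own branch '{branch}'")
        self.history.setdefault(branch, []).append(rev_id)


site_a = Site("Site A", owned_branches=["main"])
site_b = Site("Site B", owned_branches=["task001"])

site_a.checkin("main", "rev3")          # allowed: Site A owns main
try:
    site_b.checkin("main", "rev3")      # rejected: Site B must obtain ownership first
except OwnershipError as e:
    print(e)                            # Site B does not own branch 'main'
```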

True distributed systems (those supporting both the full replication and the distributed development categories) implement ways to handle these conflicts, and hence support a broader set of development alternatives.

Concurrent Change on Distributed Systems

Let's continue examining an item-based example to understand how distributed systems can manage concurrent changes made at different locations and later reconcile (merge) them.

The process I describe focuses on Plastic from Codice Software because, as one of its developers, I'm most familiar with its concepts. It is important to note, however, that other SCMs supporting concurrent distributed changes implement very similar techniques. They may differ in terminology or strategy, but they share the same basic principles.

Figure 6 illustrates how Site B in the previous example can handle a replication coming from Site A when both locations have modified the same file on the same branch.

Figure 6: Concurrent change on DSD.

Revisions 3 and 4 from Site A are pushed into Site B, but instead of being plugged directly into the main branch under revision 2 (the parent of Site A's revision 3), they are placed on a new branch, a fetch branch. If revision 3 did not exist at Site B, the two revisions would be plugged directly after revision 2 and no fetch branch would be created.

Although the pushed revisions have been re-branched (placed on a different branch from their original one), they still preserve their original history, because they remain linked to their corresponding parent (revision 2 in this case).
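The re-branching decision can be sketched in a few lines. This is a minimal sketch, not Plastic's actual algorithm: it assumes a repository modeled as a dictionary from hypothetical revision IDs to `Rev` records, a linear incoming chain, and an invented fetch-branch naming scheme (`fetch-<branch>`). Only the branch of each incoming revision changes; its parent link is kept, so the history is preserved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rev:
    guid: str              # globally unique revision ID
    parent: Optional[str]  # ID of the parent revision, identical at every site
    branch: str            # branch the revision lives on in *this* repository


def replicate(local: Dict[str, Rev], incoming: List[Rev]) -> Optional[str]:
    """Import a linear chain of revisions (oldest first) into `local`.

    If `local` already has a different child of the chain's first parent on
    the same branch, the chain goes onto a fetch branch instead of being
    plugged in directly. Returns the branch used, or None if nothing was new.
    """
    new = [r for r in incoming if r.guid not in local]
    if not new:
        return None
    first = new[0]
    diverged = any(r.parent == first.parent and r.branch == first.branch
                   for r in local.values())
    target = f"fetch-{first.branch}" if diverged else first.branch
    for r in new:
        # Re-branch only; the parent link (the history) is preserved as-is.
        local[r.guid] = Rev(r.guid, r.parent, target)
    return target


site_b = {
    "r1": Rev("r1", None, "main"),
    "r2": Rev("r2", "r1", "main"),
    "b3": Rev("b3", "r2", "main"),   # Site B's own revision 3
}
from_a = [Rev("a3", "r2", "main"), Rev("a4", "a3", "main")]

print(replicate(site_b, from_a))  # fetch-main
print(site_b["a3"].parent)        # r2: still linked to revision 2
```

Had Site B contained only revisions 1 and 2, the same call would have returned `main` and the chain would have been appended directly.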

Distributed systems must also preserve merge-tracking information so that replicated content can later be merged correctly.

Figure 7: Ownership-based multi-site. File level study, merging.

Once the revisions from Site A have been replicated into Site B and placed on their new fetch branch, a regular merge can take place between revision 4, coming from Site A, and revision 3 at Site B. Because revision parenthood and merge tracking are preserved, a three-way merge with correct common-ancestor calculation is possible, which ensures that the local and replicated revisions are merged correctly.
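The common-ancestor step can be sketched over a parent map that includes both parent links and merge links. This is a minimal sketch with hypothetical revision IDs, not any system's implementation: it computes the common ancestors of the two revisions and keeps only the "best" ones, those that are not ancestors of another common ancestor. For Site B's revision 3 and Site A's revision 4, that is revision 2, the base of the three-way merge.

```python
from collections import deque


def ancestors(parents, rev):
    """Every revision reachable from `rev` (inclusive) through parent and merge links."""
    seen, queue = {rev}, deque([rev])
    while queue:
        for p in parents.get(queue.popleft(), []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def best_common_ancestors(parents, x, y):
    """Common ancestors of x and y that are not ancestors of another common ancestor."""
    common = ancestors(parents, x) & ancestors(parents, y)
    return {c for c in common
            if not any(d != c and c in ancestors(parents, d) for d in common)}


# Site B after replication: revision -> its parents.
parents = {
    "r2": ["r1"],
    "b3": ["r2"],   # Site B's change on main
    "a3": ["r2"],   # Site A's changes, now on the fetch branch
    "a4": ["a3"],
}
print(best_common_ancestors(parents, "b3", "a4"))  # {'r2'}
```

Because merge links are treated like parent links, a later merge involving the merge result (say a hypothetical revision `m` with parents `b3` and `a4`) finds `a4` as its base rather than falling back to revision 2.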

In other words, by using some sort of fetch branch (or changeset, depending on the system), the distributed branch-and-merge problem is reduced to a local one, which many systems already support.

The last step, once the merge is done, is to replicate from Site B back into Site A so that B's changes reach A. Figure 8 details the process.

Figure 8: Ownership-based multi-site. File level study, merging.

Note how the merge link created at Site B is now replicated into Site A and connects the right revisions, so that further merges, and the project's evolution as a whole, remain correct. Revision 5 at Site A is exactly the same as revision 4 at Site B. The histories of both sites (repositories) are equivalent, although not identical.

If mastership-based replication is used, the replicated repositories are guaranteed to be identical. When concurrent changes are permitted, repositories end up as equivalent, though not identical, copies.
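The distinction between identical and equivalent can be made precise with a small sketch. Assume, hypothetically, that each repository maps a local label (such as `main/5`) to a pair of a globally unique revision ID and its set of parents, merge links included. The labels and IDs below are illustrative only. Two repositories are identical when the whole mapping matches, and equivalent when they contain the same revisions linked in the same way, whatever local labels those revisions received.

```python
def rev(guid, *parents):
    """A revision as (ID, parents); parents are a frozenset so merge-link order doesn't matter."""
    return (guid, frozenset(parents))


def identical(repo_a, repo_b):
    """Same local labels pointing at the same revisions."""
    return repo_a == repo_b


def equivalent(repo_a, repo_b):
    """Same revisions and same parent/merge links, regardless of local labels."""
    return set(repo_a.values()) == set(repo_b.values())


site_a = {
    "main/1": rev("r1"), "main/2": rev("r2", "r1"),
    "main/3": rev("a3", "r2"), "main/4": rev("a4", "a3"),
    "fetch/3": rev("b3", "r2"),
    "main/5": rev("m", "a4", "b3"),   # merge revision replicated from Site B
}
site_b = {
    "main/1": rev("r1"), "main/2": rev("r2", "r1"),
    "main/3": rev("b3", "r2"),
    "fetch/3": rev("a3", "r2"), "fetch/4": rev("a4", "a3"),
    "main/4": rev("m", "b3", "a4"),   # the merge made at Site B
}
print(identical(site_a, site_b), equivalent(site_a, site_b))  # False True
```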

The techniques used to relink revisions after replication vary from system to system. Plastic and Mercurial, for instance, track revision history with sequential numbers, so revision 3 on branch main in one repository is not necessarily the same as revision 3 on the same branch in another replicated repository, as the previous examples showed. Systems like Git, which don't preserve the exact revision history (it can be rewritten, although the right contents are preserved), don't care about this renumbering and identify revisions only by their internal globally unique identifiers ("hashes" in Git's case), which are correct but harder to read.
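The idea behind hash-based identifiers can be sketched with Python's `hashlib`. This is a simplified illustration, NOT Git's actual object format: the hypothetical `revision_id` function hashes only the parent IDs and the content, whereas a real Git commit also records a tree, author, committer and timestamps, so two identical edits made independently still get different IDs. The point is that every site computes the same ID for the same revision without coordination, at the cost of readability.

```python
import hashlib
from typing import List


def revision_id(content: bytes, parents: List[str]) -> str:
    """Simplified content-addressed ID derived from parent IDs and content (not Git's format)."""
    h = hashlib.sha1()
    for p in sorted(parents):          # parent IDs are hex strings, so "\n" is a safe separator
        h.update(p.encode("ascii") + b"\n")
    h.update(b"\0")                    # separate the parent list from the content
    h.update(content)
    return h.hexdigest()


r2 = revision_id(b"foo.c at revision 2", [])
a3 = revision_id(b"foo.c with fix for bug 1", [r2])   # created at Site A
b3 = revision_id(b"foo.c with fix for bug 2", [r2])   # created at Site B

print(a3 != b3)                                              # True: distinct revisions
print(a3 == revision_id(b"foo.c with fix for bug 1", [r2]))  # True: same ID at every site
print(a3[:10])  # unique, but far less readable than "revision 3"
```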

