Parallel

Distributed Software Development Explained

By Pablo Santos, July 03, 2008

Distributed Software Development introduces a huge new world of possibilities on top of well-known SCM best practices

Concurrent Multi-site Development at Branch Level

Let's zoom out again to the branch level and take a look at how branches evolve in the scenarios described so far. When the focus is on project evolution, it's easier to understand what happened from a branch perspective than from a file or directory perspective.


Figure 9: Concurrent multi-site development at branch level.

Figure 9 shows how the two repositories at the two locations evolve in parallel. Changes can be performed at the same time at the two locations, even on the same branch.

From time to time (when a file or directory is modified at the same time at both locations) a fetch branch is created to resolve the merge conflicts.
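As a rough illustration of when such a conflict arises, the sketch below compares the items changed at each site since the last replication. The function and variable names (`conflicting_items`, `needs_fetch_branch`) and the file paths are hypothetical and do not correspond to any particular SCM tool.

```python
def conflicting_items(changed_at_a, changed_at_b):
    """Return the items modified at both sites since the last replication."""
    return sorted(set(changed_at_a) & set(changed_at_b))


# Hypothetical sets of items changed at each site on the same branch.
site_a = {"src/launch.c", "src/data_layer.c"}
site_b = {"src/data_layer.c", "docs/readme.txt"}

conflicts = conflicting_items(site_a, site_b)

# Only items touched at both sites need a fetch branch and a merge;
# everything else can simply be replicated.
needs_fetch_branch = bool(conflicts)
print(conflicts, needs_fetch_branch)
```

Items changed at only one site never appear in the result, which is why parallel work on the same branch usually replicates without any manual merge.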

With concurrent multi-site support or full multi-site, it is easier to implement almost any branching and organizational pattern, especially continuous integration ones. Mastership-based replication makes things a bit more complicated due to the restrictions it imposes.

Distributed Branch Per Task

It's no secret that branch per task is my favorite branching pattern. It enables real parallel development, provides great isolation and added services to developers (like committing changes really often without ever breaking the build), and it ensures mainline is always clean and stable.

Of course, branch per task is not everyone's favorite, but I bet it will be. Take a look at widely used systems like Subversion, which, among others, is responsible for training a huge number of developers in SCM basics. Since Subversion has taken hold at universities, almost every new software engineer around the world knows how to use it.

But Subversion used to have serious trouble with branching (particularly merging), and, in my opinion, that's one of the main reasons why a large number of developers don't like branching at all. ("Subversion can't do it well, so I don't like it.")

But branching patterns, including branch per task, introduce a number of opportunities to push development to the next level. Real parallel development is only possible through proper branching schemes (or "streams", if you prefer a fancier name), so being afraid of branching is not good for software development. Of course, I've traced the problem to Subversion, but CVS is probably its original root.

Fortunately, things are changing. With Subversion 1.5, branching and merging have been greatly improved, and I suspect that in a few years we'll see a move towards branching just because of Subversion. In fact, a number of recent online tutorials and webinars have introduced the basics of initial branch-per-task support with Subversion.

Branches are not always a way to split development or to fork a codeline. There's much more to branching than that.

In branch-per-task, you associate each issue in your preferred bug, issue, or project management system with a branch. Yes, branches become the greatest and most powerful change containers ever.

Simply put: A new task, a new branch. So simple. Of course, you need a tool that lets you easily create branches, track their history, and even follow their evolution (and here we enter the field of streams, although I still prefer to keep the same names and call them just "smart branches").

There are other association mechanisms for changes like changelists (also implemented by Subversion 1.5 and present for years in award-winning tools such as Perforce).

The limits of changelists are clear: They normally live only on the client side and they can only contain one revision of each item (file or directory). And what's worse, they're not independent: You modify a file to fix a bug and associate the change with a changelist, which is in turn associated with a task. Then you jump to the next task and create a new changelist starting at the revisions you've just created. Okay, it's better than nothing, but aren't branches better? Yes, they're better because branch-per-task supports the concept of stable baselines: Whatever change you make, you always start from the last stable baseline instead of your latest changes. Consider the following example:

Issue 10101: fix a bug in the data layer of your rocket launch system. It can only take a few minutes but its impact can be great.
Issue 10102: change the launch button color from yellow to red. It is simple and risk-free.

If you implement 10101 first, you'll be touching some critical code. Then you implement 10102 in the next changelist, on the same branch (trunk, maybe?). Yes, you run all your tests, but as far as I know no test suite is perfect, and then you are pressed to release a new version. You and your team are confident about the button color change, but not so sure about the dangerous change in the data layer. Yes, it is passing the tests, but can you afford to leave it out until it has been tested more internally? The answer should be "yes," but because you implemented 10102 after it, both tasks are linked! You'd have to get rid of the changes of 10101, probably by running some sort of subtractive merge. That will be easy unless 10101 is associated with more than one checkin, with other unrelated checkins done in the meantime. If that happens, it will take longer to get rid of the changes.

Now think about branches: 10101 is in a branch (what about "rocket10101" as its name?) and 10102 is in another branch. You need to release the new version, so you just decide to merge 10102 and the other risk-free tasks into the trunk (or main branch). Easy, clean, and traceability happy.
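The rocket example can be sketched in a few lines. This is a minimal model, not a real SCM API: the `TaskBranch` class, the `build_release` function, and the baseline label `BL001` are assumptions made up for illustration. The point it shows is that each task branch starts from the same stable baseline, so a release can include 10102 and leave 10101 out without any subtractive merge.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBranch:
    issue: int
    baseline: str                      # stable release the branch starts from
    changes: list = field(default_factory=list)

    @property
    def name(self):
        return f"rocket{self.issue}"


def build_release(baseline, branches, include):
    """Merge only the selected task branches on top of the stable baseline."""
    merged = []
    for branch in branches:
        if branch.issue not in include:
            continue                   # task stays on its own branch, untouched
        if branch.baseline != baseline:
            raise ValueError(f"{branch.name} does not start from {baseline}")
        merged.extend(branch.changes)
    return merged


BASELINE = "BL001"  # hypothetical label for the last stable release
fix = TaskBranch(10101, BASELINE, ["data layer fix"])
color = TaskBranch(10102, BASELINE, ["launch button: yellow to red"])

# Release only the risk-free task; 10101 waits for more internal testing.
release = build_release(BASELINE, [fix, color], include={10102})
print(release)
```

Because neither branch was built on top of the other, postponing 10101 is just a matter of not selecting it; its changes remain intact on `rocket10101` for a later release.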

Okay, now that you all see that branch per task is the way to go, let's move to the distributed scenario. How would you handle branch-per-task in a DSD environment?

Well, it will be even easier than single-branch distributed development, and it could even be supported to some extent by mastership-based replication systems (yes, unfortunately all the proxy-based systems are out of the game when true replication comes into play).

Figure 10 shows how a stable release has been created at Site A and replicated to Site B.

Then developers at the two sites start creating branches in parallel and making changes. Because the changes are isolated on branches, mastership-based replication systems can still play the game. Of course, the purest scenario won't always happen, and even if developers isolate the tasks in branches, they can be working together on the same task branch at more than one site, which is a very good idea.


Figure 10: Distributed branch per task.

At a certain point in time a new release has to be created. The release building team, which can be located at Site A, will then fetch the finished tasks (branches) from Site B and create a new release. There are several other possibilities here: the two sites could each be running continuous integration, even combined with branch per task (a very good practice that merges the best of the two worlds), or there could be more than one build team at different sites, and so on. The basics will be the same.

Once the release is fully tested, it will be replicated back to Site B, and a new iteration (or sprint, if you are familiar with Scrum and agile methods) will start.


Figure 11: Distributed branch per task.
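The release cycle described above (fetch finished task branches from Site B, create the release at Site A, replicate it back) can be modeled roughly as follows. The dictionary layout, the function names, and the branch and release labels are all hypothetical; a real replication system would transfer revisions and metadata rather than flags.

```python
def fetch_finished(local, remote):
    """Copy finished task branches that exist only at the remote site."""
    fetched = []
    for name, branch in remote["branches"].items():
        if branch["finished"] and name not in local["branches"]:
            local["branches"][name] = dict(branch)
            fetched.append(name)
    return sorted(fetched)


def create_release(site, label):
    """Record a release made of every finished task branch at this site."""
    finished = sorted(n for n, b in site["branches"].items() if b["finished"])
    site["releases"][label] = finished
    return finished


def replicate_release(src, dst, label):
    """Send a tested release to the other site as its new stable baseline."""
    dst["releases"][label] = list(src["releases"][label])


site_a = {"branches": {"task001": {"finished": True}}, "releases": {}}
site_b = {
    "branches": {
        "task002": {"finished": True},
        "task003": {"finished": False},   # still in progress, stays at Site B
    },
    "releases": {},
}

fetched = fetch_finished(site_a, site_b)
contents = create_release(site_a, "release-2")
replicate_release(site_a, site_b, "release-2")
print(fetched, contents)
```

Unfinished work such as `task003` never leaves its site, so the next iteration at both locations starts from the same replicated release.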

Conclusion

Distributed Software Development introduces a huge new world of possibilities on top of well-known SCM best practices. The ability to run development in parallel at different sites in an easy way, together with an understanding of the underlying concepts, will greatly help multi-site teams and companies improve the way they work today and open the door to more productive ways of working.
