The Centralization Lock-in
We live in a globalized world, and the impact is clear for both small and big development shops all around the globe. Having remote individuals or entire teams is an extremely common situation these days.
It has advantages like being able to hire talent globally, enabling more flexibility for personnel and even having teams closer to the final customer. It also has drawbacks in the form of communication issues.
Just to give an example, let's focus on agile development practices like the now-ubiquitous Scrum methodology and how it defines teams: sitting together and collaborating. You can find entire chapters in books (I'm thinking of Cohn's latest, Succeeding with Agile, http://www.succeedingwithagile.com, which is not my favorite from the author's collection but it's the one I'm reading right now) devoted to resolving the challenges you face when the whole team can't be in the same room for the daily scrum meeting: having ambassadors, inviting some of the members by web meeting or videoconference, or the whole team when the total head count is no bigger than ten…
My area is SCM, and there are a few concerns there too, most of them closely related to the same family of issues you find with agile practices and every other aspect of software development.
I find teams with, say, three or four facilities and a good number of developers at each of them, all working against a central repository through their VPN connections. When they're in trouble, they share the same traits:
- High head count: from 50 to more than 150 developers committing changes to the same repo or set of centralized repositories.
- Long-lived project: years of evolution on the same code base.
- Remote sites locked by slow access to the central location. Yes, it's 2010 and remote access should be a snap, but the reality is, let's face it, that slow and unreliable connections are in use all around, turning LAN operations into hell over the WAN.
- High change ratio: fixes, patches, new features and customizations flying around on a daily basis, hitting three or four (or even more!) integration branches.
- Stability issues on the code base.
- Strong need for parallel development to support the current demands from their customers.
There are basically two out-of-the-box solutions to these issues (implementing them is obviously not that straightforward and requires involvement from the organization, agreement on a simple yet effective working procedure and so on, but for the sake of simplicity let's focus on the core practices): implementing strong parallel development using best-of-breed branching techniques, and enabling distributed (or multi-site, if you prefer) support for the remote teams.
And here's where I tend to find the centralization lock-in problem, which gets stronger as the teams grow. "Yes, I understand remote locations shouldn't be waiting on slow connections or restricted by network bandwidth in daily operations, but we can't afford having our code, our most valuable asset, spread over four different locations; it will be hell to maintain" is a pretty common concern I hear.
Of course, if you mention distributed development, the reaction can be even stronger at corporate sites: "Having the whole codebase spread over a hundred computers, sitting in each developer's private repository?! We can't handle that. Look, we have very junior people and a wide range of skill sets; not everyone will be able to handle it, and it will end up in a total mess." Fortunately, this one is easily solved as soon as you explain that the distributed setup can be customized to have one server per location instead of one per developer (which, I must agree, can only succeed when you count on highly skilled teams who really understand what they're doing with their SCM tool). The problem is they jump back to objection number one as soon as this one is solved.
An image is worth a thousand words, so let's give it a try. Here's how the initial scenario looks:
All changes flow from the remote and local sites to the central server, which handles the entire load. Remote sites are limited by the network bandwidth and its availability; they will suffer from disconnections and low performance.
In terms of what's going on inside the repository, the following image shows how most changes go into the trunk, coming from all sites. It means some sort of continuous integration process is mandatory to prevent the build from breaking, but unfortunately the procedure is reactive, so the build will break and get fixed, creating intermediate periods of instability that can spread issues among all developers.
The process is sequential, totally linear on each branch, and every developer is forced to merge his changes regardless of his own skill set (junior members who don't yet share the whole project vision end up integrating potentially critical code on the main branch themselves).
As soon as the term "multi-site" or "distributed" enters the conversation, the picture most people get is the following: first, the multi-site scenario, in which sites collaborate through their own site servers, which capture all changes and exchange them with the central one. It's a very common scenario, and it's especially suitable for corporate environments.
The second option, which sometimes sets off all the alarms in corporate teams, is the "fully distributed" scenario, most of the time perceived as a chaotic one (despite the overwhelming success it enjoys in open source projects; but remember, the mindset, motivation and willingness in such projects are sometimes radically different from what you get inside companies). The figure depicts a hierarchically distributed system, which hides some of the potential complexity, since the nodes could technically exchange commits and branches between themselves too (but if the scenario is already too much for some organizations, displaying the full number of arrows would only make it worse :-P).
Why do companies tend to stay in the centralization lock-in? Simply check the following diagram: centralized is perceived as order, while distributed is perceived as chaos. A single central server is associated with keeping all the valuable assets in a safe place: properly protected, backed up, controlled and secured.
But having each developer (the extreme case) manage his own repository copy looks like losing control over the valuable codebase. Who's in charge? Too many points of failure, maybe?
Communication paths also help explain the concern: it's very easy to see that in a centralized server setup there are exactly as many communication channels as clients, period. Each client connects to the server and that's all. But if each client runs its own repository… then the number of communication channels grows quadratically, and we're all aware, thanks to The Mythical Man-Month, that complexity rises when communication channels do!
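The Mythical Man-Month arithmetic is easy to sketch: a hub-and-spoke setup needs one channel per client, while a full mesh of n peers needs n(n-1)/2. A quick shell check, using head counts in the range mentioned earlier:

```shell
# Channels needed: hub-and-spoke (one per client) vs. full peer-to-peer mesh.
for n in 10 50 150; do
  echo "$n clients: centralized=$n, full mesh=$((n * (n - 1) / 2))"
done
# prints e.g. "150 clients: centralized=150, full mesh=11175"
```

Note, though, that the one-server-per-site setup keeps the channel count close to the centralized case: sites talk to the central server, not to every peer.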
Let's take back control?
Each site is willing to escape from central control and gain enough freedom to run its own servers, removing the frustration of developers who have to wait for the network link to complete operations. It can boost productivity simply by removing a painful activity.
But central operations refuse, because then control would be lost. Having replicated repositories will increase the chances of people making concurrent modifications to the same codebase, right? And all of it hard to control… It sounds crazy!
(Note: think about it — you won't actually have more conflicts than you'd find on a central branch with all developers working on it at the same time, but adding the distributed variable obviously makes the whole equation harder to grasp at first sight.)
So there's a horrible solution introduced by centralized control freaks: "Hey, let's bring back locking checkouts, so that while site B is modifying something, the other sites are prevented from making modifications, returning linear development to the picture." Sound familiar? It does to me… unfortunately!
Locking changes, preventing others from evolving the code, is one of the worst ideas ever seen in the SCM arena. Reintroducing it in distributed scenarios only cuts away the benefits we tried to gain with the whole model: what if a site goes down while it holds a lock? What happens to the other sites? Do they get stopped?
Remember, the current paradigm is to embrace change, not to preclude it, so the solution lies in parallel development, not in making development linear again. Simplifying the problem is the solution, not making it even more complicated.
Parallel development to the rescue
Let's take another look at how your current repository evolves, since it will greatly help in understanding how to arrange the distributed setup.
This is how changes get introduced on your main branch using serial development. Every commit is a new chance to break the build, developers don't get real isolation, and changes come from all around… continuously. Yes, under these circumstances, multiplying it by several remote sites easily becomes a real nightmare.
But, isn't there a better way?
Yes, there must be a better way. Using a single branch for committing changes (ok, or one per release) is the technique everyone was using in the nineties. Remember the nineties? Visual Basic 3 was the tool back then. Do you still use the same techniques and tools for everything else? I bet you don't!
Isn't there a better way to handle changes? A better unit of change? Yes, there is, and you've been using it your entire career: branches.
- Branches? - I hear you say.
- Yes, branches. Ok, call them streams if you want to sound cool, but yes, branches. Stop using branches for only one purpose; use them for everything!
- But, hey, aren't branches supposed to be evil?
- Yes, as evil as go-to statements used to handle exit conditions before they became try/catch blocks. (Ok, now I'm joking, but you get the point.)
So, have you heard of topic branches? I still prefer to call them task branches, but topic branches is a good name too.
A topic branch is a branch you use to contain a group of tightly related changes made to complete a bug fix, a new feature, an experiment or an optimization… whatever change you're working on. They're short-lived and they'll be merged back to their parent quite soon (remember, whatever good practice you follow, it will end up telling you that a task shouldn't last longer than a couple of days if you want to keep it under control and preserve visibility. This is what Scrum says about the tasks you create while decomposing user stories, so yes, a task maps to a branch!).
Why use branches instead of changesets as the unit of change? Well, because one changeset is one set of changes, one commit, but you can (and you will) make more than one change to a file in order to finish a modification, maybe because you want to try something, check whether it works, keep the result and iterate again until finished. Each of these intermediate checkpoints can be stored on a branch (in fact, must be stored on a branch), helping the whole thing become self-documented and easier to understand during review.
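To make the cycle concrete, here's a minimal sketch of a task branch. I'm using Git commands purely as an illustration (the pattern itself is tool-agnostic), and the task number, file names and throwaway repository are all invented for the example:

```shell
# Throwaway repository so the example is self-contained.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com" && git config user.name "Dev"
echo "v1" > app.txt && git add app.txt && git commit -qm "initial import"
git branch -M main                      # normalize the branch name

# One short-lived branch per task; commit as many checkpoints as you need.
git checkout -qb task/1234-fix-login
echo "repro" >> app.txt && git commit -qam "task/1234: reproduce the failure in a test"
echo "fix"   >> app.txt && git commit -qam "task/1234: fix null session handling"

# Once the task is finished (and reviewed), it's merged back to its parent.
git checkout -q main
git merge -q --no-ff -m "merge task/1234-fix-login" task/1234-fix-login
git branch -qd task/1234-fix-login      # short-lived: delete after merging
```

The `--no-ff` merge keeps the task visible as a single unit in history, which is what makes the branch-per-task log self-documenting.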
How does it look? Check the next figure. It will look very familiar, just a little more structured than before.
And, as a programmer, I'm sure you'll appreciate structure.
Now changes come to the trunk in a more controlled, organized and hierarchical way. There's a chance to keep your mainline pristine, to avoid breaking the builds, to review your code before it hits the mainline (while still getting it shared among all the teams), to use your version control as a tool and not just as a delivery mechanism (you can now have intermediate commits, as many as you want). Basically, it's a new approach.
Jump back to the distributed world
Well, now look back at the previous situation, redrawn in the figure below. What if the branches with the blue border came from a remote site instead of the local one? Would it greatly change the picture? You could still have your integrator merging them all on-site. You could still peer-review or formally inspect all changes, run tests on every branch before merging back, and run your entire test suite after finishing a release on the trunk branch.
Sounds easy, doesn't it? Exactly the same picture: no need to worry where changes come from, since you can get the full story of each of them, prevent broken builds and take care of every single step, yet do it all in a distributed manner.
This is why the centralization lock must be broken. You can have remote teams working on topic branches all day, some roaming developers ironing out an important optimization at the customer site, and your rock-star programmer on parental leave contributing great code from home, and none of them will be blocked by the others, while all changes still come back under the control of your central site in an organized way.
One possible pattern is to keep the master branches as clones and perform the entire integration at one of the sites. This way, the distributed conflict nightmare introduced above is automatically removed.
Of course, in order not to overload the master team with the entire merge burden, integration branches can be created (if needed) at each site. And dealing with concurrent changes on trunk is still entirely possible with most modern SCM tools, avoiding extra complexity once the underlying mechanism is correctly understood.
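Here's a sketch of that "one server per site" arrangement, again with Git as a stand-in and with every path, site and branch name invented for the illustration:

```shell
# Central bare repository, playing the role of the main site's server.
work=$(mktemp -d) && cd "$work"
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/main

# Site B runs its own clone: daily operations are local, no WAN in the loop.
git clone -q central.git site-b && cd site-b
git config user.email "dev@site-b.example" && git config user.name "Site B dev"
echo "core" > core.txt && git add core.txt && git commit -qm "baseline"
git branch -M main
git push -q origin main

# Remote work happens on a topic branch, published for central integration.
git checkout -qb task/remote-feature
echo "feature" > feature.txt && git add feature.txt && git commit -qm "task: remote feature"
git push -q origin task/remote-feature

# The integrator at the central site merges it under local control.
cd "$work" && git clone -q central.git integration && cd integration
git config user.email "integrator@central.example" && git config user.name "Integrator"
git merge -q --no-ff -m "integrate task/remote-feature" origin/task/remote-feature
git push -q origin main
```

Each site gets LAN-speed operations against its own server, while the trunk is still integrated, reviewed and released in a single place.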