Dr. Dobb's | SCM: Continuous vs. Controlled Integration

SCM: Continuous vs. Controlled Integration

SCM tools and practices can play an important role in the transition to agile practices

January 24, 2008
URL:http://www.drdobbs.com/architecture-and-design/scm-continuous-vs-controlled-integration/205917960

Pablo Santos is a software engineer at Codice Software. He can be reached at [email protected].

The rise and consolidation of Agile methodologies have introduced a new vision and spirit in software development. Concepts such as refactoring, pair programming, and collective code ownership are now familiar to developers worldwide.

But Agile has not only influenced the way software is analyzed, designed, and written. It also has changed the way it is assembled. Today almost everyone in the software industry has at least heard about continuous integration tools and techniques.

In this article, I analyze the pros and cons of continuous integration, and examine whether there are still opportunities to look for even more agile processes supported by Software Configuration Management (SCM) Best Practices.

Freeride Software Development

So what are the most relevant features that Agile software development brings to the scene? Well, the answer to this question depends heavily on who is answering. Here I highlight a subset of agile features that I call "freeride", trying to capture part of the spirit of agile software development:

Enforce change. From refactoring to collective code ownership, the message is clear -- change whatever needs to be changed to create better code and give the right answers to customer requests.
Create a real team. Agile methods put people first, in contrast to traditional project management and software engineering techniques, which fiercely try to reduce staff dependency. Each time I mention the word "team", Tom DeMarco and Timothy Lister's Peopleware chapter on making teams jell comes to mind, with "team" defined as a group of people working together towards a common goal, not just a bunch of programmers sitting together.
Fun. This won't ever fit in every organization, but some of them will need to compete in a global market, so getting the best out of talented individuals will be a key point. Achieving that goal usually involves taking care of motivation, which is where fun becomes important.

Not all organizations or projects will benefit from or be able to adopt agile techniques. Big projects with hundreds of developers and high personnel rotation aren't normally good agile candidates. In fact, the standard way to achieve agility in such environments is to split teams into smaller ones. And when this isn't possible, a tall hierarchy chain is required, which is incompatible with agile techniques.

But even in those environments there are certain techniques that help introduce more agile working methods. And the same techniques can help small agile teams overcome many of the problems derived from some widespread SCM practices.

The Role of SCM in Agile

What is the role of software configuration management in agile processes? Normally SCM is perceived as just a commodity, a service to be used by developers. But SCM can play a key role in creating the right environment to achieve the desired agile goals. The problem is that not every version control or SCM tool is suited to reaching those goals. Most of them fail to give developers enough freedom to choose the most suitable process, and instead force them to follow the one that is closest to the tool's capabilities.

Agile is all about changing code more safely, adapting to requirements faster, and listening to the customer better. And some of the most widespread agile SCM practices fail to give developers the freedom to make changes without being concerned about the project's stability.

From Big Bang to Frequent Integration

Again, continuous integration is one of the core practices in agile methods. Continuous integration is the response to big bang integration (working in a silo for a long time and then putting all the pieces together at the end), which has been the root cause behind a huge number of failed and delayed projects.

Figure 1 shows a typical development cycle in which integration is done at the end of the project. With only one line of development going on, it shouldn't cause much trouble.

Figure 1: Regular development process.

Problems arrive in a real situation, like that in Figure 2. The integration is delayed until the end of the project, and then making all the code and components work together becomes a real nightmare. The problem is not only caused by the code that needs to be adjusted: personnel are not used to running integrations because they are not done on a regular basis.

Figure 2: Big bang integration, big problems at the end.

So this is where continuous integration enters the scene. What if your team integrates their changes on a regular basis? Then instead of having a big problem at the end of the project, the team will have more frequent but smaller troubles, reducing the risk and making it manageable. Figure 3 depicts a frequent integration process.

Figure 3: Frequent integration.

Now the question is: How frequently should I run integration processes? Once a month, once a week, or twice a day?

Non-stop Integration

Agile methods clearly enforce frequent build and release cycles, but many development groups have ended up implementing what has been called non-stop integration. What does it mean? Instead of running integrations frequently, developers integrate all the time. A developer makes a change, checks all the code in, and the build system runs all the available test suites. If the build gets broken (it doesn't compile correctly or some of the tests fail), developers receive a warning notifying them that they have to fix the problem. So, in fact, integrations are now continuous because they occur all the time.
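
The check-in cycle just described can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Mainline` class, the `check_in` function, and the two test functions are hypothetical names invented for illustration, and changes are plain strings rather than real source code. The point it shows is the ordering of events: the change reaches the mainline first, and the warning comes afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A test is modeled as a function that inspects the mainline's changes.
Test = Callable[[List[str]], bool]


@dataclass
class Mainline:
    """Hypothetical in-memory stand-in for the main code line."""
    changes: List[str] = field(default_factory=list)


def compiles(changes: List[str]) -> bool:
    # Toy rule: a change tagged "syntax-error" breaks compilation.
    return all("syntax-error" not in c for c in changes)


def unit_tests_pass(changes: List[str]) -> bool:
    # Toy rule: a change tagged "failing-test" breaks the unit tests.
    return all("failing-test" not in c for c in changes)


def check_in(mainline: Mainline, change: str, suite: List[Test]) -> str:
    """Commit straight to the mainline, then build and run every test."""
    mainline.changes.append(change)  # the change lands before it is verified
    failed = [test.__name__ for test in suite if not test(mainline.changes)]
    if failed:
        return f"BUILD BROKEN after '{change}': {', '.join(failed)}"
    return f"build ok after '{change}'"


mainline = Mainline()
suite = [compiles, unit_tests_pass]
print(check_in(mainline, "task A", suite))
print(check_in(mainline, "task B failing-test", suite))
print(mainline.changes)  # the breaking change is already on the mainline
```

Note that the second check-in is reported as broken, yet it stays in `mainline.changes` until somebody fixes it, and in the meantime it is what every other developer picks up.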

The key difference between continuous integration and the evil code-and-fix cycle seems to be the presence of a well-defined test suite, plus a firm commitment from developers to run it all the time (or enforcement by the build software).

But is continuous integration the solution to all version control headaches, or does it introduce problems of its own?

In a perfect world the test suite would be almost perfect, so if it runs correctly no problem would ever occur. But in reality test suites are far from complete, and it is easy to see how a problem introduced by developers reaches the main code line immediately without being correctly checked. Once detected, it will be fixed. But in the meantime lots of developers would have been affected. Figure 4 illustrates a bug spreading scenario.

Figure 4: Bug spreading and mainline instability as consequences of continuous integration.

Imagine a situation in which a developer finishes a given task and wants somebody from testing to check whether it is correct. To deliver the code, he checks it in to the version-control system, triggers the build scripts, and notifies his colleague to get the code and verify it. The only reason to submit the code at that point was to make it available in a managed way. If the code has a problem or doesn't implement the feature correctly, the mainline is already infected by the mistake. Because all the team members are basically doing the same, in a short period there will be a lot of code built on top of the faulty change.

Figure 5 shows a set of tasks being directly integrated into the mainline, as would happen with the continuous integration working pattern. There is only one way for developers to deliver code -- merging it into the mainline. In Figure 5, after tasks 1098, 1099, 1100, and 1104 have been delivered, what happens if task 1098 turns out to be defective? The answer is that it has to be fixed. But what if you need to release the code to a client, or just to the testing group, and you already know the changes introduced by 1098 are wrong but you don't have time to fix them? Most likely the features introduced by tasks 1099, 1100, and 1104 are totally independent of 1098, and they could have been properly delivered if another working pattern had been used. Task independency happens more often after the initial phase of a project, during which tasks tend to be extremely dependent on each other due to the project's infancy.

Figure 5: A task introducing a problem and all the rest building on top of it.

Problem	Description
Mainline instability	All the changes directly hit the mainline, so making it unstable is relatively easy. Developers always work against the latest sources instead of against well-known releases, making the whole environment more unstable.
Unnecessary bug spreading	Developers continuously update their workspaces to the newly introduced code. A bug entering the mainline will be spread to all developers in a short time. More often than not several developers will end up fixing the same bug or at least being bothered by it.
Forced task dependency	Tasks are undertaken one after another, making them depend on each other due to the development schedule, not their functional relationships. This forces the release process to be linear, removing all room to maneuver and leaving the team unable to choose which tasks go into a given release.
No checkpoints	Developers only commit working code to the version control system, so they can't enjoy the benefits of a version control system that helps them keep track of small intermediate changes. Some shops even implement two layers of version control to give developers the same service they would get from a correct branching strategy involving private or task branches.
No real parallel development	At the end of the day the only line of development kept is the main one. No real parallel development is implemented; all that developers do is serialize their changes continuously.
Code can stay out of version control for a long time	If developers strictly follow continuous integration, they will end up committing code every few hours. But more often than not this recommended agile practice won't be followed: a change often involves many modifications or several days of work, and it doesn't make sense to integrate with others in the meantime. Developers can't commit unfinished work (as they could with a developer or task branch), so code is kept on developers' workstations for a long time.

Table 1: Continuous integration drawbacks

Controlled Integration

Now I don't mean that continuous integration isn't controlled. When I refer to "controlled integration" as opposed to "continuous", I mean that the former occurs frequently, but not all the time. It normally runs when a certain milestone is reached (the milestone could well be a planned weekly or daily integration).

There is also another difference between continuous and controlled that refers to the roles involved in the process: In a regular continuous scenario, all developers perform integrations and solve merge conflicts in code, which is perfectly acceptable on small, well-trained teams. But even agile teams can be affected by personnel rotation or new members joining, and it is usually not a good idea to have new developers mixing code they don't yet understand.

In controlled integration, a new role is introduced -- the integrator. The integrator can be a seasoned team member who is familiar with a big part of the code and is also well versed in the version control system and the build and release process. But the most important thing the integrator brings to the process is not knowledge of all the code, which is not even necessary, but responsibility for the integration process. Creating a new, stable, well-known point to serve as the base for development during the next iteration is the integrator's primary goal.

Figure 6 shows a scenario in which well-defined integration points are not present. In such an environment, mainline instability and code-and-fix are likely to appear because developers are starting their tasks from volatile points.

Figure 6: Development cycle with no defined integration points.

What a controlled integration process introduces is a set of well-known starting points that developers will use to work against. So, as in Figure 7, developers now always start working against a well-known and stable baseline. For instance, between BL004 and BL005, everyone starts new tasks against the BL004 code, so neither unnecessary dependencies nor unstable developer working areas affect the development process.

Figure 7: Well-defined baselines in a controlled integration process.
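
The difference between starting a task from the tip of the mainline and starting it from the last baseline can be made concrete with a small model. The following is a minimal sketch, not the behavior of any particular tool: the `Commit` tuple and the functions `latest_baseline` and `start_task` are hypothetical, and a line of development is reduced to a list of labeled changes. The baseline name BL004 and task number 1098 are borrowed from the figures.

```python
from typing import List, Optional, Tuple

# (change description, baseline label or None if the change is unlabeled)
Commit = Tuple[str, Optional[str]]


def latest_baseline(history: List[Commit]) -> int:
    """Return the index of the newest commit that carries a baseline label."""
    for i in range(len(history) - 1, -1, -1):
        if history[i][1] is not None:
            return i
    raise ValueError("no baseline has been created yet")


def start_task(history: List[Commit], from_baseline: bool = True) -> List[str]:
    """Return the changes a new task branch would contain when it is created."""
    end = latest_baseline(history) if from_baseline else len(history) - 1
    return [change for change, _label in history[: end + 1]]


history: List[Commit] = [
    ("release contents", "BL004"),
    ("task1098 (defective)", None),  # delivered, but not part of any baseline
]

print(start_task(history))                       # ['release contents']
print(start_task(history, from_baseline=False))  # also picks up task1098
```

In this model a task started from BL004 never sees the defective change, while a task started from the latest sources inherits it immediately.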

As a side effect of controlled integration, task-oriented development can be introduced. Now each task a developer works on is handled independently by the version-control system, implementing full parallel development and giving the team more maneuverability during release creation.

Figure 8 highlights the differences between task-oriented parallel development and serialized development processes. When task-oriented patterns are supported by the version control system, a look at the branching hierarchy shows how the development was indeed parallel, something that can only be imagined, but not traced, in serial development.

Figure 8: Differences between parallel and serial development.

Parallel Development and Branching Patterns

The key that unlocks real SCM-powered parallel development is branching. Branches are perceived as a necessary evil by many developers, but this is normally just because many version-control tools systematically discourage branch usage. And the reason behind this is not that branches are evil, but that the tools are extremely bad at dealing with them.

In fact, when most of us think about branching, we consider it just from the project management perspective: You create a branch to split a project, start maintenance, or support different code variants. That is, you use just a few branches for very specific tasks.

But branches can also be used in a more tactical way to isolate changes and create detailed traceability. Branching patterns like branch-per-task or branch-per-developer (also known as "private branches" or "workspace branches") open new possibilities in the integration process. Commits no longer have to be associated with delivering code; they can be used simply as checkpoints, creating a safety net for developers and boosting both productivity and the freedom to change, which is totally aligned with the agile goals.

Achieving task independency through branching

Using mainline development on a single branch, as it is usually encouraged by continuous integration practitioners, ends up in situations like that in Figure 9.

Figure 9: Task dependency forced by construction.

What if each task were developed on its own branch? Then developers not only get an extra service from the version-control system, creating intermediate checkpoints when they need to instead of being forced to wait until the code is finished; they also gain the possibility to decide what goes into a given release, while retaining changes under source control. Figure 10 shows a scenario where a developer switches from one branch to another, thereby avoiding unnecessary task dependency.

Figure 10: Task independency achieved by branching patterns.
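
To see what task independency buys at release time, the two working patterns can be compared on the task numbers used in Figure 5. This is a deliberately simplified sketch: both function names are hypothetical, tasks are reduced to their numbers, the tasks are assumed to be functionally independent, and backing a change out of the mainline by hand is ignored.

```python
from typing import List, Set

# Delivery order of the tasks shown in Figure 5.
DELIVERED = [1098, 1099, 1100, 1104]


def releasable_from_mainline(history: List[int], defective: Set[int]) -> List[int]:
    """On a single line, a snapshot contains every task delivered before it,
    so the newest clean snapshot ends just before the first defective task."""
    clean: List[int] = []
    for task in history:
        if task in defective:
            break
        clean.append(task)
    return clean


def releasable_from_branches(tasks: List[int], defective: Set[int]) -> List[int]:
    """With one branch per task, each task is merged or left out on its own.
    Assumes the tasks are functionally independent of each other."""
    return [task for task in tasks if task not in defective]


print(releasable_from_mainline(DELIVERED, {1098}))  # []
print(releasable_from_branches(DELIVERED, {1098}))  # [1099, 1100, 1104]
```

With serialized delivery, the defect in 1098 blocks everything delivered after it; with task branches, the three healthy tasks can still go into the release.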

Benefit	Description	Associated Freeride practice
Private sandboxes for developers	Developers can make changes with more freedom, and commit them to version control without worrying about breaking the build. Intermediate history is kept, which helps make the code self-documenting.	The change flow is enforced. Developers feel free to make changes, experiment with the code, and decide later whether the code hits the mainline or is just kept for future study.
Task independency	Tasks don't depend on each other merely because of the order in which they are carried out.	Helps create frequent releases: whatever is not stable enough won't be integrated. Last-minute integration rushes are minimized.
Real parallel development	Changes are really developed in parallel, with full traceability.	Boosts team productivity.
Mainline stability	The mainline can be much more stable than using continuous integration.	Frequent usable releases.

Table 2: Controlled integration and branching benefits.

Controlled Integration Cycle

To this point, I have introduced the concept of controlled integration, but how does it really happen? The answer is simple -- once a day, a week, or at most every couple of weeks, depending on the working volume, the stack of finished tasks gets integrated by the integrator, a new release is created, tested, and then marked as the baseline for the next iteration. Figure 11 shows the full cycle. Notice that testing (unit testing, automated GUI testing, manual checks, and so on) plays a key role in the process. If there were no tests, the cycle would make no sense.

Figure 11: Controlled integration cycle.

Are there any drawbacks to the controlled integration cycle? Of course; no method is perfect. The following ones are worth noting:

If no build and test server is in place (such servers are common where continuous integration is practiced), developers run the test suites on their own workstations. This is normally time consuming and can hurt productivity. If automated GUI testing is used, developers' workstations will be blocked until the tests finish.
Results are not published: With an integration and build server there is usually a way to publish the test results, but when the tests run on developers' workstations, publishing them can be more difficult.

The Best of Both Worlds: Controlled + Continuous Tools

What about having the build and test tools normally used in continuous integration mixed with the controlled best practices? This way you would still get the best out of the branching patterns, the added control introduced into the process, and benefit from the build technology spread by the continuous practitioners. Figure 12 shows a mixed process. Each time a developer finishes a task, the integration server triggers a build getting the code from the associated branch. All the available tests get run and then results are published and made available to the whole team. Now developers can continue working while the tests are run, and they can get feedback after the whole test suite is run.

Figure 12: Mixing controlled and continuous techniques.

Integration Alternatives

When a regular controlled integration is performed, the integrator runs a subset (smoke tests) of the complete test suite for each integrated branch. This practice lets you reject offending tasks if they break the build or don't pass the tests. The integrator is the person responsible for merging the code, running the tests, labeling the results, packing, and so on. Figure 13 illustrates the process. The problem is that the task itself can be time consuming. Normally, if the right tool is used and it provides good merge support, the merge process is extremely fast, but running all the tests again and again will be CPU demanding.

Figure 13: Centralized controlled integration.
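
The integrator's loop in a centralized controlled integration can be sketched as follows. Everything here is an illustrative assumption: the `integrate` and `smoke` functions are hypothetical, a merge is reduced to appending a list of changes, and the smoke test is a toy rule. A real integrator would follow the loop with the full test suite before labeling the result as the next baseline.

```python
from typing import Callable, Dict, List, Tuple

SmokeTest = Callable[[List[str]], bool]


def integrate(
    baseline: List[str],
    finished: Dict[str, List[str]],
    smoke_test: SmokeTest,
) -> Tuple[List[str], List[str]]:
    """Merge finished task branches one at a time, rejecting offenders.

    Returns (contents of the candidate release, names of rejected tasks).
    """
    candidate = list(baseline)
    rejected: List[str] = []
    for task, changes in finished.items():
        trial = candidate + changes  # toy "merge": append the task's changes
        if smoke_test(trial):
            candidate = trial        # keep the task in the release
        else:
            rejected.append(task)    # leave it on its branch for rework
    return candidate, rejected


def smoke(code: List[str]) -> bool:
    # Toy rule: anything tagged "broken" fails the smoke tests.
    return all("broken" not in change for change in code)


release, rejected = integrate(
    baseline=["BL004"],
    finished={
        "task1098": ["broken login change"],
        "task1099": ["report export"],
        "task1100": ["new icon set"],
    },
    smoke_test=smoke,
)
print(release)   # ['BL004', 'report export', 'new icon set']
print(rejected)  # ['task1098']
```

The smoke test runs once per integrated branch, which is exactly where the CPU cost mentioned above comes from: the more branches in the stack, the more test runs the integrator has to wait for.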

Are there any other options to solve the problem? In Figure 14, developers integrate their branches against the mainline from their development branches, then the integration server triggers the build-and-test cycle. When a number of tasks have been integrated, the integrator checks the mainline stability and decides to create a new baseline. This approach is close to continuous integration, but has the following differences:

Developers still count on their own versioned sandboxes.
All tasks start from a well-known baselined point, which is supposed to be stable, so bug spreading is still avoided.

Figure 14: Developers integrate against the mainline, integrators are in charge of the baselines.

Figure 15 introduces a variation on the same alternative that mixes controlled and continuous integration together: Developers continue integrating their changes when they finish a task, but they do it against an integration branch. The integrator is in charge of promoting the changes to the mainline when needed, also creating new baselines. Now mainline code is kept clean and contains correct and finished code.

Figure 15: Controlled integration + integration branch.
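
The integration-branch variation can be modeled with two branches and a promotion step. This is a minimal sketch under stated assumptions: the `Branch` class and the `deliver`, `promote`, and `stable` functions are hypothetical, stability is decided by a toy rule, and promotion is reduced to copying the integration branch's contents to the mainline and recording a baseline label.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Branch:
    name: str
    changes: List[str] = field(default_factory=list)
    baselines: List[str] = field(default_factory=list)


def deliver(integration: Branch, task: str) -> None:
    """A developer merges a finished task into the integration branch."""
    integration.changes.append(task)


def promote(
    integration: Branch,
    mainline: Branch,
    is_stable: Callable[[List[str]], bool],
    label: str,
) -> Optional[str]:
    """The integrator copies verified work to the mainline and labels it.

    Returns the new baseline label, or None when nothing was promoted.
    """
    if not is_stable(integration.changes):
        return None  # the mainline stays untouched
    mainline.changes = list(integration.changes)
    mainline.baselines.append(label)
    return label


def stable(changes: List[str]) -> bool:
    # Toy rule: anything tagged "broken" makes the branch unstable.
    return all("broken" not in change for change in changes)


integration = Branch("integration")
mainline = Branch("main", baselines=["BL004"])

deliver(integration, "task1099")
deliver(integration, "task1098 broken")
print(promote(integration, mainline, stable, "BL005"))  # None
integration.changes.remove("task1098 broken")           # fixed or backed out
print(promote(integration, mainline, stable, "BL005"))  # BL005
print(mainline.changes)                                 # ['task1099']
```

Developers keep delivering as soon as a task is finished, but a defective delivery only destabilizes the integration branch; the mainline changes only when the integrator promotes.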

Conclusion

SCM tools and practices can play an important role in the transition to agile practices and in enhancing existing ones. Both small and large teams can benefit from better isolation, task independency, and better release assembly.

Isolating tasks and changes in branches introduces an added layer of security and traceability, encouraging the freedom to make changes and increasing both stability and productivity.

The right choice depends heavily on the organizational situation, but deploying version-control systems that are agile in dealing with branches gives the development group the freedom to choose the right pattern for the right stage in the project's lifecycle.