Dr. Dobb's | Multi-Stage Continuous Integration

Multi-Stage Continuous Integration

Multi-Stage Continuous Integration allows for a high degree of integration to occur in parallel

December 02, 2008
URL: http://www.drdobbs.com/architecture-and-design/multi-stage-continuous-integration/212201506

Damon Poole is Founder and CTO of AccuRev, a provider of process-centric software solutions to improve Agile, geographically distributed and parallel development. Damon's Agile Development Thoughts blog is at damonpoole.blogspot.com.

The main goal of software is to automate and simplify what would otherwise be accomplished using a manual process. This gives users of the software leverage to do more with less. Instead of balancing our checkbook by hand, we can use Quicken to do it faster and more accurately. Instead of maintaining the records of millions of people's financial transactions with paper and pencil, banks use mainframes.

Whatever industry you work in, what do you think about when you are creating new software? Don't you think about how it will scale, with high availability, to meet your customers' needs as they grow? Even if you don't achieve that on the first try, aren't you still thinking about it and striving to achieve it? You know that to do it, you need to make the right technology and architectural choices. Over time, you may need to change some of those choices to keep pace with competitors. Even if your current needs are modest, software architecture has evolved principles, patterns, and technologies that allow software to scale from a single user to thousands or even millions of users on multiple platforms at multiple locations with 99.999 percent uptime.

On the other hand, as an industry, we don't seem to be very good at scaling up the process of software development itself. Even when what needs to be built is primarily made up of well-understood tasks and requires little discovery, it seems that when the size of the effort goes beyond 5-10 people, things start to break down pretty rapidly. Adding more people usually doesn't scale in the same way that adding more hardware does.

This stems from the belief that software automates a well-defined problem but that the process of software development is a creative endeavor and cannot benefit from the same kind of thinking that goes into designing software. However, after many years spent thinking about how to improve the process of software development, I am convinced that a great deal of that process can be treated as infrastructure. This infrastructure supports the creative parts of the software development process, but can be clearly delineated from them.

By thinking about the infrastructure of your process in the same way as you would think about any manual process that requires automation, you can leverage your skills as a developer and apply them in a new way. The process of developing software is, in effect, an algorithm implemented with various technologies. You can think of your team, the technologies you use, and your development methodology as software.

What technologies are you using? What is the architecture of your organization? Will it scale from its current size to double its size? Will it scale seamlessly to include new teams in new locations? What will happen if you acquire a company?

When creating software, you want to design it so that it is flexible and adapts to new circumstances. The same should be true for your development organization. In this article, I apply this way of thinking to the problem of scaling continuous integration to development efforts of more than 10 people.

Continuous Integration

In a typical project, the integration phase comes towards the end and is the source of many problems and delays. One practice that has emerged to address this problem is continuous integration. The basic idea is that if getting build and test results on a regular basis is a good idea, getting build and test results on every change is even better.

With continuous integration, all work is integrated into a single codeline as frequently as possible. Every check-in automatically triggers a build and often a subsequent run of the test suite. This provides instant feedback about problems to developers and helps to keep the code base free of build and test failures. It also reduces the integration headaches just prior to release.
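The check-in trigger described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any particular CI tool: `on_checkin`, `CIResult`, and the `build` and `run_tests` callables are hypothetical names standing in for whatever your build system provides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CIResult:
    """Outcome of one CI cycle for a single check-in (hypothetical type)."""
    change_id: str
    build_ok: bool
    tests_ok: bool

    @property
    def passed(self) -> bool:
        return self.build_ok and self.tests_ok


def on_checkin(change_id: str,
               build: Callable[[], bool],
               run_tests: Callable[[], bool]) -> CIResult:
    """Run the build, then the test suite, for a single check-in."""
    build_ok = build()
    # Tests are only run against a successful build.
    tests_ok = run_tests() if build_ok else False
    return CIResult(change_id, build_ok, tests_ok)
```

In a real setup the version control system or CI server would call something like `on_checkin` for every change and notify the developer of the result.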

I'm a big fan of continuous integration. I've used it and recommended it for many years with great success. But it has a dark side as well. The larger and/or more complex the project, the higher the chance that it devolves into what I call "Continuous Noise." In this case, you get notified every 10 minutes or so (depending on how much building and testing is going on) that the build and/or test suite is still failing. It may be for a different reason every time, but it doesn't matter. It is tough to make progress when the mainline is unstable most of the time. This problem is not caused by CI, but it is exposed by CI.

The question is, what is a good way to structure this integration so that it will scale smoothly as you add more people to the equation? A good starting place is to look around for a pattern to follow. What are some similar situations? I have found that everything your organization needs to do to become the best possible development organization can be derived from the patterns and practices at the individual level. Deriving the process this way makes it much easier to understand and much more likely to be followed successfully.

Self Integrity

When you as an individual are working on a change, you are often changing several files, and a change in one file often requires a corresponding change in another file. While it may seem like a bit of a trivial case, you can think of this process as self-integration. You work on those files on your own, instead of having several people work on them, because the tightly coupled nature of the changes requires a single person.

As an individual developer, there are two things that you do to shield yourself and others from instability. You make changes frequently, but you only check in when you feel that your changes are integrated, tested, and won't disrupt other people.

Conversely, you only update your workspace when you feel you are ready to absorb other people's changes. Because other people only check in when they feel their changes are ready and you only update when you feel you are ready, you are mostly shielded from the constant change that is going on all around you.

Figure 1: A developer’s integration pattern.

This is the basis of multi-stage continuous integration: If individual isolation is a good idea, then isolation for features, teams, team integration, staging, QA and release is an even better idea.

Moving From Known Good to Known Good

The consumer portion of the developer integration pattern is also found in the way that customers interact with their software suppliers and the way that software producers interact with their third-party software suppliers. Your customers don't want random builds that you create during development, and you don't want random builds from third parties that you depend on. The reasons are simple: the level of quality of interim builds is unknown and the feature set is unknown. Your customers want something with a high level of quality that has been verified. Likewise, you want the same from your suppliers. Your suppliers include things like third-party software libraries or third-party off-the-shelf software that your application depends on, such as databases and web servers.

Figure 2: Consumers at every level only take stable builds, aka "releases".

As with your own software, each third party that you rely on produces hundreds if not thousands of versions of their software, but they only release a small subset of them. If you took each of these as they were produced, it would be incredibly disruptive and you would have a hard time making progress on your own work. Instead, you move from one known good build to the next known good build, that is, from one externally released version to the next. Your customers do the same.

This simple principle should be applied throughout your development process. Think of each individual developer as both a consumer and producer of product versions. Also think of them as a third party. Think of each team as well as each stage in your development process this way. That is, as a developer think of your teammates as customers of the work that you produce. Think of yourself as a customer of the work that they do. You want to move from known good version to known good version of everything you depend on.
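As a minimal sketch of the "known good to known good" rule, a consumer can simply skip every interim build and adopt only the newest build that has been marked as verified. The `Build` type, its `known_good` flag, and the `next_known_good` function are illustrative assumptions, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Build:
    """A producer's build; known_good marks a verified release (hypothetical)."""
    number: int
    known_good: bool


def next_known_good(builds: Iterable[Build], current: int) -> Optional[Build]:
    """Return the newest known good build after `current`, or None.

    Interim builds (known_good == False) are never adopted, however recent.
    """
    candidates = [b for b in builds if b.known_good and b.number > current]
    return max(candidates, key=lambda b: b.number) if candidates else None
```

The same function applies whether the producer is a third-party supplier, another team, or a teammate: the consumer decides when to call it, and only ever lands on a verified version.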

It's All for One and One for All

The next level of coupling is at the team level. There are many reasons why a set of changes is tightly coupled. For instance, there may be a large feature that can be worked on by more than one person. As a team works on a feature, each individual needs to integrate their changes with the changes made by the other people on their team. For the same reasons that an individual works in temporary isolation, it makes sense for teams to work in temporary isolation. When a team is in the process of integrating the work of its team members, it does not need to be disrupted by the changes from other teams and, conversely, it would be better for the team not to disrupt other teams until it has integrated its own work. Just as is the case with the individual, there should be continuous integration at the team level, and then also between the team and the mainline.

So, how can we take advantage of the fact that some changes are at an individual level and others are at a team level while still practicing Continuous Integration? By implementing Multi-Stage Continuous Integration. Multi-Stage CI takes advantage of a basic unifying pattern of software development: software moves in stages from a state of immaturity to a state of maturity, and the work is broken down into logical units performed by interdependent teams that integrate the different parts together over time. What changes from shop to shop is the number of stages, the number and size of teams, and the structure of the team interdependencies.

For Multi-Stage CI, each team gets its own branch. I know, you cringe at the thought of per-team branching and merging, but that's probably because you are thinking of branches that contain long-lived changes. We're not going to do that here.

There are two phases that the team goes through, and the idea is to go through each of them as rapidly as is practical. The first phase is the same as before. Each developer works on their own task. As they make changes, CI is done against that team's branch. If it succeeds, great. If it does not succeed, then that developer (possibly with help from her teammates) fixes the branch. When there is a problem, only that team is affected, not the whole development effort. This is similar to how stopping the line works in a modern lean manufacturing facility. If somebody on the line pulls the "stop the line" cord, it only affects a segment of the line, not the whole line.

On a frequent basis, the team will decide to go to the second phase: integration with the mainline. In this phase, the team does the same thing that an individual would do in the case of mainline development. The team's branch must have all changes from the mainline merged in (the equivalent of a workspace update), there must be a successful build, and all tests must pass. Keep in mind that integrating with the mainline will be easier than usual because only pre-integrated features will be in it, not features in process. Then, the team's changes are merged into the mainline, which triggers a build and test cycle on the mainline. If that passes, then the team goes back to the first phase, where individual developers work on their own tasks. Otherwise, the team works on getting the mainline working again, just as though they were an individual working on mainline.
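The second phase can be sketched as follows. This is a deliberately simplified model under stated assumptions: a branch is represented as a set of change identifiers, a merge is a set union, and `verify` is a hypothetical callable that stands in for "build succeeds and all tests pass". Real merges can conflict, which this sketch ignores.

```python
from typing import Callable, Set

Branch = Set[str]  # a branch modeled as a set of change identifiers


def integrate_team_with_mainline(team: Branch,
                                 mainline: Branch,
                                 verify: Callable[[Branch], bool]) -> bool:
    """Phase two: integrate a team branch with the mainline.

    Returns True if the mainline builds and passes tests afterwards.
    """
    # 1. Bring the team branch up to date (like a workspace update).
    team.update(mainline)
    # 2. The team branch must build and pass all tests first.
    if not verify(team):
        return False  # only the team is affected; mainline is untouched
    # 3. Merge the pre-integrated team changes into the mainline.
    mainline.update(team)
    # 4. The merge triggers a build and test cycle on the mainline.
    #    If this fails, the team's job is to get the mainline working again.
    return verify(mainline)
```

Note that a failure at step 2 never reaches the mainline, which is the point of doing the integration on the team branch first.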

Figure 3: Multi-Stage Continuous Integration.

This diagram shows a hierarchy of branches with changes flowing from top to bottom and in some cases back towards the top. Each box graphs the stability of a given branch in the hierarchy over time. At the top are individual users. They are making changes all day long. Sometimes their work areas build, sometimes they don't. Sometimes the tests pass, sometimes they don't. Their version of the software is going from stable to unstable on a very frequent basis, changing on the order of every few minutes. Hopefully, users only propagate their changes to the next level in the development hierarchy when the software builds for them and an appropriate amount of testing has been done. That happens on the order of once per hour or so, but ideally it happens no less than once per day.

Then, just as individuals check in their changes when they are fully integrated, the team leader will integrate with the next level, and when the integration build and test are done they will merge the team's changes to the next level. Thus, team members see each other's changes as needed, but only team members' changes. They see other teams' changes only when the team is ready for them. This happens on the order of several times per week and perhaps even daily.

Changes propagate as rapidly as possible, stopping only when there is a problem. Ideally, changes make it to the main integration area just as frequently as when doing mainline development. The difference is that fewer problems make it all the way to the main integration area. Multi-Stage CI allows for a high degree of integration to occur in parallel while vastly reducing the scope of integration problems.

Distributed Integration

All of the reasons that make continuous integration a good idea are amplified by distributed development. Integration is a form of communication. Integrating distributed teams is just as important as integrating teams that are collocated. If you think of your teams as all being part of one giant collocated team, and organize in the same manner as described in the section on Multi-Stage Continuous Integration, it will be much easier to coordinate with your remote teams.

Figure 4: Thinking of distributed teams in terms of function rather than location.

Getting Started

Getting to Multi-Stage CI takes time, but is well worth the investment. The first step is to implement Continuous Integration somewhere in your project. It really doesn't matter where. I recommend reading the book Continuous Integration by Paul Duvall, Steve Matyas, and Andrew Glover. The next step is to implement team- or feature-based CI. Once you have that working, consider automating the process. For instance, you can set things up such that once CI passes for a stage, it automatically merges the changes to the next level in the hierarchy. This keeps changes moving quickly and paves the way for easily adding additional development stages.
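The automatic promotion described above might look something like this sketch. The `Stage` class, the `promote` function, and the `ci_passes` callable are all hypothetical names; changes are again modeled as a set of identifiers, and the sketch omits the step of first merging the parent's changes down into the stage.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Stage:
    """One level in the hierarchy, e.g. developer, team, mainline."""
    name: str
    parent: Optional["Stage"] = None
    changes: Set[str] = field(default_factory=set)


def promote(stage: Stage, ci_passes: Callable[[Stage], bool]) -> List[str]:
    """Merge changes upward, one stage at a time, while CI keeps passing.

    Returns the names of the stages that received the changes. Promotion
    stops at the first stage whose CI fails, so a problem stays contained.
    """
    reached: List[str] = []
    current = stage
    while current.parent is not None and ci_passes(current):
        current.parent.changes |= current.changes
        reached.append(current.parent.name)
        current = current.parent
    return reached
```

Adding another development stage is then a matter of inserting one more `Stage` into the parent chain, for example a QA stage between the team and the mainline.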

I've seen Multi-Stage Continuous Integration successfully implemented in many shops and every time the developers say something like: "I never realized how many problems were a result of doing mainline development until they disappeared."