Programmer's Toolchest

By Michael Simpson, October 01, 2000

Michael describes how branching and merging work in CVS, a widely used GNU source-code control system.

Michael is a senior consultant for Insight Technology Group in the Washington D.C. area. He can be contacted at msimpson@itglink.com.

The CVS Branching Process In Detail

CVS is a widely used GNU source-code control system. It is popular because it does a reasonable job of source control, supports multiple platforms (Linux, Win32, BeOS, Macintosh, OS/2, VMS, and UNIX), and is freely available (http://www.sourcegear.com/CVS/Dev/code). There are better tools than CVS, but they are generally expensive and require a fair amount of effort to set up and configure.

CVS is straightforward to use for smaller projects. When projects grow to a point where you need to separate code being released with bug fixes from code being developed for future releases, however, you'll need to implement code branching. CVS does branching like other tools, but I have found that the process of doing branching and merging in CVS is not well documented or understood. The primary reason for this is that CVS and its client tools (such as WinCVS; http://www.wincvs.org/) do not provide a way to see branches and how they relate. Tags can be seen on an individual file basis but not as a group. This means that you must keep the big picture in your head.

In this article, I'll illustrate a graphical way you can document and track branches and merges so they don't have to be kept in your head. I'll also describe how branching and merging work in CVS. Finally, I'll propose a convention for doing the branching and merging process. For background information on CVS, refer to the CVS online manual at http://www.gnu.org/manual/cvs/index.html.

Graphically Documenting Branches and Merges

Figure 1(a), a typical branch, is a revision change flow for a set of files in the repository that is independent from other flows. By default, there is one main branch called "HEAD" in CVS.

As an example of branches, assume that there are files named A and B in the repository. File A's revision number is 1.1; likewise for File B. Both are in the HEAD branch. A change is made to File A and committed (checked in). It is now at revision 1.2 in the HEAD branch. At this point a branch called "BRANCH_NEW" is made. Now File A exists as revision 1.2 on both branches. File B exists as revision 1.1 on both branches. A user will have a working set of code checked out in one flow and possibly a second working set in the other.

If a user commits a change to File A in a working area on the HEAD branch, File A becomes revision 1.3 on HEAD, but BRANCH_NEW is unaffected: there, File A is still at revision 1.2. Now the user commits a change to File A in a working area that is on BRANCH_NEW. Consequently, File A is at revision 1.2.2.1 on BRANCH_NEW (CVS numbers the first branch off revision 1.2 as 1.2.2) and is still at revision 1.3 on the HEAD branch. The file may look different depending on the branch in which the working area is set. At some point, changes on BRANCH_NEW will probably need to be merged into the HEAD branch, which is discussed later.
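The File A walkthrough above can be traced with a small model of CVS revision numbering. This is only an illustrative sketch, not part of CVS: the helper names are hypothetical, and it assumes the standard CVS rule that branches off a revision receive even numbers (2, 4, ...), so the first branch off 1.2 is 1.2.2 and its first revision is 1.2.2.1.

```python
def next_revision(revision: str) -> str:
    """Return the revision a commit creates on the same line of development."""
    parts = revision.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def first_branch_revision(branch_point: str, branch_index: int = 1) -> str:
    """Return the first revision committed on the Nth branch off branch_point.

    Assumes CVS's even branch numbering: branch 1 is .2, branch 2 is .4, etc.
    """
    return f"{branch_point}.{2 * branch_index}.1"


file_a = "1.1"                      # initial revision on HEAD
file_a = next_revision(file_a)      # commit on HEAD: 1.2
branch_point = file_a               # BRANCH_NEW is created here
head_rev = next_revision(file_a)    # later commit on HEAD: 1.3
branch_rev = first_branch_revision(branch_point)  # first commit on BRANCH_NEW

print("HEAD:", head_rev)
print("BRANCH_NEW:", branch_rev)
```

A second commit on BRANCH_NEW would be `next_revision(branch_rev)`, since commits along a branch only increment the last component.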

There is one initial branch called HEAD. All other branches are branched from an existing branch. New branches are almost always branched from HEAD, but it is possible to create one from any existing branch.

A branch is represented graphically by a thick arrow flowing from left to right. It represents the state (revision numbers) of all files on that branch. Implicitly, the flow of left-to-right on a CVS branch diagram represents the flow of time forward (although not necessarily to scale). Given any two points, the one on the left is earlier in time than the one on the right. The point on the right will represent the files with revisions that are the same or newer (if newer ones exist).

By convention, branches are labeled beginning with the word BRANCH_. The one exception is, of course, HEAD. The label is put inside the branch arrow.

Figure 1(b) is a tag, a point on the branch-flow timeline that provides a snapshot of all the files and their revisions at a particular time. Graphically, a tag is shown as a filled-in diamond that is located on a branch. It is shown directly on the branch arrow or slightly above or below it as is convenient for viewing. Its label is shown near the diamond with a line drawn to it. By convention, the label always begins with TAG_. When using CVS tools to look at branches and tags in files, it can be difficult to distinguish which is which. This naming convention solves that issue.

Tags are useful handles to indicate important milestones in the development cycle, such as releases. They provide a handle to retrieve the state of the repository at a certain time long after subsequent changes have been committed.
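As a rough illustration of the naming convention and of using a tag as a retrieval handle, the sketch below classifies a label by its prefix and builds (without running) the CVS commands that would create a tag and later check out that snapshot. The function names, the tag `TAG_RELEASE_EXAMPLE`, and the module name `myproject` are hypothetical; `cvs tag` and `cvs checkout -r` are standard CVS commands.

```python
from typing import List


def label_kind(label: str) -> str:
    """Classify a CVS label using the article's prefix convention."""
    if label == "HEAD" or label.startswith("BRANCH_"):
        return "branch"
    if label.startswith("TAG_"):
        return "tag"
    return "unknown"


def tag_commands(tag: str, module: str) -> List[List[str]]:
    """Build the commands to tag a working area and later retrieve the snapshot."""
    if label_kind(tag) != "tag":
        raise ValueError(f"{tag!r} does not follow the TAG_ convention")
    return [
        ["cvs", "tag", tag],                     # run inside the working area
        ["cvs", "checkout", "-r", tag, module],  # retrieve the snapshot later
    ]


for command in tag_commands("TAG_RELEASE_EXAMPLE", "myproject"):
    print(" ".join(command))
```

Returning argument lists rather than running them keeps the sketch safe to try; each list could be passed to `subprocess.run` in a real working area.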

Figure 1(c) is a branch end -- a symbol that signals that no further development is to be done on a branch. The CVS repository has no concept of a branch end, so this is not enforced. However, it is useful for the sake of clarity in the diagram to let developers know that no further changes should be committed to the branch. Graphically, the branch end is represented as a circle with an "X" through it and is placed on the branch arrow at its right end.

A branch creation and merge flow are represented graphically as a vertical unidirectional arrow going from one branch to another; see Figures 1(d) and 1(e).

A branch (except for HEAD) is created from an existing branch at a specified time. The branch creation arrow is drawn from a location on the originating branch to the beginning of a newly created branch. It is labeled "branch."

A merge flow indicates that changes from one branch are merged into a recipient branch. This changes the recipient branch but does not change the originating branch. The arrow is labeled "merge."

Both the branch creation and merge flow arrows must be vertical because these operations are essentially instantaneous and the CVS branch tracking diagram has an implicit temporal flow from left to right.

Side Development Branches

Side development branches are a common way of managing major development changes to be made by a subset of developers in the codebase. Figure 2 is an example of a side development branch. The justification for having such a branch is that major changes are going to be made to the code, and until those changes are completed, they would hold up or disturb other developers' efforts on the HEAD. It may also reflect the fact that developers need to begin work on additional functionality before a release that is not meant to include it yet (perhaps because it is not expected to be ready in time).

A subset of developers (sometimes only one) works on the branch. At some point, the development effort will mature enough to be merged back into the HEAD. The problem here is that the branch may not reflect changes that have been made in the HEAD since the branch was made. The appropriate course of action is to merge changes in the HEAD into the branch, resolve conflicts and commit on the branch, then merge the branch into the HEAD, and commit.

The expectation here is that the final merge from the branch into the HEAD can be made almost immediately after the preliminary merge from the HEAD to the branch (between TAG_NEW_MERGE_FROM_HEAD and TAG_NEW_MERGE_TO_HEAD). If there is a lag time, more changes may be made on the HEAD that can cause conflicts on the branch. In this case, a second merge must be made from the HEAD to the BRANCH_NEW to resolve them.

The solution to the lag-time problem is to merge intermittently from the HEAD to BRANCH_NEW. The idea is to have BRANCH_NEW keep up with changes going on in HEAD as they occur. This way, conflicts are resolved early rather than having a big-bang merge at the end of the branch development that often results in the need for significant recoding on the branch to match significant design changes from the HEAD.

By convention, the originating branch is tagged at the time a branch is made with the prefix TAG_CREATE_. This is not necessary but is convenient to track what the branch looks like in its initial state. Similarly, by convention, tags are made on both branches when a merge occurs. In the figure you may notice a peculiarity. The merge flow arrow comes directly out of the middle of the tag on the originating branch and it ends on the left side of the tag on the target branch. A snapshot of the files on the originating branch at the time of the tag is merged into the target branch. This merge is done into a developer's working area. The developer then resolves any conflicts, commits the changes to the target branch, and creates a tag. The merge flow arrow ending just to the left of the tag is meant to indicate this gap of time between the merge operation and the tag on the resolved committed files.
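The creation convention can be sketched as a short command sequence. The helper below is hypothetical and only builds the commands; it assumes they would be run from a working area checked out on the originating branch. `cvs tag`, `cvs tag -b`, and `cvs update -r` are standard CVS commands.

```python
from typing import List


def create_side_branch(name: str) -> List[List[str]]:
    """Build the commands that tag the branch point and create BRANCH_<name>."""
    create_tag = f"TAG_CREATE_{name}"
    branch = f"BRANCH_{name}"
    return [
        # Record what the originating branch looks like at branch time.
        ["cvs", "tag", create_tag],
        # Create the branch rooted at exactly that snapshot.
        ["cvs", "tag", "-r", create_tag, "-b", branch],
        # Move this working area onto the new branch.
        ["cvs", "update", "-r", branch],
    ]


for command in create_side_branch("NEW"):
    print(" ".join(command))
```

Rooting the branch at the TAG_CREATE_ tag, rather than at whatever happens to be current, means the tag and the branch point are guaranteed to name the same revisions.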

A speculative development branch is the same as a side development branch but has a slightly different intent. It is for exploratory development that may or may not be merged back into the HEAD. If the explorative development works out, it is merged. If not, the branch is simply abandoned.

Sometimes it is desirable to try more than one approach to adding new functionality. In this case, a speculative development branch is made for each competing approach. At some point, the best one is chosen to merge into the HEAD.

Fork Development Branches

A forked branch is a more permanent form of branch. It is a branch that is never intended to merge back with its originating branch. Figure 3 shows a branch made after a software release tagged TAG_BRANCH_RELEASE_1_0. Sometime after the release, new development is committed to the HEAD branch. However, some bugs are found in the release that must be fixed. Releasing the current code on the HEAD branch may not be appropriate because the new code may not be of release quality. Instead, you want to fix the bugs in the release version and rerelease it without the new functionality.

The appropriate measure to take at this point is to make a new branch from the point of the release. Then the bugs are fixed on the branch and rereleased to customers. Meanwhile, new development continues unaffected on the HEAD.
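A minimal sketch of branching from the release point after the fact, assuming the release was tagged as described. The branch name `BRANCH_RELEASE_1_0` and the module name `myproject` are hypothetical; `cvs rtag -b -r` creates a branch directly in the repository from an existing tag, and `cvs checkout -r` produces a working area on that branch.

```python
from typing import List


def release_branch_commands(release_tag: str, branch: str, module: str) -> List[List[str]]:
    """Build the commands to fork a bug-fix branch from a release tag."""
    return [
        # Create the branch in the repository at the release snapshot.
        ["cvs", "rtag", "-b", "-r", release_tag, branch, module],
        # Check out a separate working area on the release branch for fixes.
        ["cvs", "checkout", "-r", branch, module],
    ]


for command in release_branch_commands(
    "TAG_BRANCH_RELEASE_1_0", "BRANCH_RELEASE_1_0", "myproject"
):
    print(" ".join(command))
```

Because the branch is rooted at the tag, commits made on the HEAD since the release do not appear in the bug-fix working area.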

Subsequent fixes are made on the release branch and released as needed. This branch is never closed but should become obsolete with the next release off the HEAD branch.

It is not intended for the release branch to ever be merged with the HEAD branch. The benefit is that this is an independent branch. The downside is that bugs found in the release usually need to be fixed also on the HEAD branch. This requires some discipline to assure that bugs are correctly fixed in both places. Otherwise, bug fixes on the release branch that are delivered to customers as minor releases may reappear in the subsequent major release off of the HEAD.

Generally, new functionality should never be developed on forked branches. Only problems should be fixed. Although it is often tempting to add some new things for the next minor release, you run the risk of seriously diverging into two codebases. If you cannot get those new features into the HEAD, you can end up supporting this fork branch in perpetuity to satisfy users who have come to depend on those features.

One exception to the no-new-functionality rule occurs when an evolutionary rapid prototyping model is employed. In this model, new functionality is quickly prototyped on a fork development branch. When done, it is reimplemented on the HEAD instead of merged. The idea is that new code is prototyped in a low-quality fashion for the sake of saving time. Once the proof-of-concept is done, the functionality is reimplemented in a high-quality fashion. On the next iteration, you branch off of the existing codebase to leverage it, then add in new prototyping code.

Process of Working on a Branch

When working on a side branch, you want to keep up with changes in the HEAD to prevent the side branch from becoming too far out-of-sync by the time you merge it back to the HEAD. It is best to merge changes from the HEAD into your side branch intermittently. This process in CVS is often poorly understood. It works like this:

1. Create your branch and begin working on it.

2. Merge changes in the HEAD the first time.

3. Correct conflicts and commit.

4. Continue working.

5. Merge changes in the HEAD that occurred between your last merge and now.

6. Go to Step 4 if more work is to be done.

7. When done with the branch, merge changes from the branch to the HEAD.

8. Correct conflicts on the HEAD and commit.

The typically misunderstood part is Step 5. The mistake that is often made is that instead of merging just the changes between the last merge and now, all changes on the HEAD since the creation of the branch are merged into the branch. This means that conflicts that were resolved after previous merges can reappear and need to be dealt with again.

To do the merge correctly, you need to tag the time of each merge. The next merge will reference that tag to indicate when to begin looking for changes.
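The difference between the correct and the mistaken Step 5 can be seen in a toy model. This sketch does not touch CVS; it simply treats the HEAD as an ordered list of hypothetical change names and a merge point as a position in that list.

```python
from typing import List


def changes_since(head_changes: List[str], merge_point: int) -> List[str]:
    """Return the HEAD changes committed after the given merge point."""
    return head_changes[merge_point:]


head: List[str] = []
branch_created_at = len(head)      # the branch is made before any HEAD change

head += ["h1", "h2"]               # work continues on the HEAD
first_merge = changes_since(head, branch_created_at)
last_merge_at = len(head)          # "tag" the HEAD at the time of the merge

head += ["h3"]                     # more work on the HEAD

correct = changes_since(head, last_merge_at)       # only what is new
mistaken = changes_since(head, branch_created_at)  # everything, again

print("first merge:", first_merge)
print("correct second merge:", correct)
print("mistaken second merge:", mistaken)
```

In the mistaken case, h1 and h2 are merged a second time, which is how conflicts that were already resolved can come back.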

The convention that I am proposing is that a tag is made on the HEAD at the last merge point (TAG_HEAD_MERGE_TO_NEW), and a tag is also made on the branch after the merge and conflict resolution (TAG_NEW_MERGE_FROM_HEAD). After a subsequent merge is done from that tag on the head, the tag is moved (slid forward) to the new merge point. This can be done repeatedly as in Figure 4. For more information, see the accompanying text box entitled "The CVS Branching Process In Detail."
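One possible command sequence for the sliding-tag convention is sketched below; the accompanying text box may order the steps differently. The helper only builds the commands and assumes they are run from a working area on BRANCH_NEW. The temporary tag `TAG_HEAD_MERGE_TO_NEW_PENDING` and the module name are hypothetical additions, used here so that commits landing on the HEAD during conflict resolution are not skipped when the tag slides. The commands themselves (`cvs rtag`, `cvs update -j`, `cvs commit`, `cvs tag -F`, `cvs rtag -d`) are standard CVS.

```python
from typing import List

HEAD_TAG = "TAG_HEAD_MERGE_TO_NEW"
BRANCH_TAG = "TAG_NEW_MERGE_FROM_HEAD"
PENDING_TAG = "TAG_HEAD_MERGE_TO_NEW_PENDING"  # hypothetical temporary tag


def merge_from_head(module: str, first_merge: bool) -> List[List[str]]:
    """Build the commands for one merge from the HEAD into BRANCH_NEW."""
    # Freeze the new merge point on the HEAD before merging.
    steps = [["cvs", "rtag", "-r", "HEAD", PENDING_TAG, module]]
    if first_merge:
        # One -j merges everything from the branch point up to the frozen point.
        steps.append(["cvs", "update", "-j", PENDING_TAG])
    else:
        # Two -j options merge only the changes since the previous merge point.
        steps.append(["cvs", "update", "-j", HEAD_TAG, "-j", PENDING_TAG])
    steps += [
        # Conflicts are resolved by hand before this commit.
        ["cvs", "commit", "-m", "Merge from HEAD into BRANCH_NEW"],
        # Tag (or re-tag) the resolved state on the branch.
        ["cvs", "tag", "-F", BRANCH_TAG],
        # Slide the HEAD tag forward to the frozen merge point.
        ["cvs", "rtag", "-F", "-r", PENDING_TAG, HEAD_TAG, module],
        # Remove the temporary tag.
        ["cvs", "rtag", "-d", PENDING_TAG, module],
    ]
    return steps


for command in merge_from_head("myproject", first_merge=False):
    print(" ".join(command))
```

The `-F` option is what moves an existing tag, which is the "slide forward" the convention relies on.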

Other Issues

Branch Leadership. Several developers may work on a side branch. It is best to designate one person to be in charge of it. That person will coordinate and track tagging and merging for that branch. He will also declare when no further development should occur.

Partial Branches. The diagram, as I have defined it, assumes that when a branch is made, it is made over the entire CVS module. Branches can be made over subdirectories of a CVS module. I call this a "partial branch." Working with a partial branch can be confusing because some of the checked-out files are on the branch while the rest are on the HEAD. When this is done, I suggest prepending the branch label with BRANCH_PART and additionally labeling the branch in the diagram with the subdirectory name.

DDJ
