Every project is different and the same, in that people are the most important factor. And people need some process to make some progress. The catch-22 is that too much process will paralyze and prevent success.
Knowing where to draw that "how-much-process" line is, of course, the classic conflict, often pitting the people who write the code against the ones who run the business. Jim Duggan, vice president and research area director in the application development practice at the Gartner Group, points out that these two groups often have different mindsets when it comes to risk-taking. "Developers need to have the confidence to deal with the uncertainties and the aggravation of programming," he says. "The Vikings of code soar in, crash on the shore, laugh uproariously and fix it. This becomes a problem when you try to build collaboration and build groups."
But nowadays, even the horned-hat crowd agrees that revision control tools are part of that bare minimum of necessary process. Otherwise, you're spending time and brainpower on tasks a tool could be doing for you. If you do code and you don't have a revision control system in place, please do yourself a favor: get one! Any of the tools mentioned in this article will be a heck of a lot better than nothing at all, and some of them can be had for merely the cost of some download and setup time. I assure you that your sweat will be amply repaid the first time you restore a module you've munged.
When choosing a revision control system, here are some important axes along which to evaluate:
- Configuration management
- Conflict resolution
- Integration, scripting and automation
- Support for distributed teams
- Vendor lock-in
Configuration management can be contrasted with straight file-based revision control. Suppose I'm working on my weather-radar-slurping tool, RainSniffer, and (as frequently happens) I wander down a blind alley and need to back up and start again. With simple revision control, I can examine the change log for any given file, figure out which revision number is the one I checked in last Thursday and extract that revision. But any significant software project has more than one file, so how do I determine which ones I've touched and restore them? Configuration management allows me to construct (and reconstruct) the entire software system as it existed last Thursday: all the Java files, build scripts, data files and even the libraries, if I really want to. That way, I can hand a working version off to my coworker. Really good configuration management is often tied in with build tools, so that I can tell the build system to "Build that thing that worked last Thursday," and it'll extract the appropriate Java and properties files, and make the system. Some products allow you to collect a bunch of changes (say, a bug fix that touches a dozen source files) into a change set, which can then be treated as a single entity for tracking and assembling particular builds.
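To make the "last Thursday" idea concrete, here is a minimal sketch of labeling, assuming a toy in-memory repository. The Repository class, its method names and the file contents are all invented for illustration; no real product's interface is implied.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy repository (hypothetical): each file keeps its own revision list."""
    history: dict = field(default_factory=dict)  # path -> list of contents
    labels: dict = field(default_factory=dict)   # label -> {path: revision number}

    def check_in(self, path, content):
        """Store a new revision of one file and return its revision number."""
        self.history.setdefault(path, []).append(content)
        return len(self.history[path])

    def label(self, name):
        """Record the current revision number of every file under one name."""
        self.labels[name] = {p: len(revs) for p, revs in self.history.items()}

    def extract(self, name):
        """Rebuild the whole system exactly as it stood when it was labeled."""
        return {p: self.history[p][rev - 1]
                for p, rev in self.labels[name].items()}


repo = Repository()
repo.check_in("RainSniffer.java", "class RainSniffer { /* works */ }")
repo.check_in("build.xml", "<project/>")
repo.label("last-thursday")

# The blind alley: a later revision that we would like to forget.
repo.check_in("RainSniffer.java", "class RainSniffer { /* blind alley */ }")

snapshot = repo.extract("last-thursday")
print(snapshot["RainSniffer.java"])
```

The point of the label is that nobody has to remember per-file revision numbers: one name pins the whole set, build script included.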
Conflict resolution is important if your team size is greater than one, or your left hand sometimes doesn't know what your right is doing. Primitive revision control systems used "strict locking" to avoid conflicts: Only one developer at a time could hold the lock on a file. Others could check out read-only copies, but couldn't modify them. Unfortunately, that neat, prescriptive approach doesn't mesh all that well with the messy realities of coding. Sure, I'm mostly working on RainSniffer.java, but to refactor the class, I need to touch three other files that use it, and my coworker has the lock on those. How do I proceed? Primarily, today's tools have changed the emphasis from preventing conflict to cleaning up afterward, by relying on merging the various change sets. If you and I both make changes to the same set of lines in a source file, the system kicks out a warning, flagging the offending lines: "Human intervention required." Of course, even when the change sets are disjoint (I change lines 30-40 of Fnord.java; you change lines 150-200), there's no ironclad guarantee of avoiding conflict at the semantic level: I might have changed the implementation details of a data structure that your method manipulates (remember, this is a single source file we're talking about, so close coupling is inevitable). Tools tightly integrated with the language can help. For instance, IBM's VisualAge products perform revision control on a method-by-method basis. Every time you save a method, it's stored in the repository as a new revision. That's still not perfect (conceivably, you and I could still make a hash of a method by each changing parts of it), but the finer-grained control makes it less likely that conflicts will occur.
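The textual half of that check is simple enough to sketch. The following assumes each change set has already been boiled down to inclusive (first line, last line) ranges; the function names are made up for this example, and a real merge tool works from actual diffs rather than hand-fed ranges.

```python
def overlaps(a, b):
    """True if two inclusive (first_line, last_line) ranges share any line."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_conflicts(mine, theirs):
    """Return every pair of edited ranges that touch the same lines."""
    return [(m, t) for m in mine for t in theirs if overlaps(m, t)]


# Disjoint edits to Fnord.java: nothing for a human to sort out (textually).
print(find_conflicts([(30, 40)], [(150, 200)]))   # prints []

# Overlapping edits: "Human intervention required."
print(find_conflicts([(30, 40)], [(35, 60)]))     # prints [((30, 40), (35, 60))]
```

Note what the sketch cannot see: an empty result only means the edits don't collide on the page. The semantic conflicts described above sail straight through.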
Branching is the ability to support parallel lines of development on the same code base, though, at some point, the concept of "same" recedes into metaphysics. In the canonical example, you release, say, Version 2.5 of Fnordilizer Deluxe, start work immediately on Version 3.0 and then get a bug report. You wouldn't want to just fix the bug in the 3.0 code and ship that to the customer; after all, 3.0 isn't even in alpha yet! Conversely, just saving a snapshot of the 2.5 release code and fixing that won't do the trick: what if you have to roll back the change? For that matter, you do want to incorporate the fix into 3.0 eventually. Branching is the answer: Work continues on the 2.5 codebase, with revisions being added. At the same time, work on 3.0 continues. Eventually, when you decide to incorporate the bug fixes into 3.0, you merge the branches and continue the mission. (At least one product, Rational's ClearCase, offers a 32-way merge!)
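A merge generally starts by finding the revision where the two branches parted ways, since that common ancestor is what both sets of changes are measured against. Here is a minimal sketch of that lookup, assuming a toy history in which each revision records a single parent. The revision names below are invented, and real tools store considerably richer graphs.

```python
# Hypothetical history: revision -> parent revision (None for the root).
parents = {
    "2.5": None,
    "2.5.1-bugfix": "2.5",      # maintenance branch
    "3.0-dev-1": "2.5",         # main line of development
    "3.0-dev-2": "3.0-dev-1",
}


def ancestry(rev):
    """Walk from a revision back to the root, newest first."""
    chain = []
    while rev is not None:
        chain.append(rev)
        rev = parents[rev]
    return chain


def merge_base(a, b):
    """Return the most recent revision that both branches descend from."""
    seen = set(ancestry(a))
    for rev in ancestry(b):
        if rev in seen:
            return rev
    return None


print(merge_base("2.5.1-bugfix", "3.0-dev-2"))   # prints 2.5
```

With the 2.5 release identified as the fork point, the bug fix can be expressed as "what changed since 2.5" and replayed onto the 3.0 line.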
Integration, scripting and automation: How well does the configuration management system work with your existing tools? For example, JBuilder Enterprise provides a "team" menu that can drive some revision control functions, while VisualAge makes moot the issue of a revision control tool altogether. Does the system provide command-line tools or other means of driving it from your existing workflow? For example, you can write Python scripts to drive Perforce.
Conversely, can the configuration management cart drive the software development horse when needed? Integration with your existing process and its supporting software helps to avoid error-prone drudgery. It can be as simple as setting up the system to e-mail the change log entry to the rest of the team when you check in a module, or as sophisticated as ensuring that checking in a bug fix alerts the test team, updates the bug tracker and recalculates the project metrics.
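The simple end of that spectrum, mailing the change log on check-in, might look like the sketch below. The hook registry, the check_in function and the addresses are all hypothetical; a real system exposes its own trigger mechanism, and this sketch only builds the message rather than sending it.

```python
from email.message import EmailMessage

hooks = []  # functions to run after every check-in (hypothetical registry)


def on_check_in(func):
    """Decorator that registers a function as a check-in hook."""
    hooks.append(func)
    return func


@on_check_in
def mail_change_log(module, author, log_entry):
    """Build (but do not send) a notification for the rest of the team."""
    msg = EmailMessage()
    msg["From"] = author
    msg["To"] = "team@example.com"   # assumed team alias
    msg["Subject"] = f"Checked in {module}"
    msg.set_content(log_entry)
    return msg


def check_in(module, author, log_entry):
    """Pretend to store the module, then run every registered hook."""
    return [hook(module, author, log_entry) for hook in hooks]


results = check_in("RainSniffer.java", "dev@example.com",
                   "Fix radar tile parsing")
print(results[0]["Subject"])   # prints Checked in RainSniffer.java
```

The sophisticated end is the same shape with more hooks: one to alert the test team, one to update the bug tracker, one to recalculate the metrics.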
Internet support for distributed teams: don't they all have this? No, and don't assume that your team's among the innocent. For instance, my "team" at the University of Wisconsin consisted of just two people sitting back-to-back: the Finn and me. A CM tool working over a LAN was just ducky for us; heck, in a pinch we could simply hand The Official Source Zip Disk back and forth. But then we started working with developers in another building. Whoops: a different IP subnet, and the Windows file-sharing packets don't cross the router. Next, a professor wanted to work from home through an Internet service provider (hence a different IP address every time), and so it grew. Some tools use a client-server setup; others build Web applications so that you can run their systems using just a browser. The latter approach also neatly circumvents firewalls, which are almost universally set up to pass the HTTP protocol via TCP port 80. (Of course, as more tools piggyback on HTTP, the security problem merely migrates from the firewall to the internals of the Web server.) Speaking of security, it helps if traffic between client and repository is encrypted; that way, nobody with a packet sniffer can snarf up your password and use it to "extend" your products in novel ways.
Vendor lock-in is one of those Holy Grail things: Never, never, never bet the company on a single vendor. If your company develops software, your source code is its lifeblood, second in importance only to the people who write it. But you must temper the absolute with a dose of reality: Even if you grew all your tools in-house, you'd still be depending on somebody. If a company goes out of business or drops your tools, the tools won't roll over and die instantly, but make no mistake, bit rot will eventually set in. The operating system will evolve out from under the tool, or you'll develop a critical need for new features. (Alas, BRIEF: I'd still be using it if it weren't an orphan.) Open-source and free-software advocates are quick to point out that if you possess a tool's source, you're in complete control: in a pinch, you can become the primary maintainer. Practically speaking, it's probably safe to assume that a company like Microsoft won't suddenly bail from the developer-tools market. Still, it's always prudent to beware of lock-in. At the very least, you should be able to migrate your source repositories from your chosen version-control system to one of the others on your short list.
OK, now that we know what we're looking for, what tools fill the bill? Plenty! Let's take a look at one on each end of the spectrum: a simple file-based revision control tool and a full-fledged enterprise configuration management system.
If all you need is file versioning, it's hard to beat tools like Revision Control System (RCS) and Concurrent Versions System (CVS). For one thing, they're open source and are included with many Linux and Unix distributions. And if you stay within the "comfort zone" of commonly used features, they require neither much administrative support nor extensive training.
RCS was once available only to Unix developers, and, along with the Source Code Control System (SCCS), popularized the version control concept. Its grandchild is CVS, the usual tool in the open-source community (although Linus Torvalds uses BitKeeper for the Linux kernel itself). CVS's configuration management tools are stone-simple: You can assign a tag across a project's files at any point and retrieve that set anytime. As for conflict resolution, CVS is predicated on the assumption that a mass of programmers will all be hacking on a codebase at once, without much intercommunication other than through the focal point of the code repository itself.
In CVS, you check out a directory tree and go to work. Meanwhile, your peers are busily changing code and checking it back in, so CVS allows you to "update," or merge in changes to the tree that have occurred since you checked out your copy. Updates must be run manually; there's no built-in provision for notification or automatic updates. CVS is reasonably smart about merging, so that if you tweak line 200 of RainSniffer.java and I change line 110, both changes appear in your copy when you run update; if we both modify line 150, you get an alert message, and the conflict is highlighted in the source.
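The behavior just described is a three-way merge: each line of your copy and of the repository's copy is compared against the common starting revision. The sketch below is a deliberately simplified version that assumes every edit changes a line in place, so all three copies have the same number of lines. Real merge tools handle insertions and deletions as well, and the conflict markers here are illustrative rather than an exact copy of any tool's output.

```python
def merge3(base, mine, theirs):
    """Line-by-line three-way merge for in-place edits only (simplified)."""
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:            # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:          # only they changed it: take their version
            merged.append(t)
        elif t == b:          # only I changed it: keep my version
            merged.append(m)
        else:                 # both changed it differently: flag for a human
            merged.append(f"<<<<<<< mine\n{m}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(number)
    return merged, conflicts


base = ["int a = 1;", "int b = 2;", "int c = 3;"]

# You change line 2, I change line 3: both edits survive the update.
merged, conflicts = merge3(base,
                           ["int a = 1;", "int b = 20;", "int c = 3;"],
                           ["int a = 1;", "int b = 2;", "int c = 30;"])
print(merged, conflicts)

# We both change line 1 in different ways: the line is flagged.
_, clash = merge3(base,
                  ["int a = 10;", "int b = 2;", "int c = 3;"],
                  ["int a = 11;", "int b = 2;", "int c = 3;"])
print(clash)   # prints [1]
```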
CVS has a set of commands to create branches and merge them again. Its security model is relatively simplistic, but does allow for partitioning access into "maintainers" (trusted insiders who can check in code) and, well, everybody else, who must submit changes to a maintainer to get them included. CVS can be set up for anonymous checkouts over the Internet; the most common way to implement maintainer access is to give each maintainer a login account on the (usually Linux) repository machine. CVS can use Secure Shell (ssh)-encrypted authentication. It also enjoys widespread support among development environments, and there are plenty of resources for learning: books, Web pages, newsgroups and the like. Because there's a whole boatload of third-party GUI front-ends for the command-line CVS commands, at least one will be available on just about any given platform. The existence of these tools points to CVS's Achilles' heel: To the GUI-bred among us, it's a typical Unix-ish command-line swamp. The single command "cvs" hides 26 subcommands, each with a panoply of options and switches. Comparing revisions produces standard "diff" output, usable by your choice of visual-differencing tools; on the other hand, you have to download, install and hook up the tools yourself. CVS is also ill-suited to projects prone to frequent refactoring; renaming or moving files and directories is a grim, error-prone headache.
CM tool use is a practice that is most efficient in the context of a larger process. It's no coincidence that the market-leading vendors (who are, according to Gartner's Jim Duggan, Rational, Computer Associates, Serena, Merant and MKS) all offer product lines that integrate revision control, configuration management, builds and workflow features. Mostly, these high-end products provide the same basic functionality as a versioning tool, but the difference is in the details: The high-end tools provide more options and vastly improved usability.
An excellent example is 12th annual Software Development Jolt finalist MKS Source Integrity Enterprise (SIE henceforth, if you don't mind). SIE is designed from the get-go for distributed development, with a pretty sophisticated client-server architecture backed by an industrial-strength relational database. The basic working pattern is to create a "sandbox," a private copy of a project wherein each developer can experiment at will.
SIE supports configuration management primarily through "checkpointing," which assigns a label across a project. At any time in the future, you can use the label to retrieve a sandbox containing those exact files. If desired, the sandbox can be designated as a "build" sandbox, which is a static copy especially useful for testing, building releases and the like, since no changes can be checked back in. SIE can handle conflict resolution both pro- and re-actively: You can lock a file when you're checking it out to prevent anyone from mauling it, but if you forget to do that, you can lock it after making your changes. SIE provides automatic change merging via line-based differences, so if you like, you can even work à la CVS: change them all and let the merge utility sort them out. SIE includes a visual-difference tool, too, so you don't have to go blind parsing diff output on your own. (If you prefer, SIE can also talk to your favorite visual-diff tool instead.)
Branching is automatic, and there's a lovely set of visual tools that allow you to examine the project's history graphically, instead of puzzling it out from the change logs. At the other end, the merge utilities will allow you to incorporate the branch back into the main development "trunk" if you need to.
MKS tools are all about integration; SIE's documentation lists a dozen development tools (like Borland's JBuilder and Sybase's Powerbuilder) with which you can do at least the basics like checking files in and out, reverting changes and so on. There's even right-click support in Windows Explorer. And, of course, SIE plugs right into the rest of the MKS product line. One particular feature that caught my fancy becomes available when you have SIE and Integrity Manager, MKS's workflow/change-management system: You can define a "change package," or a set of files touched by a single issue (bug report, feature request, whatever), then manage subsequent builds by manipulating revisions at the change package level. You could, for instance, include or exclude features from a particular build based on what change packages go into it. Finally, there are command-line versions of most of the tools, so that if you want to get down and dirty with the GUIs-are-for-wimps crowd, you can script SIE to your heart's content.
SIE provides a plethora of options for authenticating members of distributed teams. By default, it uses the server system's own authentication, but you can set it up to make use of a password file, LDAP or SSL; some methods ensure that all communications between clients and servers are encrypted. SIE also provides fine-grained control over who can do what; access control lists allow privileges to be assigned to individual users or to groups, granting them blanket or specific-operation access to projects and files. Developers can use the command-line tools or the supplied GUI client for their work, while a Web interface is available for project-level tasks.
Vendor lock-in is a consideration with any of the high-end tools. We're not just talking about file repository formats now; there's a great deal of auxiliary data, too, such as those authentication databases. Quite a few of the tools will import some of the others' repositories, at least, so you won't be totally stuck if you want to switch. A quick search of MKS's Web site, for example, shows conversion utilities from Visual SourceSafe and PVCS.
Those Darn Weasel Words
Yes, I'm going to tell you "it depends," but you knew that, didn't you? The very presence of so many products in the market should clue you in: One size definitely won't fit all. A one-person shop can often get by with just RCS; if you've got dozens to hundreds of geographically distributed developers and a complex workflow, you'd better pony up some serious money. Almost all of the vendors will give you a free evaluation; Perforce allows unlimited free use for one or two developers. Please don't put it off: with revision control in place, you can get back to happily hacking two-handed at your code, roaring lustily.
A Revision Control/Configuration Management Tool Sampler
|Product Name||Vendor URL||Price|
|BitKeeper||www.bitkeeper.com||$400 per seat and up|
|ChangeMan DS||www.serena.com/product/cm_ds_ov.html||Per seat/server mode|
|ClearCase||www.rational.com/products/clearcase/index.jsp||$3,000/node-locked license in the U.S.|
|CMSynergy||www.telelogic.com/products/|||
|PVCS Version Manager||www.merant.com/pvcs/products/||Starts at $649 per seat|
|Perforce (a.k.a. P4)||www.perforce.com||$750/seat; quantity discounts|
|Source Integrity||www.mks.com/products/sie/||Server: $8,000; client licenses: $900|
|StarTeam||www.starbase.com/products/starteam||Per seat: client $1,999, server $9,999|
|Visual SourceSafe||msdn.microsoft.com/ssafe||$545 stand-alone|