Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

The Subversion Version-Control Program


August, 2004: The Subversion Version-Control Program

An update-to-date version-control package

Jeff is the UNIX systems manager for CitiStreet and cofounder of the Apache Directory Project. Jeff can be contacted at jmacholsapache.org.


When it comes to version control in the open-source arena, there's no question that CVS rules. However, this reign may be ending. The Subversion version-control program provides all the benefits and features of CVS, along with many improvements. Since its release in early 2004, Subversion (http://subversion.tigris.org/) has already been adopted by projects such as the Apache Software Foundation Directory Project (http://incubator.apache.org/directory/).

Because Subversion contains all the features of CVS, its creators designed the interface to match CVS as much as possible. For instance, the command-line interface is named "svn" and the basic operations have the same names. In addition, "cvs up" is "svn up," thereby easing migration to Subversion. Additionally, the command-line interface binaries are available for most operating systems. Subversion also uses the copy-modify-merge paradigm, so existing development processes built around CVS do not have to be changed.

Although the Subversion toolset is small, there are a few key tools. Windows users who want a GUI can get TortoiseSVN (http://tortoisesvn.tigris.org/), a plugin to Explorer that embeds SVN commands on the right-click pull down menu. Likewise, for the Eclipse IDE, there is a plugin called Subclipse (http://subclipse.tigris.org/).

Architecture

I'm the first to say that CVS is a great application—its longevity alone is a testament to this. However, it is starting to show its age. Consequently, the Subversion creators realized that to fix some of CVS's deficiencies, a rewrite was required. The first key point in Subversion's architecture is the use of the Apache Portable Runtime (APR) libraries, a set of libraries that abstract away OS-specific calls in the C code, providing a Java-like "write once, run anywhere" model. This eliminates the need for lots of excruciating compiler directives in the code, streamlining development and source-code maintenance.

Subversion's design is based on three distinct components—the client, network, and filesystem. Each component has an interface that lets you focus on one area of the application. You can write a new client-side feature without having to deal with the guts of the filesystem implementation. The interface provides a means for applications to be easily built on top of Subversion, such as GUIs and plug-ins. This modularity allows for different components to be snapped in without disrupting the rest of the application. And while there are other design and architectural improvements in Subversion, the most beneficial is the overall open-source community orientation. Creating an environment that lets more contributors easily get involved expedites the development of the project into a robust and stable production application.

Atomic Commits

One of Subversion's most powerful enhancements is the concept of atomic commits. Logically, CVS treats a commit as a set of individual check-ins, one for each file changed. Each file has its own incremental revision number and log message. Subversion, on the other, treats each commit as one change to the repository and has a global reversion number. This is the part of Subversion that takes some getting used to. The reversion number on a file or entry in the repository is on a per-commit basis, not per file. The global reversion number starts with 0 and is incremented as a whole number for each commit on the repository. Each entry changed for that commit will have its new revision number set to the global revision number. In Subversion, revision numbers in a file are not necessarily sequential (1.1, 1.2, 1.3). Your revision numbers will look more like: r1, r10, r25. Since a commit is treated as a new snapshot of the repository, it captures adds, moves, and copies objects, as well as changes to an object. This is a powerful mechanism that lets you retrieve the state of the repository at any given time with ease. For instance, consider this sequence of events:

  1. A new repository is created. Two files, foo.c and bar.c, are created and committed.
  2. You make a change to foo.c and commit.
  3. Someone else makes a change to both files and commits.

Now look at the logs for the two files in Figure 1. Since foo.c was changed each time a commit was done on the repository, it has a log entry for each revision. The file bar.c was not changed during the commit of revision 2, so it will not contain a log entry for r2. Revision 2 of the repository contains a version of bar.c that's identical to r1. Since the identical copies are links inside the repository, it does not use extra space. In terms of storage, in fact, Subversion is actually cheaper than CVS because the revision numbers and log messages are stored once globally, instead of one per file. This also makes modifications to log messages easier because it only needs to be changed in one place.

Directories and Metadata

A longtime complaint with CVS is its lack of functionality around versioning directories. This deficiency makes it difficult to move or rename files and directories. The only way to move a file in CVS is to remove it and add it to the new location. This causes problems such as accessing history, tags, and previous releases. Subversion addresses these problems; in fact, according to some of the documentation, this was one of the reasons for writing it.

There are commands—copy, delete, mkdir, and move; or, for the UNIX-oriented, cp, rm, mkdir, and mv—that allow file and directory manipulation from the working directory that, on a commit, is updated in the repository. These actions preserve the history of the object, including the hierarchy from previous revisions. For example, if you move foo.c to file.c and commit to revision 10, a checkout of revision 9 contains foo.c in its original location. This is helpful when restoring a previous version and trying to compile after the directory structure has changed.

Subversion has a much more robust metadata implementation than CVS called a "property"—basically a hash table that exists for each element in the repository, which contains a name and value. The name and value can be any text. Properties are also versioned, which gives another level of traceability and history for your repository. A great use for properties is your software promotion process. For instance, you can create a property called "state," which has the value development, test, QA, or production. Automated scripts can easily be generated to build code based on a particular state. You can also go back to a specific revision and see what the property was set to. Along with the custom properties, Subversion has a set of built-in properties for common tasks. These include setting up your ignore list, setting the system execute bit on the file, MIME type, keyword substitutions, and EOL style.

Branches and Tags

Compared to CVS and most other version-control systems, branching and tags are implemented differently in Subversion. In Subversion, a branch is simply a copy of the files in a different subdirectory. Just like CVS, you use the merge to sync your changes with the HEAD. While you can put the branch subdirectory anywhere in your working copy, a standard has been developed that has been adapted by most Subversion users. Before addressing these standards, you have to decide if the repository will contain multiple projects or if there will be a 1:1 ratio. For now, assume you have one repository that contains three projects: GUI, Backend, and Core. Each project in the top-level directory should contain the subdirectories trunk, branches, and tags. The trunk should be your main development path. The branches and tags directories should clearly contain the project's branched and tagged paths. Your layout will look something like Figure 2.

There are no rules as to where a project root needs to be; in fact, they do not all need to be at a level in the tree. These should be located where it makes sense to the layout of the project. Also be aware that the revision number is global to the repository, not to an individual project root.

Protocol and Backend

There are two options for the network protocol that Subversion can use. The first is a simple, lightweight SSH tunnel, which is provided by a server-side daemon called "svnserve." The other approach is to use the Apache HTTP server. Subversion has an Apache module called "mod_dav_svn," which uses the WebDAV and DeltaV protocols over HTTP for the client/server communication. There are several advantages to using this protocol, the first being stability and acceptance. You don't have to worry about convincing your network admin to open up a new port or suffer through a long debug process of a custom protocol. Subversion is also able to take advantage of existing features built into this protocol such as authentication though the HTTP server and version browsing with WebDAV.

Instead of writing a proprietary backend, Subversion used Berkeley DB from Sleepy Cat Software (http://www.sleepycat.com/) for its data store. Once again, the underlying principle of using existing technology is a key factor. There was not a real benefit to going through the trouble of writing a custom backend. Berkeley DB provides features such as hot backups, replication, journals for recovery, and efficient memory usage.

Migrating From CVS

Once you are ready to move from CVS, there are several decisions that need to be made. The first decision is whether to keep revision history and log messages. If it seems like an opportune time to baseline your code, maybe you won't need the history. If this is the case, a simple copy into your Subversion working directory and commit loads the repository. In the more likely case you decide to keep your history, the next choice is whether you bring branches and tags from CVS into the Subversion repository. Unfortunately, due to constraints in the conversion mechanism, this is all or nothing. Either you get all the tags and branches or just the trunk.

Subversion has a tool to import an existing CVS repository, including history, branches, and tags. This process can get a little dicey depending on your CVS repository. Basically, the more straightforward your CVS implementation, the higher chance of success in the conversion process. If you have lots of branches and have moved or deleted entries from the CVS repository, you may be in for a long night. The conversion tool is a Python script called "cvs2svn.py." This script is packaged with Subversion but maintained separately. This utility is lagging behind the Subversion code in terms of production worthiness (which could be one of the reasons it is separate). The good news is that the tool is being improved at a quick pace. I recommend getting the latest version before trying to run the conversion. You can check out the script and documentation from collabnet:

svn co http://svn.collab.net/repos/cvs2svn

Once you have the script, take a look at the README and run cvs2svn.py--help. It shouldn't take long to skim through this and be ready to give the conversion a try. The script has the ability to perform a dry run, making sure it can properly parse the CVS repository without actually doing the conversion. It is a good idea to try this and see if you have any issues before going through the actual conversion. The success and speed of the conversion will depend on many factors. If you hit a snag, here are some suggestions:

  • If possible, run the conversion on the box your CVS repository is on. If you cannot do this, see if you can get a tarball of the repository (maybe from a backup) and create it on your local machine. If you have a big repository, it is worth the effort.
  • Leave branches and tags behind. This is where I have run into most of my problems. If you are getting stuck on the branches, try running the conversion with the --trunk-only option. If you need the source from the branches, manually create them in Subversion. You lose the branch history, but not the HEAD history.
  • Make sure you have plenty of disk space. Conversion creates lots of temporary files.
  • Try the conversion first on a test Subversion repository. If you're adding to an existing Subversion repository, you do not want to get halfway through and have an error put your existing code in a bad state.
  • Create a dump file first. The cvs2svn script can be run to create a Subversion dump file using the --dump-only option. This dump file can be imported into a Subversion repository. This is especially useful if you run into problems on the Subversion side. Once you have the dump of the CVS repository, you can use this file instead of reading from CVS each time.

Conclusion

With the Subversion 1.0 production release and increase in supporting tools, it won't be long before CVS steps aside and lets Subversion become the de facto standard open-source version-control software. In fact, the CVS web site has a link to Subversion's homepage. The question for you is when—not if—do you migrate from CVS to Subversion.

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.