Atlassian is the company behind several well-known developer tools: the defect tracker JIRA, the Confluence wiki, the Bamboo continuous integration server, and others. We also run the Bitbucket project hosting service, where lots of open-source projects live. This article focuses on the biggest migration we have done in our company's history: We moved the 11-year-old JIRA codebase from Subversion to Git. I'll discuss why we migrated, the obstacles we encountered, and the lessons learned. I'll also explain how we managed to do it without interrupting our work on JIRA. I'll focus on Git, because we migrated JIRA to Git, but everything in this article applies equally to Mercurial. At Atlassian, we use both.
Why a DVCS?
I have used Subversion (SVN) successfully on many projects, as have Atlassian and many other sites. Since there is always a cost to migration, you may be inclined to ask, "If Subversion has met my version control needs for many years, why should I change?" To me, that is the wrong question. The real question is, "How can a distributed version control system (DVCS) make what I do today even better?" In the case of Git (although the same things could be said about Mercurial), it's faster and it enables advanced workflows via features such as branching, forks, and pull requests. In theory, these workflows are all possible with SVN; however, the difficulty of merging in SVN compared to Git makes them untenable.
And that is the main benefit of Git and DVCS: lightweight branching and easy merging. These support your existing SVN workflow better than SVN itself does.
To understand exactly what I mean, let's look at how we develop and release software at Atlassian (and I expect at many other developer organizations). Most of us work in a world where we have at least one released version of our software in the wild, which we call a "stable" branch. We maintain and contribute bug fixes to a stable branch while developing new features on a "development" branch, which is referred to as the trunk/master/default, depending on which VCS you use.
Figure 1: The Subversion process.
When we commit bug fixes to the stable branch, we need to add them to the master, too. Because SVN merge is known to be a pain and works solely on revision history, not actual content, a lot of people avoid it or use it infrequently rather than as part of their day-to-day workflow. How many projects have you worked on where stable and development branches have started to diverge, or diverged so significantly that the effort to bring them back together is a serious project cost? I have certainly worked on projects where this has happened. And when I speak to other developers, it's a frequent occurrence for them, too, when using SVN. (There are some strategies to deal with the problem. For example, on our JIRA product, we ignored merging and required developers to make each commit individually to each stable and development branch, relying on QA to make sure that it happened correctly.)
Git enables you to remove this pain entirely. It makes it easy to merge the entire stable branch into the development branch on each commit. In fact, it's now our default workflow. So even if you don't want to use Git-specific feature branches or forks or pull requests immediately, Git still provides advantages from day one. Later, when we were ready, we were in a position to employ the advanced workflows that Git allows.
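As a rough illustration of that default workflow, here is a minimal Python sketch that merges a stable branch into the development branch after a bug fix lands. The branch names and the function name are hypothetical placeholders, not the names used on JIRA, and the injectable run parameter exists only so the sketch can be exercised without a real repository.

```python
import subprocess


def merge_stable_into_development(stable, development="master", run=subprocess.run):
    """Merge the whole stable branch into the development branch.

    Assumption: both branches already exist in the current working copy.
    The branch names are placeholders chosen for this sketch.
    """
    commands = [
        # Switch to the development branch (trunk/master/default).
        ["git", "checkout", development],
        # Merge everything from stable; --no-edit accepts the default merge message.
        ["git", "merge", "--no-edit", stable],
    ]
    for cmd in commands:
        # check=True raises CalledProcessError if a command fails,
        # for example when the merge stops on a conflict.
        run(cmd, check=True)
    return commands
```

Calling merge_stable_into_development("stable-5.0") after each fix keeps the two branches from drifting apart; a conflict surfaces as an exception rather than going unnoticed.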
Before the switch to DVCS, our major products targeted 90-day release cycles. These 90-day releases went to two platforms: downloadable products for clients to install on their own servers; and a release to our hosted cloud platform (Atlassian OnDemand) for which clients pay a monthly fee. Using branches as a core part of development workflow has allowed us to shorten this to the point where we now release our major products to the cloud every two weeks.
JIRA consists of 21,000 files with 11 years' worth of history, comprising 47,228 commits. We average about 30 different committers over a two-week period. More than that, the VCS is a real workhorse for a project like JIRA. Builds, code reviews, and scripts for releasing both product distributions and source: all these things have a rich tapestry of dependencies on the source code management system. An important goal in the migration was to minimize interruption to developers. This is about more than just the ability to commit code; it is about the infrastructure surrounding software development.
In addition, we have 3.5 years of history in JIRA's code review system. And JIRA has a lot of CI data, as we run approximately 60 build plans over different configurations and branches. We have some other dependencies, too: JIRA has a somewhat complex release process that involves pulling together code from multiple sources. We also release our source code to customers, which involves a different set of build scripts.
There is a tradeoff here between how fast you can migrate and how stably you can do it. Our guiding principle was to optimize for stability over speed. If you set a deadline for your migration and it slips, what's the worst that happens? Developers have to commit code to SVN for another week or so, which is not the end of the world. It's far worse if the migration interrupts developers' ability to work and meet their own deadlines.
In the end, the migration took us 14 days in total, with only two hours where developers were unable to commit code. We were nearing the end of the development cycle for our latest release, JIRA 5, and at no point were we unable to create a release candidate.
When preparing a migration, there are a couple of things to be aware of.
First, it will take time. The actual git-svn clone, which takes all of the commits in the SVN repository and replicates them in Git, took us three days.
Second, you should prepare for surprises and think of all the dependencies your infrastructure has on your VCS. Know that if your infrastructure is sufficiently complex (like ours), there will be things you never dreamed of and only discover when they break. So don't beat yourself up when you encounter a dragon. Just slay it, and continue on your quest.
A migration like this is not something you can do overnight, or even over a weekend. It needs to be managed for a sustained period of time.
Migration: The Technical Side
Stably migrating is daunting, but it is not brain surgery. Here is the process we employed to make it manageable.
1. Clone the SVN repository to Git
First, choose a location for your Git repository. When we migrated JIRA, we decided to move from an internally hosted SVN repository to a private Bitbucket repository in the cloud. This was a good fit for us: We have geographically disparate teams in Sydney, Gdansk, and San Francisco; plus it makes committing easy for people working from home. It's also part of our internal "eat-your-own-dogfood" practice to run off Bitbucket since that's our DVCS code-hosting product. We also mirror our repository to Atlassian Stash, our behind-the-firewall Git repository manager.
Once our Git home was set up, we cloned the SVN repository into Git using git-svn clone.
Figure 2: The Subversion repository is cloned to Git.
Essentially you'll want to use git-svn clone to create an intermediate repository on a local machine; then push from this intermediate Git repository to the real Git host you will be using. All developers will be reading and writing to this real Git host, not the intermediate repository.
This is the most technical part of the migration. (For more information on the ins and outs of this step, consult this detailed blog post on the process written by one of our development leads.)
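The intermediate-repository approach described above can be sketched in Python as follows. This is only an outline under stated assumptions: the SVN repository is assumed to use the standard trunk/branches/tags layout (hence --stdlayout), the URLs and directory name are hypothetical, and the sketch pushes only the branch that git svn checks out; a real migration would also need to deal with author mapping, tags, and other branches.

```python
import subprocess


def clone_svn_to_git(svn_url, intermediate_dir, git_remote_url, run=subprocess.run):
    """Clone an SVN repository into a local intermediate Git repository,
    then push it to the real Git host that developers will use.

    Assumptions (not from the article): standard SVN layout, a remote
    named "origin", and an empty repository already created on the host.
    """
    steps = [
        # Replay the SVN history into a new local Git repository.
        # On a large repository this can take a very long time.
        (["git", "svn", "clone", "--stdlayout", svn_url, intermediate_dir], None),
        # Register the real Git host as a remote of the intermediate repository.
        (["git", "remote", "add", "origin", git_remote_url], intermediate_dir),
        # Push the checked-out branch to the real host and track it.
        (["git", "push", "-u", "origin", "HEAD"], intermediate_dir),
    ]
    for cmd, cwd in steps:
        # cwd=None runs in the current directory; later steps run inside the clone.
        run(cmd, cwd=cwd, check=True)
    return [cmd for cmd, _ in steps]
```

Keeping the clone and the push as separate steps mirrors the text: the intermediate repository stays on one machine, and only the real Git host is shared with developers.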
2. Mirror from SVN to Git
After the migration was complete, we set up a script to mirror every SVN commit into Git. It's a good idea to make the Git repository read-only to everyone except the user that the mirroring script runs under. If you have eager developers who start committing directly to the Git repository, the mirroring script might encounter merge conflicts and you will have to manually resolve them. Avoid this at all costs.
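A single pass of such a mirroring script might look like the Python sketch below. It is not the script we ran; it assumes the intermediate repository was created with git svn, that the real Git host is configured as a remote named "origin", and that some scheduler (cron, a CI plan, or similar) invokes it repeatedly. The function name and directory are hypothetical.

```python
import subprocess


def mirror_svn_to_git(repo_dir, run=subprocess.run):
    """Run one mirroring pass from SVN into the shared Git repository.

    Assumption: nobody else pushes to the Git host, so the push below
    is always a fast-forward and never needs conflict resolution.
    """
    commands = [
        # Fetch new SVN revisions and replay them onto the current branch.
        ["git", "svn", "rebase"],
        # Publish the updated branch to the real Git host.
        ["git", "push", "origin", "HEAD"],
    ]
    for cmd in commands:
        # check=True stops the pass at the first failure, so a rejected
        # push is reported instead of being silently ignored.
        run(cmd, cwd=repo_dir, check=True)
    return commands
```

If a developer has committed directly to the Git host, the push is rejected and the pass fails loudly, which is exactly the situation the read-only rule is meant to prevent.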