Putting Absolutely Everything in Version Control


Continuous delivery is the natural extension of continuous integration (CI). While the latter aims at running builds after each check-in to provide developers with immediate feedback, continuous delivery has a more sweeping goal. It seeks to build, test, and deploy the final executable with each check-in. (The deployment here is on test systems, not production.) The idea is that at all times a project has an executable deliverable that's known to be safe for deployment. It might not be feature-complete, but it is capable of running.
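To make that concrete, here is a minimal sketch in Python of the build-test-deploy cycle run on every check-in. The build.sh, run_tests.sh, and deploy_to_test.sh scripts and the test host name are hypothetical placeholders, not part of any particular CI product; a real pipeline would live in your CI server's own configuration.

import subprocess
import sys

TEST_HOST = "test01.example.com"   # hypothetical test system; never production

def run_stage(cmd):
    """Run one pipeline stage and abort the pipeline on the first failure."""
    print("==>", " ".join(cmd))
    if subprocess.run(cmd).returncode != 0:
        sys.exit("stage failed: " + " ".join(cmd))

def pipeline(revision):
    run_stage(["./build.sh", revision])                       # build the deliverable
    run_stage(["./run_tests.sh", revision])                   # unit and acceptance tests
    run_stage(["./deploy_to_test.sh", revision, TEST_HOST])   # deploy to a test system

if __name__ == "__main__":
    pipeline(sys.argv[1] if len(sys.argv) > 1 else "HEAD")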

Continuous delivery is slowly overtaking CI at sites that embrace this form of agile development because it encourages useful best practices in many areas and it removes the problem of discovering unexpected defects during the deployment process. It also makes the team very familiar with deployment and removes the moments of bated breath that deployment entails at sites that rely on traditional operations.

One best practice that is fundamental to continuous delivery is putting everything in the version control system. By "everything," I do mean everything. Here is an excerpt from the seminal text on continuous delivery: "Developers should use [version control] for source code, of course, but also for tests, database scripts, build and deployment scripts, documentation, libraries, and configuration files for your application, your compiler and collection of tools, and so on — so that a new member of your team can start working from scratch."

This is a radical position — how many of us put our compilers into a VCS? However, it solves a significant problem that arises rarely but can be terribly difficult: recreating old versions of the software. Most anyone who has done maintenance programming has had the experience of not being able to recreate a defect because a change in one of the tools makes the original binary irreproducible. This discipline also provides another benefit: The team can be certain that everyone is using the same set of documents and tools in development. There is no fear that team members overseas are using different requirements or a newer version of the compiler, etc. Everyone on the team is drawing from the same well.

However, fulfilling this mandate is no trivial task. At the recent CITCON conference in Boston, this topic came up for discussion in a session of CI aficionados. The first problem is that many development tools are not a simple binary with a few dynamic libraries; rather, they rely on OS libraries and must be installed (especially on Windows) to run correctly. This can be remedied in part by using virtual machines. Set up the OS and the tools needed for the build automation in a VM, and then check in the entire VM. This works well, but it requires that you also build the product in the VM; otherwise, you have two separate versions of the environment, and they will inevitably get out of sync. (Linux and UNIX suffer less from this problem due to their lack of a registry. God bless the tool makers whose products place all the binaries and config files in a single directory!)

A more obscure problem is that not all VCSs handle binaries well. Git, for example, was designed as a pure source-code manager (rather than a general-purpose VCS) and has known difficulty handling large projects or those with many binaries. (If you check in tools and VMs, then your project will automatically be large in SCM terms.) In this realm, commercial products tend to excel. Perforce, in particular, is known for having put a lot of work into fast handling of binary files, especially on large projects.

Another challenge is the presence of passwords in scripts. This is partly offset by the fact that deployment in the continuous delivery model is to non-production systems, so leaving passwords for non-production (that is, test) systems in scripts probably represents little risk. For organizations where even that risk is unacceptable, encryption can provide a solution.
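As a hedged sketch of the encryption route, using Python and the third-party cryptography package: only the encrypted secret (here a made-up deploy.secret file) is checked into version control, and the key reaches the deployment environment through a channel outside the VCS (here an assumed DEPLOY_KEY environment variable).

import os
from cryptography.fernet import Fernet

def load_deploy_password(path="deploy.secret"):
    """Decrypt a password whose encrypted form is safe to keep in the VCS."""
    key = os.environ["DEPLOY_KEY"].encode()   # key is supplied outside the VCS
    with open(path, "rb") as f:
        token = f.read()                      # encrypted blob committed to the repo
    return Fernet(key).decrypt(token).decode()

# One-time setup (run locally; commit only deploy.secret):
#   key = Fernet.generate_key()                      -> store securely as DEPLOY_KEY
#   open("deploy.secret", "wb").write(Fernet(key).encrypt(b"the-real-password"))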

Finally, I should note that even the book I quoted above recommends against storing the binaries generated by a build in the VCS. This makes sense, as binaries tend to be large and numerous, and the whole point of putting everything in the VCS is precisely to be able to recreate those same binaries at a future point.

Personally, I don't think it's possible to put all files in the SCM for every project. Linux-based projects that use OSS tools probably stand the best chance of reaching this goal. However, I believe getting as close as is practical is a valuable endeavor. It enables a sense of security that you can, at any moment, go back in time and recreate older versions of products and that everyone is working from a single source of tools. In my view, these benefits alone outweigh the hassles that the extra discipline entails.

— Andrew Binstock
Editor in Chief
alb@drdobbs.com
Twitter: platypusguy



Comments:

ubm_techweb_disqus_sso_-38309d0f877d7d0364504ca525123996
2013-11-06T15:34:50

It's interesting to note the rise of tools such as RVM (Ruby Version Manager) to deal with some of the complexities of large numbers of dependencies on different versions of third-party tools/libraries/gems.

The same issues have been discovered with Go: if you import a library directly from GitHub, you leave yourself open to breakage from a new release.

I generally like to have my libraries and other things versioned locally...


ubm_techweb_disqus_sso_-38309d0f877d7d0364504ca525123996
2013-11-06T15:29:05

It all depends on the sort of software you are involved with writing or supporting. Your development/versioning strategy needs to support your requirements.

I have seen quite a few situations where people are having to support systems many years old. It is easy to run into problems with the lack of availability of old versions of compilers or other tools, or with things like licenses for such tools. The VM-type solution is a good fallback for this, though you need to be sure that whatever VM image you have saved can still be loaded and run by the latest version of the VM software, which is not always straightforward as the years go by.

It's not that everyone needs their own copy of all versions of all tools, but that somewhere the organisation needs a copy. A Bill of Materials referring to a dusty DVD in some fire safe may not be helpful after an office move! Life is simpler if you have it electronically (and only one copy is needed). This is indeed one of the challenges of the Git model, although there are approaches/workarounds.


ubm_techweb_disqus_sso_-14a202704c1cc2157b09b676aa33cca4
2013-09-25T14:12:08

I read the comments left by others.
Committing the compiled binary results of any code to version control is a bad idea, as the article suggests. And drawing the dependencies you build against directly from VC is also a bad idea.
What is a good idea is to keep all working code in VC, as the article suggests. But then publish versions to a build and dependency management system like Maven.
Vendor provided binaries should just go directly to Maven repositories, and bypass VC.
All code using dependencies should draw the dependencies from Maven, and not from VC.
If you put the code for a working application as well as all of its external dependencies in VC, you will have a difficult time managing the software system. Larger OSS projects all use something like Maven (usually Maven itself) to manage dependencies internal to the project as well as those from external projects.


ubm_techweb_disqus_sso_-8969ca068c68f6cb1869c4bd6fe1996b
2013-09-08T00:44:27

I don't really agree, especially regarding open source. No one really likes to look back, because everything is free. The normal recommendation in the open-source world when facing a bug is to reproduce it on the most recent version and fix it at HEAD. So it does not make sense at all for an open-source project to version-control the tools. And many such projects don't even deliver official binaries!

For Linux-based commercial projects, it is a little different. It probably makes sense to keep the environments/VMs that are used to create the binaries, but whether the VCS is the right place for them is a big question; I don't find I like that idea. I normally just put a text file in the VCS that documents the build environment in detail so that new developers can set it up. If they can do so, the documentation is probably good enough. Not everyone will want to build inside a VM, especially considering the extra performance/storage requirements.


ubm_techweb_disqus_sso_-ad320612bc69a7a157bfe980ec01b152
2013-09-07T14:41:38

While a pom.xml or build.sbt file should be enough to track dependencies, I've seen cases in practice where the repository hosting an artifact goes offline or no longer carries it. While I'd lean toward a local repository as a solution, this would be at least as safe. However, Git is unforgiving with large binaries, and you can't easily remove them from your repository without some finesse. I don't like the idea of people having to download a couple of gigabytes' worth of binaries/tools/etc. when cloning my project. Perhaps there could be separate "support" repositories that you can choose to clone if you're having trouble finding the right compiler or your pom file isn't working?


ubm_techweb_disqus_sso_-656f64b26b78cb3d3703f58f7f7de7b6
2013-09-05T23:20:38

If you're going to these lengths to store things like compilers and libraries, you are probably using the wrong tool. My previous employer, Sony Online Entertainment, might just be the reason Perforce became good at storing binary assets and large projects. You see, when SOE deploys a new game, it does so by extracting a labelled version from Perforce and pushing it out to the content distribution network (CDN) providers.

While this may not impress you, consider that the initial push for the release version of DC Universe Online was 17GB. Thousands and thousands of art assets, small to medium-sized, libraries, loadable modules, all of it versioned and controlled in Perforce. The P4 servers are the most powerful machines in the company, right up there with the financial database servers.

So yeah, you might not want to store all those revisions of your compiler and shared libraries in Git, but you might want to pick a better tool for doing that than just making notes (that will inevitably get lost) as to where they are stored on some server (that will get shut down and discarded without your knowledge someday).

While you're at it, be sure to purge the old assets when you flush older versions of your software out of maintenance. That will help make the decision to end maintenance on a release version; when management comes along and says "we need to fix this one last bug for this 10 year out of date customer" you can tell them "sorry, we just don't have the tools anymore, they were all deleted when it went off maintenance 2 years ago."


ubm_techweb_disqus_sso_-0fe49c2e2302af94ad3ff06b271a3eeb
2013-09-04T16:43:22

Thanks for the nice reply, Andrew!

I totally agree with your comments. I would rather store a complete set of inputs than count on some outside system to (a) be available and (b) always be able to provide the version we depend on.

However, one note: NuGet.org actually does keep prior versions available, although the "Manage" dialog and the PowerShell console interface do their best to obscure that fact.

NuGet.org's policy is that once a version of a package is made available, it doesn't go away. Assuming, of course, that NuGet.org itself never goes away.


ubm_techweb_disqus_sso_-826ad3d60cb32366c17900d34c93db85
2013-09-04T15:08:16

I guess one nice aspect of putting absolutely everything under version control is that if the decision leads to a disappointing state of affairs, you can always back it out.


ubm_techweb_disqus_sso_-ad9041a796fec90072c4430be7b516f3
2013-09-04T09:14:56

There is a difference between stuff being versioned and having it stored in a source-code-focused VCS.

I agree that things like compilers don't store well in a source VCS. However, there is a simple solution. Store a manifest in the VCS that says which version of the OS, RDBMS, language, web server, and so on this version of the code depends on.

At deployment time your deployment script inspects the manifest, compares the local versions with the required versions and installs any that don't match. I have used this strategy very successfully.

In our project the compilers, RDBMS, and so on were simply stored in a directory on a server accessible from wherever we were deploying code. Our deployment scripts examined the manifest and then copied any dependencies that weren't already in place from the relevant directory. This is as easy as maintaining a simple directory naming convention based on the versions specified in the manifest entries.

Using this approach, upgrades to these more infrastructural components were as simple as adding the new version of, say, the JDK to the directory structure and then committing a new version of the manifest referencing the new JDK version. If tests failed, we knew we had problems with the new version; if the tests passed, we were ready to deploy into production.

I think that the key is to think of all of the code needed in production as a configuration set. This set of things doesn't necessarily need to be stored and maintained in a homogeneous store - different aspects of this stuff have different requirements. The deployment script can be as clever as you like in pulling dependencies from different sources. However the configuration set does need to be fully defined in a machine-readable fashion - there needs to be a structure of keys that tie things together so that given the commit of a release candidate you can find everything that that candidate needs - including passwords, operating systems, and so on.
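A minimal sketch, in Python, of the manifest-driven step described above. The manifest format, the /srv/toolstore layout, and the VERSION marker files are conventions invented here for illustration, not the poster's actual scripts.

import subprocess
from pathlib import Path

TOOLSTORE = Path("/srv/toolstore")   # shared server directory holding installers

def read_manifest(path="manifest.txt"):
    """Parse 'component=version' lines from the manifest kept in the VCS."""
    wanted = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            name, version = line.split("=", 1)
            wanted[name.strip()] = version.strip()
    return wanted

def installed_version(name):
    """Placeholder check; a real script would query the package manager or the tool itself."""
    marker = Path("/opt") / name / "VERSION"
    return marker.read_text().strip() if marker.exists() else None

def sync_to_manifest(wanted):
    """Install any component whose local version does not match the manifest."""
    for name, version in wanted.items():
        if installed_version(name) != version:
            installer = TOOLSTORE / name / version / "install.sh"
            print("installing", name, version)
            subprocess.run([str(installer)], check=True)

if __name__ == "__main__":
    sync_to_manifest(read_manifest())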


ubm_techweb_disqus_sso_-b6b16bb36dfdc87aed8cc9badffe7d9d
2013-09-04T08:24:35

I completely agree with the article. As an additional note, the more organized your build system is, the easier it is to version control. For instance, we use several external libraries (as shared libraries). It is not advisable to push them into the development repo as blobs, for various reasons. Instead, the external libraries can be pushed into a repo of their own, where their versions can be clearly tracked. The actual (product) development repo can refer to that repo through the build system.

http://0-f.blogspot.com/2011/0...

The same goes for the design documents and diagrams. Using tools such as Pencil is a convenient and clear way to version control even the design. I would prefer using TeX to write the technical documents instead of MS Word, but there is a learning curve, and it's difficult to convince everyone to use TeX for design docs.


AndrewBinstock
2013-09-04T03:54:17

Thanks for your note. NuGet works well for current builds, but not as well for earlier builds. Those builds require that the libraries and components that NuGet would download are still available. To guarantee their availability, it is safer IMO to check them into the VCS.


ubm_techweb_disqus_sso_-0fe49c2e2302af94ad3ff06b271a3eeb
2013-09-04T01:22:25

I fought, and lost, a battle on a somewhat different aspect. Like many projects nowadays, we employ a variety of third-party libraries. As long as the VCS is up to the task, I prefer including those libraries in version control, so that when you get the project to build, you're getting everything it depends on. (Note: We use Perforce, which handles binaries perfectly well, and we have enough space on its server to hold the third-party libraries.)

The "modern" approach, at least in the Microsoft world, is that this is a bad idea, and instead you should have NuGet automatically pull down the third-party libraries that you depend on.

The NuGet approach is certainly working, so I shouldn't complain. But, being a dinosaur, I have not come around to liking it better.


ubm_techweb_disqus_sso_-ccb0938ee08648c3676059bf8bafe950
2013-09-03T19:59:44

In my experience, tools (and third-party software in general) shouldn't be put in version control unless you're making significant, regular changes to that software. It's generally not practical for everyone working on a large project to have their own copies of multiple versions of the entire compiler toolchain, for example. What tends to work perfectly well is to have the build system reference these third-party packages via a path name that includes the specific version of the tool being used. For example, the makefiles might reference /opt/boost/boost_1_54_2.
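A small Python sketch of that path-naming convention, keeping only the /opt/boost/boost_1_54_2 example from the comment; the pinned-versions table and the gcc entry are made up for illustration.

from pathlib import Path

# Versions pinned for this release; in practice this table would itself be
# a file kept in version control.
TOOL_VERSIONS = {
    "boost": "boost_1_54_2",   # matches the example path above
    "gcc": "gcc_4_8_1",        # hypothetical entry for illustration
}

def tool_root(name):
    """Return the versioned install directory for a pinned third-party tool."""
    root = Path("/opt") / name / TOOL_VERSIONS[name]
    if not root.is_dir():
        raise FileNotFoundError("pinned toolchain missing: " + str(root))
    return root

if __name__ == "__main__":
    print(tool_root("boost"))    # e.g. /opt/boost/boost_1_54_2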

