A Build System for Complex Projects: Part 1
Build systems are often a messy set of scripts and configuration files that let you build, test, package, deliver, and install your code. As a developer, you either love or loathe build systems. In this article series, I present a different approach to build systems, with the ultimate goal of completely hiding the build system from developers. But first, let me start with some personal history.
Early in my programming career I was a pure Windows developer (with the exception of my very first job, where I wrote Cobol programs for publishing Australia's Yellow Pages). While there was no build system to speak of, there was Visual Studio and Visual SourceSafe. I built Windows GUI clients, messed around with COM components, and picked up some nice C++ template tricks from ATL. And because automated unit testing wasn't very common back then, we created various test programs before passing code on to QA. This wasn't too painful since I worked for a small startup company and the projects weren't too big.
But I then moved to a company that developed software for chip fabrication equipment in the semiconductor industry and BOOM! Life-critical and mission-critical real-time software running on six computers that controlled custom-built hardware in clean-room conditions. The software ran on several operating systems, with about 50 developers contributing code. The development environment consisted of two machines running Linux and Windows/Cygwin. The deployment environment was Solaris and LynxOS RTOS. No more Visual Studio. After reading about 1,000 pages of documentation in the first week and getting my .profile and .bashrc in order, I was assigned my first task -- designing and implementing a build system to replace the existing one, which was a nasty combination of Makefiles and Perl scripts that actually worked, though nobody was sure why (the original author had left the building). There were a few bugs (for example, the build system didn't always follow the proper dependency path) and a big requirements document. Clearly it would be impossible to evolve the current build system, so I had to create a new one from scratch. This was lucky because I had zero experience with Makefiles and Perl, coupled with the tolerance threshold of a Windows developer for gnarly stuff. I still have the same tolerance, but I now know something about Makefiles.
Some of the requirements were pretty unusual, like running a commercial code generator that produces code from UML diagrams on a Windows machine, and then using the artifacts to compile code on Linux, Solaris, and LynxOS. The bottom line is that I decided to take an unusual approach and wrote the entire system in Python. It was my first big Python project and I was really surprised at how well it went. I managed everything in Python. I directly invoked the compiler and linker on each platform, then the test programs, and finally a few other steps. For instance, I implemented friendly error messages that provided helpful suggestions for common errors (e.g., "FrobNex file not found. Did you remember to configure the FrobNex factory to save the file?").
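To make the idea of friendly error messages concrete, here is a minimal sketch of how a Python build script might invoke a tool and attach a suggestion to a known failure. It is not the code from that system: the names HINTS, friendly_message, and run_tool are hypothetical, and the patterns in the table are assumptions chosen for illustration.

```python
import re
import subprocess

# Hypothetical table of known failure patterns and the hint to show for each.
# The FrobNex entry mirrors the example message; the regex itself is an assumption.
HINTS = [
    (re.compile(r"FrobNex file not found", re.IGNORECASE),
     "Did you remember to configure the FrobNex factory to save the file?"),
    (re.compile(r"undefined reference to", re.IGNORECASE),
     "A library may be missing from the link step."),
]


def friendly_message(raw_output):
    """Return the tool output, followed by a hint if a known pattern matches."""
    text = raw_output.strip()
    for pattern, hint in HINTS:
        if pattern.search(text):
            return f"{text}\nHint: {hint}"
    return text


def run_tool(argv):
    """Run one build tool (compiler, linker, test program) and return its stdout.

    Raises RuntimeError carrying a friendly message if the tool exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(friendly_message(result.stderr or result.stdout))
    return result.stdout
```

Keeping the hints in a plain table means a new suggestion for a common error is a one-line change and needs no new control flow.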
While I was generally pleased with the system, it wasn't completely satisfactory. In lieu of Makefiles, I created build.xml files, a la Ant. That was a mistake. The XML files were verbose compared to Makefiles, big chunks were identical for many subprojects, and people had to learn the format (which was simple, but something new). I wrote a script that migrated Makefiles to build.xml files, but it just increased code bloat. I created a custom build system without regard for the specific environment and its needs. I created a very generic system, with polymorphic tools that can do anything as long as you write the code for the tool and configure it properly. This was bad. Whenever someone says, "You just have to ..." I know I'm in trouble. What I took away from this experience is that Python is a terrific language. It's really fun when you can actually debug the build system itself. Having full control over the build system is great, too.
Background: What Does a Build System Do?
The build system is the engine of software development. Software development is a complex activity that involves tasks such as source-code control, code generation, automated source code checks, documentation generation, compilation, linking, unit testing, integration testing, packaging, creating binary releases, source-code releases, deployment, and reports. That said, software development usually boils down to four main phases:
- Developers write source code and content (graphics, templates, text, etc.)
- The source artifacts are transformed to end products (binary executables, web sites, installers, generated documents)
- The end products are tested
- The end products are deployed or distributed
A good automated build system can take care of the last three phases. The distribution/deployment phase usually targets a local repository or a staging area. You will probably need some amount of human testing before actually releasing the code to production. The build system can also help with that by notifying users about interesting events, such as successful and/or failed builds, and by providing debugging support.
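The automated phases and the notification step can be sketched as a small driver that runs each phase in order, stops at the first failure, and reports the outcome. This is only an illustration under assumptions: run_build, the (name, callable) phase format, and the notify callback are hypothetical names, not part of any particular build system.

```python
def run_build(phases, notify):
    """Run build phases in order and report the outcome.

    phases: list of (name, callable) pairs; each callable returns True on success.
    notify: callable taking one message string (for example, a function that
            sends email or posts to a team channel).
    Returns True if every phase succeeded, False otherwise.
    """
    for name, phase in phases:
        try:
            ok = phase()
        except Exception as exc:
            # A phase that raises counts as a failure; include the reason.
            notify(f"Build FAILED in phase '{name}': {exc}")
            return False
        if not ok:
            notify(f"Build FAILED in phase '{name}'")
            return False
    notify("Build succeeded")
    return True
```

For a quick experiment, calling run_build([("transform", lambda: True), ("test", lambda: True), ("deploy", lambda: True)], print) prints a single success message; a real system would pass a notifier that reaches the right people.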
But really, who cares about all this stuff? Actually everybody -- developers, administrators, QA, managers, and even users. The developers interact most closely with the build system because every change a developer makes must trigger at least a partial build. When I say "developer" I don't necessarily mean a software engineer. I could be referring to a graphic artist, technical writer, or any other person who creates source content. When a build fails, it's most often because a developer changed something that broke the build. On rare occasions, the cause is an administrator action (e.g., changing the URL of a staging server or shutting down some test server) or a hardware problem (e.g., the source control server is down). A good build system saves time by automating tedious and error-prone activities.
Think about a developer manually building and unit testing a program. Without a build system, he has to build it carefully, test it, and hand it over to QA. The QA person needs to run his own tests, then hand it to the administrator for deployment to a staging site, where more tests are run against the deployed system. If anything goes wrong in this process, someone must determine what happened. Automated build systems eliminate a whole class of errors. They never forget a step, and they can pinpoint and resolve other errors by verifying that the source artifacts and intermediate artifacts are available and by scanning through log files and detecting failures.
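The two checks mentioned here, verifying that artifacts are available and scanning logs for failures, might look like the following sketch. The function names and the failure pattern are assumptions for illustration; a real build system would tune the pattern to the output of its own tools.

```python
import re
from pathlib import Path

# Naive, hypothetical failure pattern; real tool output needs tool-specific rules.
FAILURE_PATTERN = re.compile(r"\b(error|failed|fatal)\b", re.IGNORECASE)


def missing_artifacts(paths):
    """Return the expected artifact paths that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


def scan_log(log_text):
    """Return (line_number, line) pairs for log lines that look like failures."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if FAILURE_PATTERN.search(line)
    ]
```

Reporting the line number along with the matching text lets whoever investigates a broken build jump straight to the relevant part of the log.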
Managers can also benefit from build systems. A passing build is the pulse of a project. If you have an automated build system with good test coverage (at the system level), managers can monitor project progress and be ready to release at any point. This in turn enables more agile development practices (if you are so inclined).
A build system can even help users in some cases. Think about systems that incorporate user-generated content and/or plug-ins. In most cases, you need to go over the content and ensure it doesn't break your system. A build system that automates some or all of these checks allows for shorter publish/release cycles for user-generated content.