The frequency of check-ins reveals a lot about your way of working. When I'm coding, I check in very frequently. By "very frequently," I mean that I sometimes check in every time I get something done, even if it is a very small refactor that took only two minutes. This approach means that I create an enormous collection of check-ins.
Sometimes I check in code that is not even complete or not even working. I do that in order to "explain what I'm doing." I can create as many check-ins as I need without breaking the build or bothering my teammates because I'm using a separate branch. I create a branch for each entry in our issue-tracking system and then I check in as often as I need.
My check-ins look something like Figure 1.
Figure 1.
Mine would be 12, 13, 14, and 15.
I really prefer mapping branches to individual tasks for several reasons, but one of them is to create a clean and simple link between tasks and code. It is also very easy to review task001: You just diff the branch.
Now let me explain why I don't like using a single working branch as a shared place to check in.
Shared Working Branch?
If the entire team checks in to the same branch, chances are that a task will look like Figure 2:
Figure 2.
Unless you're lucky enough to finish all your work in a single check-in, your task will likely be split into several check-ins, which are sprinkled among other unrelated check-ins. This makes code review much harder because you need to diff each changeset. When working this way, it is common to end up setting up mechanisms to send the changes for code review before they are checked in, while it would be much easier to just review the branch and merge it only once it has passed review. I say "mechanisms" because I've seen tools creating temporary patches, pushing to different repositories, and all sorts of overly complicated solutions to avoid creating a "task branch" and just letting things flow.
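To make the contrast concrete, here is a minimal Python sketch. It is only a model: the changeset numbers on the shared branch, the task names other than task001, and the helper function are hypothetical and do not correspond to any real version control API. It simply counts how many separate diffs a reviewer has to open for one task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    number: int
    branch: str
    task: str


# Hypothetical history where the whole team checks in to one shared branch.
# The work for task001 is interleaved with unrelated tasks.
shared_history = [
    Changeset(10, "main", "task001"),
    Changeset(11, "main", "task002"),
    Changeset(12, "main", "task003"),
    Changeset(13, "main", "task001"),
    Changeset(14, "main", "task002"),
    Changeset(15, "main", "task001"),
]

# The same kind of work done on a task branch: the branch groups the changesets.
task_branch_history = [
    Changeset(12, "task001", "task001"),
    Changeset(13, "task001", "task001"),
    Changeset(14, "task001", "task001"),
    Changeset(15, "task001", "task001"),
]


def diffs_to_review(history, task):
    """Return the runs of consecutive changesets that belong to the task.

    Each run can be reviewed as one diff; every gap caused by unrelated
    check-ins forces the reviewer to open another diff.
    """
    runs = []
    previous_index = None
    for index, cs in enumerate(history):
        if cs.task != task:
            continue
        if previous_index is not None and index == previous_index + 1:
            runs[-1].append(cs.number)
        else:
            runs.append([cs.number])
        previous_index = index
    return runs


print(diffs_to_review(shared_history, "task001"))       # [[10], [13], [15]]
print(diffs_to_review(task_branch_history, "task001"))  # [[12, 13, 14, 15]]
```

On the shared branch the task needs three separate diffs, while the task branch can be reviewed as a single unit.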
Branches as Units of Change
Hence, I strongly prefer using branches as units of change (one branch for each task), instead of individually linking each changeset as part of a change (task). At the end of the day, the branch acts simply and efficiently as an organized changeset container.
Nowadays, continuous delivery proponents tend to default to "avoid branches at all costs," especially when describing excellent results for huge teams (Facebook- or Google-sized), but I still think branching is an extraordinarily good tool for most teams out there (in fact, also for huge ones, but I'll discuss that in a later article).
Tell a Story
Returning to the "branch per task" approach: How often do you check in, and why do you check in? Suppose you have to fix a bug, but decide to rearrange the code before applying the fix, so the fix ends up being trivial. There are several ways to do the job: One is to make all the changes and then check in. I'll call it a "big bang check-in." This is a common pattern for people not accustomed to working on their own task branch because this is what you tend to do in a shared branch (see Figure 3).
Figure 3.
Suppose the change actually involves the following:
- Split the class Storage into two: Storage and FileStorage.
- Create a new class for FileStorage (FileStorage.cs).
- Rename all references to the old Storage, modifying 15 files.
- Adapt the unit tests (modifying another 4 files).
- Then fix the bug in FileStorage.Write().
The developer doing the code review will diff the branch (or the changeset) and see a list of 20 modified files plus a new file.
He will take some time looking into the code before really understanding that most of the changes are just related to the refactor, and the bug fix actually takes only a few lines inside the new FileStorage.Write() method.
Telling a Story Check-in After Check-in
Now, suppose the developer decides to explain the story of the bug fix check-in after check-in as in Figure 4.
Figure 4.
The reviewer can quickly check the comments of the changesets in the branch task001 and find out that the real change is in cs:15. He might even take a look at cs:15 to check if the change is OK before actually going back and reviewing the rest of the changes. Also, simply "replaying" what the developer did changeset after changeset will be much easier to understand than dealing with the entire diff.
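The two views of the same task can be sketched in a few lines of Python. This is an illustration under assumptions: the way the steps are distributed over cs:12 to cs:14, the changeset comments, and every file name except FileStorage.cs are made up. Only the fix landing in cs:15 and the file counts come from the example above.

```python
# Placeholder file names (assumed): 15 files that reference Storage, 4 tests.
references = [f"Caller{i:02d}.cs" for i in range(1, 16)]
tests = [f"StorageTest{i}.cs" for i in range(1, 5)]

# Each entry: (changeset number, comment, modified files, added files).
task001 = [
    (12, "Split Storage into Storage and FileStorage",
     ["Storage.cs"], ["FileStorage.cs"]),
    (13, "Rename references to the old Storage", references, []),
    (14, "Adapt unit tests", tests, []),
    (15, "Fix bug in FileStorage.Write()", ["FileStorage.cs"], []),
]


def branch_diff(changesets):
    """Collapse all changesets into the single diff a big bang review shows."""
    added, modified = set(), set()
    for _, _, mod, add in changesets:
        added.update(add)
        # A file created on the branch still shows up as added, not modified.
        modified.update(f for f in mod if f not in added)
    return sorted(modified), sorted(added)


def story(changesets):
    """Summarize each changeset the way a reviewer would skim the history."""
    return [
        f"cs:{number} [{len(mod) + len(add)} file(s)] {comment}"
        for number, comment, mod, add in changesets
    ]


modified, added = branch_diff(task001)
print(len(modified), "modified,", len(added), "added")  # 20 modified, 1 added
for line in story(task001):
    print(line)
```

The whole-branch diff reports 20 modified files and one new file, while the per-changeset summary shows at a glance that cs:15, the actual fix, touches a single file.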
It may sound counter-intuitive because one of the advantages of task branches over "shared working branches" is the ability to group the changes and avoid going "cset by cset" to see changes, and now I'm proposing exactly the opposite: go cset by cset. But it is important to note that the type of check-in you can do with task branches is not like the many check-ins you do on a shared branch. With task branches, you can split your task into steps, explaining what you've done step by step, something you are unlikely to do when you deliver to a shared branch.
Big Reviews Put You Off
Additionally, the whole point of "telling a story with check-ins" is to help the reviewer when facing the code review. If he sees a big bunch of files to be reviewed, he'll instantly think, "Oh, wow, that's big. It is going to take a while to complete," which is really discouraging.
On the other hand, if he knows that the real change is not that big because he can quickly glance through the individual changes and understand how the developer proceeded, chances are he'll be in a better frame of mind to get the task reviewed, even if he's reviewing the same number of lines of code. It is all about better communicating the task and helping simple things look simple.
Positive Side Effects of Changeset Storytelling
If you end up working this way, you'll be using your version control not only to track what has been done but also how it was done.
Capturing how changes are actually done has several advantages:
- It will help the reviewer better understand the changes.
- You can use the task branch to teach new team members how certain operations are performed or how more-experienced developers work. At the end of the day, you'll be capturing a sort of "task recording" of the steps performed to complete it.
Pablo Santos is a blogger for Dr. Dobb's and an expert on the operations of version control systems.