If you've been doing OO programming for a while, you've surely run into the seemingly endless essays on testability. The issue they debate focuses on how to write code to make it more amenable to automated testing. It's a vein that is particularly intriguing to exponents of test-driven development (TDD), who argue that if you write tests first, as in the orthodox approach to TDD, your code will be inherently testable.
In real life, however, this is not always how it happens. TDD developers frequently shift to the standard code-before-tests approach when hacking away at a complex problem or one in which testability is not easily attained. They then write tests after the fact to exercise the code; then modify the code to increase code coverage. There are good reasons why code can be hard to test, even for the most disciplined developers. A simple example is testing private methods; a more complex one is handling singletons. These are issues at the unit testing level. At higher levels, such as UAT, a host of tools help provide testability. Those products, however, tend to focus principally on the GUI aspect (and startlingly few of those handle GUIs on mobile devices). In other areas, such as document creation, there is no software that provides automated UAT-level validation because parsing and recreating the content of a report or output document is often an insuperable task.
I don't want to get off my main point, however, which is that what makes code untestable is frequently not anything I've touched on so far, but rather excessive complexity. High levels of complexity, generally measured with the suboptimal cyclomatic complexity measure (CCR), are what the agile folks correctly term a "code smell." Intricate code doesn't smell right. According to numerous studies, it generally contains a higher number of defects, and it's hard, sometimes impossible, to maintain. Fortunately, there are many techniques available to the modern programmer to reduce complexity. One could argue that Martin Fowler's masterpiece, Refactoring, is almost entirely dedicated to this topic. (Michael Feathers' Working Effectively With Legacy Code is the equivalent tome for the poor schlemiels who are handed a high-CCR codebase and told to fix it.)
My question, though, is how to avoid creating complexity in the first place? This topic too has been richly mined by agile trainers, who offer the same basic advice: Follow the Open-Closed principle, obey the Hollywood principle, use the full panoply of design patterns, and so on. All of this is good advice; but ultimately, it doesn't cut it. When you're deep into a problem such as parsing text or writing business logic for a process that calls on many feeder processes, you don't think about Liskov Substitution or the Open-Closed principle. Typically, you write the code that works and you change it minimally once it passes the essential tests. In other words, as you're writing the code there is little to tell you, "Whoa! You're doing it wrong."
For that, you need another measure, one which I've found to be extraordinarily effective in reducing initial complexity and greatly expanding testability: class size. Small classes are much easier to understand and to test.
If small size is an objective, then the immediate next question is, "How small?" Jeff Bay, who contributed a brilliant essay entitled "Object Calisthenics" (in the book The Thoughtworks Anthology) that touches on this topic, suggests the number should be in the 50-60 line range. Essentially, what fits on one screen.
Most developers, endowed as we are with the belief that our craft does not and should not be constrained to hard numerical limits, will scoff at this number (or at any number of lines) and will surely conjure up an example that is irreducible to such a small size. Let them enjoy their big classes. But I suspect they are wrong about the irreducibility.
I have lately been doing a complete rewrite of some major packages in a project I contribute to. These are packages that were written in part by a contributor whose style I never got the hang of. Now that he's moved on, I want to understand what he wrote and convert it to a development style that looks familiar to me and is more or less consistent with the rest of the project. Since I was dealing with lots of large classes, I decided this would be a good time to hew closely to Bay's guideline. At first, predictably, it felt like a silly straitjacket. But I persevered, and things began to change under my feet. Here is what was different:
Big classes became collections of small classes. I began to group these classes in a natural way at the package level. My packages became a lot "bushier." I also found that I spent more time in managing the package tree, but this grouping feels more natural. Previously, packages were broken up at a coarse-grained level that dealt with major program components and they were rarely more than two or three levels deep. Now, their structure is deeper and wider and is a useful roadmap to the project.
Testability jumped dramatically. By breaking down complex classes into their organic parts and then reducing those parts to the minimum number of lines, each class did one small thing that I could test. The top level class, which replaced its complex forebear, became a sort of main line that simply coordinated the actions of multiple subordinate classes. This top class generally was best tested at the UAT level, rather than with unit tests.
The single-responsibility principle (SRP), which states that each class should do only one thing, became the natural result of the new code, rather than a maxim that I needed to apply consciously.
And finally, I have enjoyed an advantage foretold by Bay in his essay: I can see the entire class in the IDE without having to scroll. Dropping in to look at something is now quick. If I use the IDE to search, the correct hit is easy to locate, because the package structure leads me directly to the right class. In sum, everything is more readable; and on a conceptual level, everything is more manageable.
Making Large Classes Small (In 5 Not-So-Easy Steps)
I've discussed the benefits and other effects on code bases of using small classes, which I defined using a limit of 50-60 lines. Note that I'm not discussing a single function, but rather an entire class, which implies multiple functions in most cases. Coding classes as diminutive as 60 lines struck other developers as simply too much of a constraint and not worth the effort.
But it's precisely the discipline that this number of lines imposes that creates the very clarity that's so desirable in the resulting code. The belief that this discipline cannot be consistently maintained suggests that the standard techniques for keeping classes small are not as widely known as I would have expected. (Given that this article was inspired by an extended effort to clean up a project that contains much of my own code, I say this with all due humility.)
Let's go over the principal techniques. I presume in this discussion that design has been done and it's now just a matter of writing the code. Or in the less attractive case, of maintaining code.
Diminish the workload. The first technique to apply is the single responsibility principle (SRP), which states that classes should do only one thing. How big that one thing is will determine in large part how big your classes are going to be. Reduce the work of each class; then, use other classes to marshal these smaller classes correctly.
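As a rough illustration of that split, here is a minimal Python sketch. The expense-report scenario and every class name in it (LineParser, Totaler, ReportFormatter, ExpenseReport) are hypothetical, invented only for this example. Each small class does one thing, and the top-level class merely marshals them.

```python
class LineParser:
    """Does one thing: turns a 'name,amount' line into a (name, amount) pair."""

    def parse(self, line):
        name, amount = line.split(",")
        return name.strip(), float(amount)


class Totaler:
    """Does one thing: sums the amounts for each name."""

    def total(self, entries):
        totals = {}
        for name, amount in entries:
            totals[name] = totals.get(name, 0.0) + amount
        return totals


class ReportFormatter:
    """Does one thing: renders totals as text lines sorted by name."""

    def format(self, totals):
        return [f"{name}: {amount:.2f}" for name, amount in sorted(totals.items())]


class ExpenseReport:
    """Coordinator: wires the small classes together and holds no logic itself."""

    def __init__(self):
        self._parser = LineParser()
        self._totaler = Totaler()
        self._formatter = ReportFormatter()

    def build(self, lines):
        entries = [self._parser.parse(line) for line in lines]
        return self._formatter.format(self._totaler.total(entries))
```

Each of the three worker classes can be unit tested in isolation with a couple of assertions, while the coordinator has almost nothing left to get wrong.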
Avoid primitive obsession. This obsession refers to the temptation to use collections in their raw form. This is definitely a code smell. If you have a linked list of objects, that linked list should be in its own class, with a descriptive name. Expose only the methods that the other classes need. This prevents other classes from performing operations without your knowledge on an object they don't own. The purpose of the list is also supremely clear and this encapsulation enables you to change easily to a different data structure if the need should arise later on.
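A small Python sketch of such a wrapper follows. The PendingOrders name and its methods are hypothetical, and a deque stands in for the linked list; the point is only that callers see a descriptive name and a narrow set of operations, never the raw collection.

```python
from collections import deque


class PendingOrders:
    """Named wrapper around a raw collection; callers never touch the deque."""

    def __init__(self):
        # Storage detail: could become a list or a heap later
        # without changing any caller.
        self._orders = deque()

    def add(self, order_id):
        """Queues an order behind those already waiting."""
        self._orders.append(order_id)

    def next_to_ship(self):
        """Removes and returns the oldest pending order."""
        return self._orders.popleft()

    def __len__(self):
        return len(self._orders)
```

Because only add, next_to_ship, and len are exposed, no other class can reorder or clear the orders behind the owner's back.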
Reduce the number of class and instance variables. A profusion of instance variables is a code smell. It strongly suggests that the class is doing more than one thing. It also makes it very difficult for subsequent developers to figure out what the class does. Very often, some subset of the variables forms a natural grouping. Group them into a new class, and move the operations that manipulate them directly into that class.
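To make the grouping concrete, here is a minimal Python sketch under assumed names: a hypothetical Customer class that once carried street, city, and postal code as separate instance variables now holds a single Address, and the label-formatting operation has moved along with the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """The natural grouping pulled out of the larger class."""

    street: str
    city: str
    postal_code: str

    def mailing_label(self):
        # The operation that manipulates these fields lives with them.
        return f"{self.street}\n{self.postal_code} {self.city}"


@dataclass
class Customer:
    """Now carries two instance variables instead of four."""

    name: str
    address: Address

    def mailing_label(self):
        return f"{self.name}\n{self.address.mailing_label()}"
```

The new class is trivially testable on its own, and Customer shrinks by both the variables and the code that handled them.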
Subclass special-case logic. If you have a class that includes rarely used logic, determine whether that logic can be moved to a subclass or even to another class entirely. The classic example of the benefits of object orientation is polymorphism. Use it to handle special variants.
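One way this might look in Python is sketched below. The shipping scenario, the class names, and the rates are all made up for illustration. The common class stays free of the rarely used branch, and the variant overrides only the one step that differs.

```python
class Shipment:
    """Common case: no special handling."""

    def __init__(self, weight_kg):
        self.weight_kg = weight_kg

    def cost(self):
        return self.base_rate() + self.surcharge()

    def base_rate(self):
        return 2.0 * self.weight_kg  # illustrative rate per kilogram

    def surcharge(self):
        return 0.0


class HazardousShipment(Shipment):
    """Rare variant: only the surcharge differs, so only it is overridden."""

    def surcharge(self):
        return 25.0  # illustrative flat fee
```

Callers invoke cost() on either type without an if statement, which is the polymorphism doing the work the special-case branch used to do.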
Don't repeat yourself (DRY). This suggestion appears pointlessly obvious. However, even coders who are attentive to this rule will repeat code in two methods that differ only in a single detail. In addition, they can overlook the introduction of duplicate code during maintenance. More than the other guidelines here, which are all techniques, DRY is a discipline within a discipline.
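The "two methods that differ only in a single detail" case can be sketched in Python as follows. The Temperatures class is hypothetical; the assumption is that count_above and count_below originally each contained the same loop with a different comparison, and the shared loop has now been pulled into one private helper.

```python
class Temperatures:
    """Holds a set of readings and answers simple counting questions."""

    def __init__(self, readings):
        self._readings = list(readings)

    def _count(self, matches):
        # The loop that was previously duplicated in both public methods.
        return sum(1 for reading in self._readings if matches(reading))

    def count_above(self, limit):
        # Only the differing detail, the comparison, remains here.
        return self._count(lambda reading: reading > limit)

    def count_below(self, limit):
        return self._count(lambda reading: reading < limit)
```

A later maintainer who needs a third variant adds one line rather than copying the loop again.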
Taken together, these tools get you most of the way to small classes. To see how they are implemented in real life, I once again suggest Fowler's Refactoring, which is essentially a cookbook of techniques for cleaning up code.
Returning to my own experience, I am finding that as I insist on this particular discipline in my code rework, my brain is slowly developing a "muscle memory" and is beginning to think automatically about class size prior to class development and certainly during the cleanup of existing code. Cheers!