When we're developing software, we constantly have to make decisions. Sometimes I think that if I had to define design, I'd say that design is the process of making decisions in the face of constraints. There are no problems that don't have any constraints, but one of the things that makes software so successful is that it is relatively "constraint-free" as a medium. When we write code, we might have some size limitations or space limitations, but beyond that, there are no immediate costs aside from the cost of writing the code. All of the costs come later, when someone has to understand the code or modify it.
In the example we've been looking at, there are more than two costs involved. We have the cost of factoring out an AccountServices class and the cost to move toward our ideal design, but we also have the cost of keeping the code in its current state. Most of the time, we ignore that cost and make refactoring decisions based upon our understanding of the best alternative (in this case, moving toward our ideal design). We either decide to refactor toward the ideal or we leave the code alone. In this case, we should consider whether moving to AccountServices puts us in a better situation than we would be in if we left the code alone. Unfortunately, there is no clear answer; but in some cases, refactorings that leave the code in a slightly less clear state might be acceptable even if those refactorings are not our first choice.
Trusting Discovered Clustering
So far, we've been making the assumption that the encapsulation boundaries in our code may not be meaningful and that they may result in incohesive Frankenstein-ish classes like AccountServices. How can this happen? Sometimes, code for different responsibilities becomes deeply intertwined. This often happens when we notice that we can do two things at roughly the same time because we are using the same data. A common example is doing two widely different things in one loop. We may do this simply because we know that each of the things we have to do requires iteration over the same collection. In these cases, a method containing a loop will have a dependency on a collection, but we don't see the responsibilities as separate nodes in the graph. The alternative, however, is rare. Separate methods often denote separate responsibilities. When we can't find a clear name for a cluster that we want to extract, it is often because its responsibilities are mingled within methods. We can refactor to separate them, but often, such work is intricate.
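As a rough illustration of responsibilities mingled in one loop, consider the sketch below. The Transaction type and every function name here are hypothetical, invented for this example; they are not taken from the AccountServices code discussed above.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical record type, used only for this illustration.
    amount: float
    flagged: bool


def summarize_mingled(transactions):
    """One loop, two unrelated responsibilities sharing one collection."""
    total = 0.0
    suspicious = []
    for t in transactions:
        total += t.amount           # responsibility 1: compute a total
        if t.flagged:               # responsibility 2: collect audit candidates
            suspicious.append(t)
    return total, suspicious


def total_amount(transactions):
    """Responsibility 1 on its own: it shows up as a separate method."""
    return sum(t.amount for t in transactions)


def flagged_transactions(transactions):
    """Responsibility 2 on its own: it shows up as a separate method."""
    return [t for t in transactions if t.flagged]
```

In the mingled version, a dependency graph shows only a single method that depends on the collection. Once the loop is split, each responsibility becomes its own node, at the cost of iterating over the collection twice.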
Choosing Structure over Naming
There are many qualities that can make designs better. One that really helps is names that have clear meaning in the domain. We can use those names to help us determine the relationship between different parts of our code. Beyond that, we can fall back on the classical ideas of coupling and cohesion: a design is good when it maximizes cohesion and minimizes coupling. Although the justification is rarely noted, it's likely that the concepts of coupling and cohesion are useful to us because they are cognitive aids for understanding unfamiliar things [1]. We can understand only so much at a time, so the pieces we attempt to understand should be small and their dependencies should be clear and few. We should be able to understand a chunk of code by itself. Good naming helps us by giving us a mental handle for the other chunks of code that a particular chunk depends upon. It helps us build up a web of understanding.
There are times, however, when our code has internal clustering in its graph and we can't conceive of good names for the clusters. If that is the case, is it acceptable to choose a poor name and move forward with class extraction? Aren't we making our code a little worse if we do that? Again, everyone has to make their own judgment on a situation-by-situation basis, but it's worth considering that naming is only one aspect of program understanding. Code, in general, becomes more understandable when we are looking at chunks that don't have many dependencies on externals. When we can look at a piece and feel that we can understand it locally, we are more likely to be able to change it correctly. The fact that we don't have a good name for it is regrettable, but there is always the chance that we might find a good name later, or that we might eventually merge the extracted class back into its original class and find a better way to refactor. In the meantime, we get the benefit of local understanding. If that understanding is constructive relative to the original state of the code, the refactoring might be worthwhile.
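To make the trade-off concrete, here is a minimal sketch of extracting a cluster under a weak name. Both classes, and the name AccountCalcs in particular, are hypothetical and chosen only for illustration; the name is deliberately a placeholder rather than a domain concept.

```python
class AccountCalcs:
    """Extracted cluster: these methods depend only on `rate`.

    The name is admittedly poor, but the class can be read and
    changed without knowing anything else about Account.
    """

    def __init__(self, rate):
        self.rate = rate

    def yearly_interest(self, balance):
        return balance * self.rate

    def project(self, balance, years):
        # Compound the balance once per year.
        for _ in range(years):
            balance += self.yearly_interest(balance)
        return balance


class Account:
    """Original class, now delegating the interest cluster."""

    def __init__(self, owner, balance, rate):
        self.owner = owner
        self.balance = balance
        self._calcs = AccountCalcs(rate)  # placeholder-named collaborator

    def deposit(self, amount):
        self.balance += amount

    def projected_balance(self, years):
        return self._calcs.project(self.balance, years)
```

If a better name turns up later, renaming AccountCalcs is a small change; if the extraction proves unhelpful, the two methods can be merged back into Account.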
1. Mullen, Thomas. Writing Code for Other People: Cognitive Psychology and the Fundamentals of Good Software Design Principles. OOPSLA '09, Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications.
Michael Feathers is Chief Scientist at Obtiva Corporation.