Why is refactoring interesting or valuable? There are at least three reasons.
- A series of small refactorings can produce significant improvements to existing code. While each individual refactoring is minor, applying a dozen of them, one on top of another, can result in a much-improved software design. The improvement can transform confusing, poorly-written software into well-designed, readable code.
- Refactoring lightens the load on the design phase of software development. The standard advice to spend lots of time on design, before writing any code, was based on the assumption that it is hard to fix a bad design later in the programming process. But refactoring allows us to worry less about the up-front design phase, and begin programming sooner. (This is not an argument to spend no time on design, or to proceed with a lousy design, but refactoring makes design errors easier to fix later.)
- Refactoring leads to a more realistic design phase. The goal of producing an excellent design up front, that is correct throughout the life of the software, is certainly well-intentioned. It would be great if humans were capable of doing this. But the reality is we are not. We cannot anticipate design bugs that may arise or the changes users will request once they begin interacting with the software. Refactoring allows us to say, "We will do the best job we can now with design and architecture, but we recognize these macro views of the software may need to change later."
The process of looking at software source code, finding ways to improve it, and then making those changes has been around for as long as people have created software. Simple "debugging" often involves a structural change to the code, not just a one-line fix, and any positive structural change is essentially refactoring. As a separate topic of study, however, refactoring got its start in the early 1990s with Bill Opdyke's PhD thesis, Refactoring Object-Oriented Frameworks. In 1999, Martin Fowler wrote Refactoring: Improving the Design of Existing Code, the de facto standard book on the subject. Refactoring is now an accepted software engineering discipline, included in many of the field's conferences and journals. For readers new to the topic, Wikipedia contains a good summary and list of references.
Any discussion of refactoring should mention the crucial role of automated tools. The reason tools are key is that applying even a single, straightforward source code transformation to a large program is a tedious, error-prone task. Imagine applying the common transformation Encapsulate Field across a large code base. You have to find every read of the (previously) public field and replace it with a Get() method call, then find every write to the field and replace it with a Set() call. For a field with a thousand references, this is a pain in the neck, unless there is a tool to help. For a more complex change, such as Move Method (between classes), the task is even more daunting. No one will refactor anything if doing it is hateful drudgery. There has been considerable work in this area, both theoretical and practical. As examples, both the Eclipse Java and NetBeans IDEs include automated support for some common refactoring transformations.
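To make the transformation concrete, here is a minimal before-and-after sketch of Encapsulate Field in Java. The class and field names (Account, balance) are invented for illustration; in a real refactoring the field would change in place, rather than appearing as two separate classes as it does here.

```java
// Before: clients read and write the public field directly,
// so every one of those references must be found and rewritten.
class Account {
    public int balance;
}

// After Encapsulate Field: the field is private, every former read
// becomes getBalance(), and every former write becomes setBalance().
class EncapsulatedAccount {
    private int balance;

    public int getBalance() {
        return balance;
    }

    public void setBalance(int balance) {
        this.balance = balance;
    }
}
```

The behavior is unchanged; what changes is that the class now controls all access to the field, so later modifications (validation, logging, lazy loading) touch one place instead of a thousand call sites.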
I must confess that, when I first learned about refactoring, I thought it was a trivial re-statement of techniques experienced programmers understand implicitly. I was shocked anyone could receive a PhD degree for describing this knowledge. It is true that much of refactoring is obvious, in a general sense, to good programmers. But I did not understand the significant impact on software design and development that comes from a disciplined approach to refactoring. (And I realized Opdyke's dissertation is more than just a description.)
Despite the successes of the refactoring movement, and its track record helping programmers on real-world projects, it has some glaring open problems.
- How does a programmer know when to refactor? Software developers spend many hours looking at thousands of lines of code. On good projects, most of the code is decent and well-designed. So what parts should be refactored? The available automated tools generally do not help with this question. Most tools assist by correctly doing a transformation, after the programmer has decided to do it. But how can a programmer find code to refactor?
- Which refactoring should be applied in a given situation? There are at least 70 standard refactorings for object-oriented software. For a specific section of source code, only some of these 70 make sense, of course, but of the transformations that could be applied, which one is correct?
- Why is refactoring an improvement to software? It cannot be the case that each individual transformation is an improvement, because many transformations contradict each other. For example, Extract Class and Inline Class perform the exact opposite changes. If each always improved software, we could just apply the two transformations to the same class, in an endless cycle, improving software forever. Instead, every refactoring transformation improves software sometimes, in the right situation. But why?
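The Extract Class / Inline Class opposition mentioned above can be sketched in a few lines of Java. The names (Person, TelephoneNumber) are illustrative, not taken from any particular code base; applying Inline Class to the "after" version would reproduce the "before" version exactly.

```java
// Before Extract Class: Person mixes personal data with phone details.
class Person {
    private String name;
    private String areaCode;
    private String number;

    public Person(String name, String areaCode, String number) {
        this.name = name;
        this.areaCode = areaCode;
        this.number = number;
    }

    public String getPhone() {
        return "(" + areaCode + ") " + number;
    }
}

// After Extract Class: the phone details become their own class.
// Inline Class is the exact inverse: it would fold TelephoneNumber
// back into the owning class.
class TelephoneNumber {
    private String areaCode;
    private String number;

    public TelephoneNumber(String areaCode, String number) {
        this.areaCode = areaCode;
        this.number = number;
    }

    public String format() {
        return "(" + areaCode + ") " + number;
    }
}

class PersonAfter {
    private String name;
    private TelephoneNumber phone;

    public PersonAfter(String name, TelephoneNumber phone) {
        this.name = name;
        this.phone = phone;
    }

    public String getPhone() {
        return phone.format();
    }
}
```

Both versions produce identical behavior, which is the point: neither form is universally better, so whether extracting or inlining improves the design must depend on context, not on the transformation itself.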
Taking some liberties to summarize thousands of pages of research, the current answers to these questions are:
- A section of source code should be refactored when it "smells bad." There are about 20 known bad smells.
- For each bad smell, we should apply one of the refactorings that tend to help with this malodor.
- No one knows.
For a better answer to these questions, we need a theory of refactoring -- which explains what refactoring is, why some code smells bad, and why refactoring makes software better. It just so happens I have such a theory.
In 2002 I wrote an essay Most Software Stinks that presents an overall theory of software design. I argued there are seven general principles of good software design. The principles describe properties of the software itself, not ways of creating software. It does not matter if software is created with Extreme Programming, CMMI Level 5, Java, or COBOL. The principles are universal properties of all good software. They are:
- Cooperation. Software should work well with its surrounding environment, which is the computer hardware, operating system, middleware (such as database and security layers) and applications.
- Appropriate form. The internal design of software should reflect and create its external behavior; form should follow function.
- System minimality. Software should be as small as it can be, by using other computing resources wherever possible. Software should contain just what it needs to, but no more.
- Component singularity. Good software contains one instance of each component, and makes that component work correctly. The opposite of singularity is redundancy, widely recognized as poor design.
- Functional locality. Source code should place related items together. This makes it easy to fix bugs and make changes, because programmers can quickly find the code they want. Functional locality implies levels of abstraction, and locality should be achieved at each level of description.
- Readability. There are two aspects to software readability: clarity that is built into the code, and comments that annotate the code. The first includes meaningful names for variables and constants, good use of white space and indenting, and transparent control structures. Good commenting educates the next programmer about the intention of each module.
- Simplicity. Software should do its work and solve its problems in the simplest manner possible. In many ways, simplicity is the most important principle of all and overlaps all the others. Simple programs have fewer bugs, run faster, are smaller, and are easier to fix when broken. Simple programs are dramatically less expensive to create and maintain for these reasons.