

The Misplaced Obsession with Simplicity

Like so many aspects of modern life, in programming good is defined in absolute terms and bad is defined as the failure to live up to the standards established by good. In that great storehouse of all that is good in programming, perhaps no quality receives more praise than simplicity. Reading quotations from famous programmers about the pursuit of simplicity is like reading about the quest for the Grail. We're told that simplicity is always hard work. That inside every complex bit of code, there's a simpler version waiting to be set free. That simplicity is the hallmark of great minds. And so on.


Predictably, complex code is regularly savaged: It's the telling symptom of laziness, lack of skill, or lack of discipline. It's a code smell. Any idiot can write complex code; the true art is writing simple code.

I need not continue. Everyone has heard, thought, and even embraced these platitudes to some extent. But in fact, I believe, they are mostly wrong. And what bothers me is that they invariably go unchallenged. Listeners nod in assent and move on to the next thing.

Let's tease this out a little. It's not true that any idiot can write complex code. Complex code is difficult, often very difficult, to write. It's entirely true that it's more difficult to maintain, too. But that's the nature of complexity. Some things are intensely difficult to express in code and they require complexity, simply because they're not inherently simple.

Perhaps you've been led to believe that complexity is the product of insufficient skill. A real-life example should lay that theory to rest. For grins, let's take one of the best-known programmers in parser theory (Al Aho, coauthor of the Dragon book), team him with Brian Kernighan (the K in K&R), and have them write a DSL that becomes the very model of programming elegance (awk). If skill were all that were needed to write simple code, then you'd expect Aho to deliver the simplest parser code imaginable — clean and elegant as K&R's examples of C. Now, the reality, as written by Aho himself: "I had incorporated some sophisticated regular expression pattern-matching technology into AWK. Brian Kernighan once took a look at the pattern-matching module that I had written and his only addition was putting a comment, 'Abandon all hope, ye who enter here.' As a consequence, neither Kernighan nor Weinberger would touch that part of the code. I was the one who always had to make the bug fixes to that module" (from Masterminds of Programming, p. 103). Complex problems require complex code.

There's a big difference between poorly written code and complexity. Unfortunately, parts of the Agile movement have tended to obfuscate the distinction. A rule of thumb I've seen cited several times is that functions with a cyclomatic complexity number (CCN) of more than 30 or 35 must be rewritten. This is patent nonsense and implies that all complex code is equivalent to badly written code. Moreover, there's a peripheral problem with the assertion; namely, that every branch of a switch statement adds 1 to the CCN. So, if your switch has 35 branches, you violate the threshold with no reasonable way to simplify your code. (Sure, you could use some kind of table instead of the switch, but now you've taken logic that was easy to read and made it considerably more difficult to follow.)
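To make that trade-off concrete, here is a minimal C sketch contrasting the two forms; all the names (the opcode evaluator and its handlers) are invented for illustration. Every `case` in the switch adds 1 to the CCN even though each branch is trivially readable; the table version keeps the CCN near 1 but hides the same logic behind a lookup and a function pointer:

```c
#include <stddef.h>

/* Hypothetical opcode evaluator, written two ways. */
typedef int (*op_fn)(int a, int b);

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

/* Switch form: every case adds 1 to the cyclomatic complexity
   number, yet each branch is plain to read. */
int eval_switch(char op, int a, int b)
{
    switch (op) {
    case '+': return op_add(a, b);
    case '-': return op_sub(a, b);
    case '*': return op_mul(a, b);
    default:  return 0;   /* unknown operator */
    }
}

/* Table form: the CCN stays near 1, but the reader must now chase
   a lookup and a function pointer to see what each opcode does. */
int eval_table(char op, int a, int b)
{
    static const struct { char op; op_fn fn; } table[] = {
        { '+', op_add }, { '-', op_sub }, { '*', op_mul },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].op == op)
            return table[i].fn(a, b);
    return 0;   /* unknown operator */
}
```

The two functions behave identically; only the metric prefers one over the other, which is exactly the problem with treating the threshold as law.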

This brings me to the point that is truly the heart of the matter: It's not simplicity that matters, but readability. Can the code be understood with the least amount of effort given its inherent complexity? That is the true criterion. How readable, or if you prefer, how maintainable is the code with respect to its complexity?

As to the obsessive reverence for simplicity, it is to me quixotic and sentimental. If we look at nature, we find that absolutely nothing in it is simple. Everything in nature is characterized by extraordinary complexity that masks further layers of complexity. As to human creations such as the automobile, they induce people, like the mechanic at my garage, to remember with fondness days past when everything was "simpler." But in fact, in the last 15 years, the rates of unexpected mechanical failures in cars have plummeted and gas mileage has soared. One of the biggest reasons for these advances is the 5 to 10 million lines of code in cars today. Yes, indeed, vastly complex code has delivered significant benefits — and made the mechanic's repair work considerably simpler.

My view of simplicity is unemotional and free of idolatry because I define it with respect to complexity, rather than the other way around: Simplicity is the quality of code that is no more complex than required to express the underlying complexity. In this way, simple code can be intensely complex. There is no inherent good/bad dichotomy. 

— Andrew Binstock
Editor in Chief
Twitter: platypusguy




I think you fail to make sufficient distinction between inherent and accidental complexity. Of course any given problem has some level of inherent complexity and you can't ignore that.

However it has been my experience that most software is over-complex for no good reason. It is rarely the inherent complexity of the problem that is at the root of the problem it is generally poor abstraction and poor separation of concerns.

So while I agree with you that any rule that says "You must rewrite any method with a cyclomatic complexity over 35" is suspect if treated as an absolute, you said yourself that it is a "rule of thumb," which means it is a rough guide, not a law. That seems ok to me. Any method with a high cyclomatic complexity should be questioned, with the understanding that an acceptable answer is "In this case the method is ok because it is solving an inherently complex problem."


It is interesting how easy it is to fall prey to proven paths to success, quality, and what not. At the end of the day, money needs to be made and benefits need to outweigh the costs. Maintenance, simplicity, blah, blah, blah, are vitally important, but can also brew waste and failure in some contexts.

This brings to mind several thousand lines of awk script at the root of a successful project. At the time, the software team was prone to being beaten up by being consistently late or slow, and the hardware guys somehow always came out smelling like roses (probably not unrelated to the fact that they could burn schedule on the front end and leave the dregs for software, but that's not really the point). Post-script, multiple variants of the product were produced such that software development was never the long pole in the tent (or at the mercy of the hardware guys burning up schedule).

The script took a spreadsheet (or tabular specification of operational aspects of the product monitor and control system) and converted it to: source code for embedded processors, input to a GUI engine, and documentation that described use of the product. Most software changes involved working the spreadsheet, running the scripts, and doing a bit of code work if new core functionality was required, but often the new or changed functionality could be implemented as a spreadsheet data change.

The awful complexity of the scripts and engines resulted in profit and success. Somewhere in the code is a comment "This is ancient lore. As such, it is not necessarily beautiful. Sorry. Be warned. ... ". It was placed there after numerous, sincere, failed attempts to simplify and improve the implementation. When speaking on the topic, the author adopts a Mr. T tone of voice and begins with "I pity the foo..." and ends with something like "change/understand this code".
The author is not proud of the script, but he could be justifiably proud of the achievements that resulted from it: an astonishingly tight synchronization between an embedded product, client application, and documentation that left the software guys smelling like roses.

In hindsight, with knowledge and experience gained over the years, the author realized many ways the implementation was lacking, and would not care to see the code subjected to analysis (as the results would be loud and resounding boos), but in the end... is not ashamed to have been part of producing years of revenue by facilitating software re-use, to the ultimate benefit of the company and his peers in the years following.

Simplicity and readability were goals, but when push came to shove, progress was required and money had to be made despite a merciless schedule. It's not an excuse per se. The best work often requires Solomon's wisdom when it comes to pursuing the various Holy Grails of coding. Ugly, unmaintainable, complex, etc. aren't ideal... but may be good enough... or even a (gasp) necessary component of success and experience gain. There might indeed be a place for obsession, or compulsive preoccupation with ideals, but they can hardly be truthfully described as universally mandatory and profitable. It could have gone badly, and maybe the risk wasn't justified - but then again, the worst never happened to that script, born over a decade ago, with a product that still manages to generate income now and then. I was the author, paradoxically apologetic and unapologetic at the same time.


"There is no inherent good/bad dichotomy."

Only if the complexity is derived from the problem domain. Otherwise, it's just an unnecessarily confusing mess.


In practice, "reading complexity" is rarely even approximately equal, and readability is only weakly correlated with length. What makes unjustifiably complex code hard to read are all those WTF things that the author added to make it unjustifiably complex.


I just spent 3 days cleaning up a program written by a colleague before I could start extending it. It was typical of a program developed by a self-taught C programmer: long winded, unclear, separated into functions along no clear boundaries, and implemented with simple data structures that did not at all reflect the actors in the story the code was attempting to tell. Most of the challenge was to see the data in all that code, create a few appropriate classes, and then beat the functions into class members. After dropping all the code he'd written that was already provided by libraries, this program is 1/3 its original size in terms of source lines, about 20% its original size in code size, and lost at least 4 obvious errors along the way.

On the gripping tentacle, I'll compare that mess to an embedded system I worked on 15 years ago. It had one central function, written as a single C function because that made it easier to lock the code into the small instruction cache on the CPU, that was 26,000 lines long. This function, which had a McCabe's number in the 330 range, was the best-crafted piece of code in the entire system, because it absolutely had to be. There were only 3 or 4 programmers in the company allowed to touch it and I was quite flabbergasted to be included on the list, although I was included only with Jeff watching very carefully over my shoulder.

Yes, the code complexity needs to match the complexity of the problem. Still, unnecessary complexity is to be avoided at all costs. Learning to see the simple program inside the spaghetti is not a sacred cow to be slaughtered lightly.


I understand your concern, but bad programmers will always be able to find something to justify their work, if they search long enough. Aho's code was indeed that grotty. In correspondence with the maintainer of gawk (GNU awk), to whom I sent the link to this editorial, he replied that the corresponding code in gawk is so complex that it's the one thing he just doesn't touch. Any fixes or patches are handled by the original developers of that code. Despite one of the contentions earlier in this comment stream, regex parsers are complex beasts.


Articles like this bug me, because they're the sort of article that bad programmers latch onto to justify why they continue to write incomprehensible crap.

Was Aho's regex code really so irreducibly complex as to justify writing code that incredibly smart guys like Kernighan and Weinberger would refuse to touch? Honestly, I don't think that there really are that many problems that are so irreducibly complex. Complex problems are a sum of simpler problems.


You spoke my heart!!! It is the truth that no one understands and makes you stand out as the dumb guy in the group.


Personally, I think that there's something more important than code complexity or simplicity. That's architecture, in the sense of mental models of what's trying to be computed. I spent days on a previous project, a debugger, trying to figure out how to get started on the > 2M SLOC code, when someone told me about a 10 page set of slides which contained the architecture from 5 years previous. With those slides I was able to get a sense of how the project was put together, and was able to make contributions almost immediately. Architecture, and especially architectural documentation, trumps simple code.


I think Mr. Binstock's point is that there are shops out there that give too much concern and effort towards a flawed metric. Think back to the structured programming days where "a procedure cannot exceed 50 lines (or 24 lines or ???)", and people wrote utilities that went through their source code and every 50 lines would replace it with a procedure called "DoStuff1", then the next 50 with "DoStuff2", then the next 50 with "DoStuff3", then DoStuff() just had calls to those 3 procedures. Same issue, only a different (and similarly flawed) metric.
Metrics, although quantitative, should be analyzed a little more qualitatively. "Yeah, but my non-programming manager can't tell the difference when it's 'good' to violate that metric, but they can definitely know when this number is > 50!" I didn't say it was going to be easy. :-D


If these 35 (128, 1000) branches are "peers" (e.g. lexical tokens), then I would expect them to be at the same level, in the same switch (but I would also want some "order" to their layout within the switch).
However, if there is a hierarchy (e.g. nested state machines) all at the same level, then yes, it should be refactored at the proper "layer" (e.g. the various state machines are encapsulated within their own class/methods or functions).
Actually "layers" (think OSI comm model) should be applied in your design and your code. Just this one "abstraction" can simplify and make your design and code more understandable (again, applying the 7 +/- 2 rule).
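A minimal C sketch of the layering described above, with all names and transitions invented for illustration: the outer dispatcher is a single decision point, and each sub-machine's switch covers only its own states, so no one function has to absorb the complexity of the whole hierarchy:

```c
/* Illustrative two-layer state machine: the layer names and the
   transition rules are made up for this sketch, not taken from any
   real protocol stack. */
enum layer { LAYER_LINK, LAYER_NETWORK };

typedef int (*step_fn)(int state, int event);

/* Each sub-machine is a small switch over only its own states. */
static int link_step(int state, int event)
{
    switch (state) {
    case 0:  return event ? 1 : 0;  /* idle: connect on an event */
    case 1:  return event ? 1 : 0;  /* connected: stay up while events arrive */
    default: return 0;              /* unknown state: reset to idle */
    }
}

static int network_step(int state, int event)
{
    switch (state) {
    case 0:  return event;          /* empty: adopt the event's value */
    default: return state;          /* otherwise hold the current state */
    }
}

/* Outer layer: one indexed dispatch, however many sub-machines exist
   and however many states each one contains. */
int dispatch(enum layer l, int state, int event)
{
    static const step_fn machines[] = { link_step, network_step };
    return machines[l](state, event);
}
```

Each function's cyclomatic complexity stays small even if the total number of (machine, state) pairs is large, which is the point of refactoring at the proper layer.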


I contend as I do because the basic task that regular expressions do is pretty straightforward (even if the grammar looks frightening). Any fool can mechanically translate a RE into some more comprehensible form (say, railroad diagrams), and any other fool could animate the diagram using test cases. It doesn't take much more intelligence to use the diagram to predict behaviour at a more abstract level. (Try it - you'll see!) The grammar, the behaviour, the high-level reasoning - they're all transparent and straightforward.

So where is the complexity to be found? It's in the implementation. That says two things to me: firstly, that it's very difficult, when faced with a complicated solution, to state for sure whether the complexity is inherent in the problem or injected by the implementation. And secondly, that to avoid injecting superfluous complexity, you have to select a tool which keeps simple things simple. Frankly, that's something which, in general, we're very bad at - we tend to use the same old tools over and over again.
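The gap between a transparent specification and a complex implementation can be seen in Rob Pike's well-known minimal matcher, presented in Kernighan and Pike's The Practice of Programming; the version below is adapted from it. It handles only literal characters, `.`, `*`, `^`, and `$`, and fits in about 30 lines of C. The complexity Aho describes in awk comes from everything a production engine adds on top: character classes, alternation, counted repetition, and compilation to automata.

```c
/* Adapted from Rob Pike's minimal regex matcher (Kernighan & Pike,
   The Practice of Programming).  Supports: literal chars, '.' (any
   char), '*' (zero or more of previous char), '^' and '$' anchors. */

static int matchhere(const char *regexp, const char *text);
static int matchstar(int c, const char *regexp, const char *text);

/* match: search for regexp anywhere in text */
int match(const char *regexp, const char *text)
{
    if (regexp[0] == '^')
        return matchhere(regexp + 1, text);
    do {    /* must look even if string is empty */
        if (matchhere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchhere: search for regexp at beginning of text */
static int matchhere(const char *regexp, const char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* matchstar: search for c*regexp at beginning of text */
static int matchstar(int c, const char *regexp, const char *text)
{
    do {    /* a '*' matches zero or more instances */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}
```

So both commenters have a point: the core idea really is this small, and the grotty part of awk's engine is the feature set and performance machinery layered on top of it.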


While I am sympathetic with the goal here, I am perhaps a little confused by the tone. Maintainability has *always* been the goal. Given two implementations of an algorithm, the simpler one will usually be more maintainable than the more complex one.

Likewise, simpler designs (simpler to understand and implement) will be more maintainable than more complex designs. This doesn't mean that complex code is always the wrong choice. Sometimes it is the *only* choice. But this shouldn't imply that we accept a complex solution if a simpler one is available.

OTOH, "simple" can be in the eye of the beholder. A well placed pattern can make many aspects of a system easier to understand *and extend* over the life of a product. However, it can also add complexity in a localized region of code. This is a prudent trade-off. Isolated complexity for systemic simplicity. If all you focus on is the complexity of managing a pattern's infrastructure then you miss out on the easier maintenance to be had in the future.


I've seen big switch statements with cases at the same level of abstraction primarily in code that processes input data, such as the lexer example that slkpg mentions below. Similarly, other apps read binary data and look for codes or values that identify the kind of record or data item for subsequent processing. Virtual machines can be an example of the latter.


"It is more helpful to focus on whether the code is testable rather than whether it is simple." I'm not sure I agree completely. I agree that it's important that code be testable. But I'm not sure why that would be more important. For example, in TDD, which is all about having 100% testable code, refactoring is a crucial step--precisely because the code becomes fairly hideous if the only concern is building in small testable increments. I think both testability and simplicity as I describe it are important. I'm just not comfortable putting one ahead of the other.


I use a 128 case switch in my lexical analyzers. One case for each ASCII character. Very machine efficient and quite readable. So yes, I have 128 choices for the first character of a lexeme at the same level of abstraction. The source code for one such scanner can be seen under the link for source code for a C parser.
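A sketch of that dispatch style, with the 128 explicit cases collapsed into a few representative ones for brevity (the token names are illustrative, not from any real scanner):

```c
/* Classify the first character of a lexeme.  A real scanner in this
   style has one case per ASCII code (128 cases), so the dispatch
   compiles to a single indexed jump; this sketch shows only a few
   representative cases. */
enum tok { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_ERROR };

enum tok classify(int c)
{
    switch (c) {
    case '\0':
        return TOK_EOF;
    case '_':
    case 'a': case 'b': case 'c':   /* ... through 'z' and 'A'..'Z' */
        return TOK_IDENT;
    case '0': case '1': case '2':   /* ... through '9' */
        return TOK_NUMBER;
    case '+': case '-': case '*': case '/': case '(': case ')':
        return TOK_PUNCT;
    default:
        return TOK_ERROR;
    }
}
```

All 128 cases sit at the same level of abstraction, which is exactly why the flat switch reads well here despite its cyclomatic complexity.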


You wrote: "So, if your switch has 35 branches, you violate the threshold with no reasonable way to simplify your code." Well, a lot of rules are context dependent, so I am sure there are some occasional situations in which a 35-branch switch statement is called for. But that aside, and ignoring machine-generated code, it seems to me that a 35-branch switch *could* perhaps be ripe for refactoring. Do you really have 35 choices at the same level of abstraction, or do you really have mixed abstraction levels, allowing you to condense it down to perhaps a 5 or 6 branch switch, each of which can branch further? The human mind has a hard time grasping 35 choices and their consequences!


While often complex code cannot itself be simplified, as in the case of regular expression implementation, it most certainly can be documented in order to make it easily understood.

Explaining complex code in a simple, logical, step-by-step manner is more difficult than writing complex code itself. Plus, one must maintain that documentation when changes are made...

Maybe you can follow this article up with a piece about making complex code *look* simple with the help of good, clear English commenting & documentation.


It is more helpful to focus on whether code is testable than whether it is "simple." Testable means that the code is structured in such a way that you can construct easily repeatable cases that exercise product features at the lowest level of abstraction possible. (E.g., test as much as possible in unit tests rather than waiting for full product builds.)


I was recently taken to task, in a similar forum, by a guy who does write in binary, as a hobbyist. He and some friends have a loose club-like environment where they have programming projects that are accomplished in binary. Or so he says.


It seems to me that computer programming is, itself, an exercise in simplification. It simplifies human activity by making a machine solve a complex problem with minimal human involvement. This is desirable because complex thinking is, in some fashion, painful. We don't like to do it. It doesn't seem to come naturally. Hence, the KISS principle. KISS doesn't necessarily result in accurate answers, but, it does result in easier answers.

There is a concept often quoted in arguments about the validity of Darwinian evolution. It is Irreducible Complexity (are the caps premature?). This is what KentD397 is referring to when he says, "A complex problem cannot always be simplified." Nature is rife with IC. Reality can be viewed from a simplistic viewpoint, but, reality is inherently complex, as are many programming problems.

I contend that spaghetti code is frequently less complex than well organized and easily read code and how simple it is depends on your definition of simple. If understandability is what you call simple, then, the more complex code is also the simpler code. If having fewer parts is what you call simple, then, sometimes the spaghetti code is the simpler code. Both definitions are legitimate, but, unfortunately, the two don't always go hand in hand. Methinks that there should be a third dimension added to the definition of simple - the chaotic dimension - since the definition is already a compound definition. The less chaotic, the simpler. As a matter of fact, the chaotic dimension might be the only metric we need as it would be a combination of the two existing elements of the definition. It would probably have to be completely subjective, as well, resulting in new things to argue about. Argh!


Using the simile of organizing a room, complexity is not measured by the total amount of stuff in the room; it is measured by how easy it is for the human mind to comprehend all the stuff in the room. A regular expression engine that accommodates the full range of features is very complex -- just as a room can contain a lot of stuff. However, one should not use that as an excuse to stop the effort of producing an implementation that is easy to comprehend. The holy grail of that effort is to introduce a new paradigm of programming, just as a room organizer may bring in new furniture to serve the purpose.


Layering or hiding complexity is simplifying. Just as when organizing your room, without reducing any of the stuff, a well-organized room is simpler for strangers (and yourself) to navigate. On the other hand, I recognize that it is also easy to organize it in a non-intuitive way that increases the difficulty of comprehension -- any idiot can make it complex; only an expert makes it simple.


"Any idiot can write complex code; the true art is writing simple code."

Let's start with two straight threads, where it is necessary, to reach certain functionality, for them to interact at some points. An idiot would cross them ad hoc, giving rise to spaghetti code that has code threads crossing at many, seemingly random places. The result is more complex in the sense that it is difficult to comprehend.

On the other hand, wise coders, focusing on the logic-comprehension side, would analyze those necessary crossing points and organize them into different groups and layers, resulting in code that has only a few (7 +/- 2) comprehension points at each level of the code hierarchy -- much simpler code, albeit achieving the same functionality.

In essence, simple or complex refers to human comprehension. Given the same amount of total complexity (measured by some mathematics), code written in structures that limit comprehension points according to the famous seven-plus-or-minus-two rule is perceived as simpler and benefits both development and maintenance.

Then, recognizing the intrinsic complexity and refraining from unplanned/unevaluated feature expansion or future-proofing takes a lot of experience and wisdom. An idiot often sees more complexity than a wise coder does, easily resulting in code that is more complex than necessary.


"I would contend that implementing regular expressions isn't inherently complex." Honestly, I cannot imagine why you would contend that. I know of no one who's implemented the full range of regular expressions who would agree with you. I know two folks who have done it. And we have Aho's account.