The Misplaced Obsession with Simplicity


Like so many aspects of modern life, in programming good is defined in absolute terms and bad is defined as the failure to live up to the standards established by good. In that great storehouse of all that is good in programming, perhaps no quality receives more praise than simplicity. Reading quotations from famous programmers about the pursuit of simplicity is like reading about the quest for the Grail. We're told that simplicity is always hard work. That inside every complex bit of code, there's a simpler version waiting to be set free. That simplicity is the hallmark of great minds. And so on.


Predictably, complex code is regularly savaged: It's the telling symptom of laziness, lack of skill, or lack of discipline. It's a code smell. Any idiot can write complex code; the true art is writing simple code.

I need not continue. Everyone has heard, thought, and even embraced these platitudes to some extent. But in fact, I believe, they are mostly wrong. And what bothers me is that they invariably go unchallenged. Listeners nod in assent and move on to the next thing.

Let's tease this out a little. It's not true that any idiot can write complex code. Complex code is difficult, often very difficult, to write, and it's entirely true that it's more difficult to maintain, too. But that's the nature of complexity. Some things are intensely difficult to express in code, and they require complexity simply because they're not inherently simple.

Perhaps you've been led to believe that complexity is the product of insufficient skill. A real-life example should lay that theory to rest. For grins, let's take one of the best-known programmers in parser theory (Al Aho, coauthor of the Dragon book), team him with Brian Kernighan (the K in K&R), and have him write a DSL that becomes the very model of programming elegance (awk). If skill were all that were needed to write simple code, then you'd expect Aho to deliver the simplest parser code imaginable — clean and elegant as K&R's examples of C. Now, the reality, as written by Aho himself: "I had incorporated some sophisticated regular expression pattern-matching technology into AWK. Brian Kernighan once took a look at the pattern-matching module that I had written and his only addition was putting a comment, 'Abandon all hope, ye who enter here.' As a consequence, neither Kernighan nor Weinberger would touch that part of the code. I was the one who always had to make the bug fixes to that module" (from Masterminds of Programming, p. 103). Complex problems require complex code.

There's a big difference between poorly written code and complexity. Unfortunately, parts of the Agile movement have tended to obfuscate the distinction. A rule of thumb I've seen cited several times is that functions with a cyclomatic complexity number (CCN) of more than 30 or 35 must be rewritten. This is patent nonsense and implies that all complex code is equivalent to badly written code. Moreover, there's a peripheral problem with the assertion; namely, that every branch of a switch statement adds 1 to the CCN. So, if your switch has 35 branches, you violate the threshold with no reasonable way to simplify your code. (Sure, you could use some kind of table instead of the switch, but now you've taken logic that was easy to read and made it considerably more difficult to follow.)
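
To make the trade-off concrete, here is a minimal sketch in C (hypothetical token names and handlers, not from any codebase discussed here) of the same dispatch written both ways:

    #include <stdio.h>

    enum tok { TOK_ADD, TOK_SUB, TOK_COUNT };

    static void do_add(void) { puts("add"); }
    static void do_sub(void) { puts("sub"); }

    /* Switch form: reads linearly, but each case adds 1 to the CCN. */
    static void dispatch_switch(enum tok t)
    {
        switch (t) {
        case TOK_ADD: do_add(); break;
        case TOK_SUB: do_sub(); break;
        /* ...imagine 30-odd more cases here... */
        default:      puts("unknown token"); break;
        }
    }

    /* Table form: CCN near 1, but the logic now lives in data
       that the reader must go find. */
    static void (*const handlers[TOK_COUNT])(void) = {
        [TOK_ADD] = do_add,
        [TOK_SUB] = do_sub,
    };

    static void dispatch_table(enum tok t)
    {
        if ((unsigned)t < TOK_COUNT && handlers[t])
            handlers[t]();
        else
            puts("unknown token");
    }

    int main(void)
    {
        dispatch_switch(TOK_ADD);
        dispatch_table(TOK_SUB);
        return 0;
    }

Both forms behave identically; the difference the metric sees is not necessarily a difference the reader feels.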

This brings me to the point that is truly the heart of the matter: It's not simplicity that matters, but readability. Can the code be understood with the least amount of effort given its inherent complexity? That is the true criterion. How readable, or if you prefer, how maintainable is the code with respect to its complexity?

As to the obsessive reverence for simplicity, it is to me quixotic and sentimental. If we look at nature, we find that absolutely nothing in it is simple. Everything in nature is characterized by extraordinary complexity that masks further layers of complexity. As to human creations such as the automobile, they induce people, like the mechanic at my garage, to remember with fondness days past when everything was "simpler." But in fact, in the last 15 years, the rates of unexpected mechanical failures in cars have plummeted and gas mileage has soared. One of the biggest reasons for these advances is the 5 to 10 million lines of code in cars today. Yes, indeed, vastly complex code has delivered significant benefits — and made the mechanic's repair work considerably simpler.

My view of simplicity is unemotional and free of idolatry because I define it with respect to complexity, rather than the other way around: Simplicity is the quality of code that is no more complex than required to express the underlying complexity. In this way, simple code can be intensely complex. There is no inherent good/bad dichotomy. 

— Andrew Binstock
Editor in Chief
alb@drdobbs.com
Twitter: platypusguy


Comments:

ubm_techweb_disqus_sso_-ad9041a796fec90072c4430be7b516f3
2013-08-06T17:37:57

I think you fail to make sufficient distinction between inherent and accidental complexity. Of course any given problem has some level of inherent complexity and you can't ignore that.

However, it has been my experience that most software is over-complex for no good reason. It is rarely the inherent complexity of the problem that is at the root of the problem; it is generally poor abstraction and poor separation of concerns.

So while I agree with you that any rule that says "You must rewrite any method with a cyclomatic complexity over 35" is suspect if treated as an absolute, you said yourself that it is a "rule of thumb," which means it is a rough guide, not a law. That seems ok to me. Any method with a high cyclomatic complexity should be questioned, with the understanding that an acceptable answer is "In this case the method is ok because it is solving an inherently complex problem."


ubm_techweb_disqus_sso_-4124b309ed8223fe99361b41fa82ae3d
2013-07-17T17:20:19

It is interesting how easy it is to fall prey to proven paths to success, quality, and what not. At the end of the day, money needs to be made and benefits need to outweigh the costs. Maintenance, simplicity, blah, blah, blah, are vitally important, but can also brew waste and failure in some contexts.

This brings to mind several thousand lines of awk script at the root of a successful project. At the time, the software team was prone to being beaten up for being consistently late or slow, and the hardware guys somehow always came out smelling like roses (probably not unrelated to the fact that they could burn schedule on the front end and leave the dregs for software, but that's not really the point). Post-script, multiple variants of the product were produced such that software development was never the long pole in the tent (or at the mercy of the hardware guys burning up schedule).

The script took a spreadsheet (a tabular specification of operational aspects of the product monitor and control system) and converted it to: source code for embedded processors, input to a GUI engine, and documentation that described use of the product. Most software changes involved working the spreadsheet, running the scripts, and doing a bit of code work if new core functionality was required, but often the new or changed functionality could be implemented as a spreadsheet data change.

The awful complexity of the scripts and engines resulted in profit and success. Somewhere in the code is a comment: "This is ancient lore. As such, it is not necessarily beautiful. Sorry. Be warned. ... ". It was placed there after numerous sincere, failed attempts to simplify and improve the implementation. When speaking on the topic, the author adopts a Mr. T tone of voice and begins with "I pity the foo..." and ends with something like "change/understand this code."

The author is not proud of the script, but he could be justifiably proud of the achievements that resulted from it: an astonishingly tight synchronization between an embedded product, client application, and documentation that left the software guys smelling like roses. In hindsight, with knowledge and experience gained over the years, the author realized many ways the implementation lacked, and would not care to see the code subjected to analysis (as the results would be loud and resounding boos), but in the end... is not ashamed to have been a part of producing years of revenue by facilitating software re-use to the ultimate benefit of the company and his peers in the years following.

Simplicity and readability were goals, but when push came to shove, progress was required and money had to be made despite a merciless schedule. It's not an excuse per se. The best work often requires Solomon's wisdom when it comes to pursuing the various Holy Grails of coding. Ugly, unmaintainable, complex, etc. aren't ideal... but may be good enough... or even a (gasp) necessary component of success and experience gain. There might indeed be a place for obsession, or compulsive preoccupation with ideals, but they can hardly be truthfully described as universally mandatory and profitable. It could have gone badly, and maybe the risk wasn't justified - but then again, the worst never happened to that script, born over a decade ago, with a product that still manages to generate income now and then. I was the author, paradoxically apologetic and unapologetic at the same time.


ubm_techweb_disqus_sso_-cb0446ec88860ae7f14698583ff4fc4c
2013-07-16T23:00:19

"There is no inherent good/bad dichotomy."

Only if the complexity is derived from the problem domain. Otherwise, it's just an unnecessarily confusing mess.


ubm_techweb_disqus_sso_-cb0446ec88860ae7f14698583ff4fc4c
2013-07-16T22:54:23

In practice, "reading complexity" is rarely even approximately equal, and readability is only weakly correlated with length. What makes unjustifiably complex code hard to read are all those WTF things that the author added to make it unjustifiably complex.


ubm_techweb_disqus_sso_-656f64b26b78cb3d3703f58f7f7de7b6
2013-07-03T23:32:36

I just spent 3 days cleaning up a program written by a colleague before I could start extending it. It was typical of a program developed by a self-taught C programmer: long-winded, unclear, separated into functions along no clear boundaries, and implemented with simple data structures that did not at all reflect the actors in the story the code was attempting to tell. Most of the challenge was to see the data in all that code, create a few appropriate classes, and then beat the functions into class members. After dropping all the code he'd written that was already provided by libraries, this program is 1/3 its original size in terms of source lines, about 20% its original size in code size, and lost at least 4 obvious errors along the way.

On the gripping tentacle, I'll compare that mess to an embedded system I worked on 15 years ago. It had one central function, written as a single C function because that made it easier to lock the code into the small instruction cache on the CPU, that was 26,000 lines long. This function, which had a McCabe's number in the 330 range, was the best-crafted piece of code in the entire system, because it absolutely had to be. There were only 3 or 4 programmers in the company allowed to touch it and I was quite flabbergasted to be included on the list, although I was included only with Jeff watching very carefully over my shoulder.

Yes, the code complexity needs to match the complexity of the problem. Still, unnecessary complexity is to be avoided at all costs. Learning to see the simple program inside the spaghetti is not a sacred cow to be slaughtered lightly.


AndrewBinstock
2013-07-03T19:22:00

I understand your concern, but bad programmers will always be able to find something to justify their work, if they search long enough. Aho's code was indeed that grotty. In correspondence with the maintainer of gawk (GNU awk), to whom I sent the link to this editorial, he replied that the corresponding code in gawk is so complex that it's the one thing he just doesn't touch. Any fixes or patches are handled by the original developers of that code. Despite one of the contentions earlier in this comment stream, regex parsers are complex beasts.


disqus_tG5FeaXVqM
2013-07-03T18:14:09

Articles like this bug me, because they're the sort of article that bad programmers latch onto to justify why they continue to write incomprehensible crap.

Was Aho's regex code really so irreducibly complex as to justify writing code that incredibly smart guys like Kernighan and Weinberger would refuse to touch? Honestly, I don't think that there really are that many problems that are so irreducibly complex. Complex problems are a sum of simpler problems.


ubm_techweb_disqus_sso_-fa2dd25d06356759c28e58ecc9d79195
2013-07-02T14:27:24

You spoke my heart!!! It is the truth that no one understands and makes you stand out as the dumb guy in the group.


ubm_techweb_disqus_sso_-a00f6807482984968a93fcd627f31c54
2013-07-02T14:07:57

Personally, I think that there's something more important than code complexity or simplicity. That's architecture, in the sense of mental models of what's trying to be computed. I spent days on a previous project, a debugger, trying to figure out how to get started on the > 2M SLOC code, when someone told me about a 10 page set of slides which contained the architecture from 5 years previous. With those slides I was able to get a sense of how the project was put together, and was able to make contributions almost immediately. Architecture, and especially architectural documentation, trumps simple code.


ubm_techweb_disqus_sso_-16d7c50d6c078abc2acbddb01ccf46d4
2013-07-02T13:44:18

I think Mr. Binstock's point is that there are shops out there that give too much concern and effort towards a flawed metric. Think back to the structured programming days where "a procedure cannot exceed 50 lines (or 24 lines or ???)", and people wrote utilities that went through their source code and every 50 lines would replace it with a procedure called "DoStuff1", then the next 50 with "DoStuff2", then the next 50 with "DoStuff3", then DoStuff() just had calls to those 3 procedures. Same issue, only a different (and similarly flawed) metric.
Metrics, although quantitative, should be analyzed a little more qualitatively. "Yeah, but my non-programming manager can't tell the difference when it's 'good' to violate that metric, but they can definitely know when this number is > 50!" I didn't say it was going to be easy. :-D


ubm_techweb_disqus_sso_-16d7c50d6c078abc2acbddb01ccf46d4
2013-07-02T13:34:28

If these 35 (128, 1000) branches are "peers" (e.g. lexical tokens), then I would expect them to be at the same level, in the same switch (but I would also want some "order" to their layout within the switch).
However, if there is a hierarchy (e.g. nested state machines) all at the same level, then yes, it should be refactored at the proper "layer" (e.g. the various state machines are encapsulated within their own class/methods or functions).
Actually "layers" (think OSI comm model) should be applied in your design and your code. Just this one "abstraction" can simplify and make your design and code more understandable (again, applying the 7 +/- 2 rule).


ubm_techweb_disqus_sso_-e1cb6d27ef8310898b3067f37a5a489f
2013-07-02T13:18:28

I contend as I do because the basic task that regular expressions do is pretty straightforward (even if the grammar looks frightening). Any fool can mechanically translate a RE into some more comprehensible form (say, railroad diagrams), and any other fool could animate the diagram using test cases. It doesn't take much more intelligence to use the diagram to predict behaviour at a more abstract level. (Try it - you'll see!) The grammar, the behaviour, the high-level reasoning - they're all transparent and straightforward.

So where is the complexity to be found? It's in the implementation. That says two things to me: firstly, that it's very difficult, when faced with a complicated solution, to state for sure whether the complexity is inherent in the problem or injected by the implementation. And secondly, that to avoid injecting superfluous complexity, you have to select a tool which keeps simple things simple. Frankly, that's something which, in general, we're very bad at - we tend to use the same old tools over and over again.


ubm_techweb_disqus_sso_-204827c4738b3376e28dffb7524ea204
2013-07-02T01:21:30

While I am sympathetic with the goal here, I am perhaps a little confused by the tone. Maintainability has *always* been the goal. Given two implementations of an algorithm, the simpler one will usually be more maintainable than the more complex one.

Likewise, simpler designs (simpler to understand and implement) will be more maintainable than more complex designs. This doesn't mean that complex code is always the wrong choice. Sometimes it is the *only* choice. But this shouldn't imply that we accept a complex solution if a simpler one is available.

OTOH, "simple" can be in the eye of the beholder. A well placed pattern can make many aspects of a system easier to understand *and extend* over the life of a product. However, it can also add complexity in a localized region of code. This is a prudent trade-off. Isolated complexity for systemic simplicity. If all you focus on is the complexity of managing a pattern's infrastructure then you miss out on the easier maintenance to be had in the future.


AndrewBinstock
2013-07-01T05:49:40

I've seen big switch statements with cases at the same level of abstraction primarily in code that processes input data, such as the lexer example that slkpg mentions below. Similarly, other apps that read binary data and look for codes or values that identify the kind of record or data item for subsequent processing. Virtual machines can be an example of the latter.
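
As a hedged illustration of that latter case (a toy sketch in C, with made-up opcodes rather than any real VM's instruction set):

    #include <stdio.h>

    enum op { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    /* The interpreter's core is one switch over opcodes; every
       case sits at the same level of abstraction. */
    static void run(const unsigned char *code)
    {
        int stack[64], sp = 0;
        for (;;) {
            switch (*code++) {
            case OP_PUSH1: stack[sp++] = 1;                  break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
            case OP_HALT:  return;
            /* a real VM continues like this for dozens of opcodes */
            }
        }
    }

    int main(void)
    {
        const unsigned char prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
        run(prog);  /* prints 2 */
        return 0;
    }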


AndrewBinstock
2013-07-01T05:44:04

"It is more helpful to focus on whether the code is testable rather than whether it is simple." I'm not sure I agree completely. I agree that it's important that code be testable. But I'm not sure why that would be more important. For example, in TDD, which is all about having 100% testable code, refactoring is a crucial step--precisely because the code becomes fairly hideous if the only concern is building in small testable increments. I think both testability and simplicity as I describe it are important. I'm just not comfortable putting one ahead of the other.


ubm_techweb_disqus_sso_-d61ca45a8c7fed1a423c3901aeec682d
2013-07-01T01:12:14

I use a 128-case switch in my lexical analyzers, one case for each ASCII character. Very machine efficient and quite readable. So yes, I have 128 choices for the first character of a lexeme at the same level of abstraction. The source code for one such scanner can be seen at http://slkpg.byethost7.com under the link for source code for a C parser.
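
For readers who haven't seen the pattern, a minimal sketch of the idea in C (hypothetical, and far smaller than the real scanner linked above):

    #include <stdio.h>

    enum kind { LEX_IDENT, LEX_NUMBER, LEX_OP, LEX_OTHER };

    /* One case per first character of a lexeme; a real scanner
       enumerates all 128 ASCII characters the same way. */
    static enum kind classify(int c)
    {
        switch (c) {
        case 'a': case 'b': case 'c': /* ...through 'z', 'A'..'Z', '_'... */
            return LEX_IDENT;
        case '0': case '1': case '2': /* ...through '9'... */
            return LEX_NUMBER;
        case '+': case '-': case '*': case '/':
            return LEX_OP;
        default:
            return LEX_OTHER;
        }
    }

    int main(void)
    {
        printf("%d %d %d\n", classify('a'), classify('7'), classify('+'));
        return 0;
    }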


ubm_techweb_disqus_sso_-43e165b47cf5d0edfff7378e69ef5220
2013-06-30T21:14:42

You wrote: "So, if your switch has 35 branches, you violate the threshold with no reasonable way to simplify your code." Well, a lot of rules are context dependent, so I am sure there are some occasional situations in which a 35-branch switch statement is called for. But that aside, and ignoring machine-generated code, it seems to me that a 35-branch switch *could* perhaps be ripe for refactoring. Do you really have 35 choices at the same level of abstraction, or do you really have mixed abstraction levels, allowing you to condense it down to perhaps a 5 or 6 branch switch, each of which can branch further? The human mind has a hard time grasping 35 choices and their consequences!


ubm_techweb_disqus_sso_-3a84e20e13600ce492791867bc41ac86
2013-06-29T02:24:03

While complex code often cannot itself be simplified, as in the case of regular expression implementation, it most certainly can be documented in order to make it easily understood.

Explaining complex code in a simple, logical, step-by-step manner is more difficult than writing complex code itself. Plus, one must maintain that documentation when changes are made...

Maybe you can follow this article up with a piece about making complex code *look* simple with the help of good, clear English commenting & documentation.


ubm_techweb_disqus_sso_-ed3813e38abec76dcb53e00e72c16320
2013-06-29T01:40:15

It is more helpful to focus on whether code is testable than whether it is "simple." Testable means that the code is structured in such a way that you can construct easily repeatable cases that exercise product features at the lowest level of abstraction possible. (E.g., test as much as possible in unit tests rather than waiting for full product builds.)
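
A minimal sketch of what that might look like in C (hypothetical function, not from any project mentioned here): the behavior lives in a small pure function, so a repeatable test can exercise it directly, with no product build around it.

    #include <assert.h>

    /* The feature's core logic as a pure function. */
    static int clamp(int value, int lo, int hi)
    {
        if (value < lo) return lo;
        if (value > hi) return hi;
        return value;
    }

    /* A unit test can exercise it directly and repeatably. */
    int main(void)
    {
        assert(clamp(5, 0, 10) == 5);
        assert(clamp(-3, 0, 10) == 0);
        assert(clamp(42, 0, 10) == 10);
        return 0;
    }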


ubm_techweb_disqus_sso_-0e5e900e6dbc9909cff068a7849c2b7d
2013-06-28T19:40:16

I was recently taken to task, in a similar forum, by a guy who does write in binary, as a hobbyist. He and some friends have a loose club-like environment where they have programming projects that are accomplished in binary. Or so he says.


ubm_techweb_disqus_sso_-0e5e900e6dbc9909cff068a7849c2b7d
2013-06-28T19:27:04

It seems to me that computer programming is, itself, an exercise in simplification. It simplifies human activity by making a machine solve a complex problem with minimal human involvement. This is desirable because complex thinking is, in some fashion, painful. We don't like to do it. It doesn't seem to come naturally. Hence, the KISS principle. KISS doesn't necessarily result in accurate answers, but, it does result in easier answers.

There is a concept often quoted in arguments about the validity of Darwinian evolution. It is Irreducible Complexity (are the caps premature?). This is what KentD397 is referring to when he says, "A complex problem cannot always be simplified." Nature is rife with IC. Reality can be viewed from a simplistic viewpoint, but, reality is inherently complex, as are many programming problems.

I contend that spaghetti code is frequently less complex than well organized and easily read code and how simple it is depends on your definition of simple. If understandability is what you call simple, then, the more complex code is also the simpler code. If having fewer parts is what you call simple, then, sometimes the spaghetti code is the simpler code. Both definitions are legitimate, but, unfortunately, the two don't always go hand in hand. Methinks that there should be a third dimension added to the definition of simple - the chaotic dimension - since the definition is already a compound definition. The less chaotic, the simpler. As a matter of fact, the chaotic dimension might be the only metric we need as it would be a combination of the two existing elements of the definition. It would probably have to be completely subjective, as well, resulting in new things to argue about. Argh!


ubm_techweb_disqus_sso_-adb6f52cbabc43846144a199b236fbbb
2013-06-28T15:41:29

Using the simile of organizing a room: complexity is not measured by the total amount of stuff in the room, but by how easy it is for the human mind to comprehend all the stuff in the room. A regular-expression implementation that accommodates the full range of features is very complex -- just as such a room contains a great deal of stuff. However, one should not use that as an excuse to stop the effort of producing an implementation that is easy to comprehend. The holy grail of that effort is to introduce a new paradigm of programming, just as a room organizer may bring in new furniture to serve the purpose.


ubm_techweb_disqus_sso_-adb6f52cbabc43846144a199b236fbbb
2013-06-28T15:34:59

Layering or hiding complexity is simplifying. Just as with organizing your room: without reducing any stuff, a well-organized room is simpler for strangers (and yourself) to navigate. On the other hand, I recognize that it is also easy to organize it in a non-intuitive way that increases the difficulty of comprehension -- any idiot can make it complex; only an expert can make it simple.


ubm_techweb_disqus_sso_-adb6f52cbabc43846144a199b236fbbb
2013-06-28T15:25:26

"Any idiot can write complex code, the true art is writing simple code."

Let's say we start with two straight threads, and it is necessary, to reach certain functionality, for them to interact at some points. An idiot would cross them ad hoc, giving rise to spaghetti code in which the threads cross at many, seemingly random places. The result is more complex in the sense that it is difficult to comprehend.

On the other hand, wise coders, focusing on the logic-comprehension side, would analyze those necessary crossing points and organize them into different groups and layers, resulting in code that has only a few (7 +/- 2) comprehension points at each level of the code hierarchy -- much simpler code, albeit achieving the same functionality.

In essence, simple or complex refers to human comprehension. Given the same amount of total complexity (measured by some mathematics), code written in structures that limit comprehension points according to the famous seven-plus-or-minus-two rule is perceived as simpler and benefits both development and maintenance.

Then, recognizing the intrinsic complexity and refraining from unplanned/unevaluated feature expansion or future-proofing takes a lot of experience and wisdom. An idiot often sees more complexity than a wise coder does, easily resulting in code that is more complex than necessary.


AndrewBinstock
2013-06-27T19:02:26

"I would contend that implementing regular expressions isn't inherently complex." Honestly, I cannot imagine why you would contend that. I know of no one who's implemented the full range of regular expressions who would agree with you. I know two folks who have done it. And we have Aho's account.


