Channels ▼
RSS

Design

Component Programming in D


Putting It Together

Let's nail these bad boys together and solve a straightforward problem. The problem is to read lines from stdin, sort them, and write the sorted result to stdout. The complete program is:

import std.stdio;
import std.array;
import std.algorithm;

    void main() {
        stdin.byLine(KeepTerminator.yes)    // 1
        map!(a => a.idup).                  // 2
        array.                              // 3
        sort.                               // 4
        copy(                               // 5
            stdout.lockingTextWriter());    // 6
    }

Notes:

  1. This creates an InputRange that reads from stdin line by line, keeping the line terminator at the end of the line.
  2. In order to read lines fast, ByLine reuses the same character buffer. In order to create an array of lines, we'll need to copy the lines returned by ByLine. The map function (from std.algorithm) handles that nicely, calling a lambda that maps each input line to a copy of that line.
  3. The lines need to be coalesced into an array of lines, which is what std.array.array does. An array is a RandomAccessRange, which is needed by the sort algorithm.
  4. What could be simpler than this call to the sort algorithm?
  5. copy() is the algorithm we wrote about earlier. It copies its first argument, an InputRange, to its second argument, an OutputRange.
  6. The lockingTextWriter locks the output stream, which is stdout, writes its input (which is an InputRange of lines of text) to that stream.

This code does an excellent job of meeting out criteria for components. The flow of control follows a logical progression from top to bottom, with no loops or conditional statements. It's easy to reason about (and hence, much more likely to be correct). The individual components are independent of each other, and each is reusable, and replaceable with others.

Features Needed for Components

Quite a few features of D come together to make this work:

  • Exception handling for errors rather than error codes. This design removes all the error-handling code from the use of the components, while fully supporting error detection. Maybe there is a clean way to do this with error codes, but I haven't thought of one.
  • Generic functions. Since ranges are concepts, not types, the straightforward way to write algorithms is as function templates that can adapt themselves to the final range types.
  • Template constraints. In order to stave off the user-hostile problem of inscrutable error messages, template constraints prevent algorithms from instantiating unless the range arguments match the corresponding range concepts.
  • UFCS (Uniform Function Call Syntax). This enables a natural left-to-right order of composition and evaluation of ranges and algorithms. At the same time, it allows algorithms to be implemented outside any particular type or family of types, fostering better modularity.
  • Language recognition of InputRanges for foreach loops. While not strictly required, this makes for convenient and robust iteration over InputRanges.
  • Ranges are concepts, not types. Ranges can be implemented as value or reference types, whichever is more appropriate for the specific case. It is not necessary to use virtual functions or other specific interface types.
  • Inlining, customization, and optimization. The compiler is fully empowered by the language to inline, customize, and optimize a generic implementation into one customized by the particular types involved. This is necessary in order to match the performance of hand-built, custom code.
  • Specialization. Although not shown in the examples, the implementor of the components can add specialized versions based on inquiring about the argument types at compile time.
  • Type Deduction. D is a statically typed language, but the file sorting program doesn't explicitly mention any types. The compiler can deduce the required types from the arguments.
  • Tuples. The example that computes both the sums and squares in a single pass relies on tuples to produce its return value(s). Being able to combine multiple operations into a single pass over the data is an important optimization.

Conclusion

I've suffered the frustration of non-reusable code for a long time. One solution to the problem is component programming. A successful component design is a combination of convention with some careful language support.

Many advanced features of D come together to enable development of generic components that can snap together in a simple, robust manner. While writing the components is slightly more work than the traditional method of custom loops, the ease of use of those components, and their reusability, more than balance the ledger towards the profit side.

D's component design builds on the earlier successes of the UNIX "files and filters" model, followed by C's file stream interface, followed by C++'s iterators and algorithms. While this capability is relatively new in D, early indications are that it is a robust and extremely promising design. An awful lot of code can be refactored into ranges and algorithms, and hence, can be easily made into reusable software components.

Further Thoughts

Clearly, this discussion invites comparison to how this is or may be done in other languages. It's beyond the scope of this article to do a thorough survey of this, and even if I tried, it's easy to miss best practices in a language one is not thoroughly used to programming in. The best I can do is discuss the rationale for certain design choices for D, and to speculate how D would have fared for component-based programming for other choices.

For example, the classic functional-language approach to component programming is to use a list as the main connection between components. The list is an example of a D forward range. A forward range is a very bad fit for a variety of important algorithms, such as quicksort, binary search, median, and nth element. These need random access. High performance code also can mandate very specific data layouts to maximize memory throughput. The D range concept allows appropriate data structures to be used, rather than trying to make lists fit everything.

Other languages implement parameterized containers and algorithms (such as higher-order functions) by using indirection (usually virtual functions). This constrains the options for data layout (again, data layout can be performance critical), and presumes the existence of a "sufficiently smart optimizer" that can unwind the indirections to generate code specialized to the instantiation types. Without that, one winds up with inefficient components that invite programmers to not use them.

he D approach is to use so-called "heterogeneous" generics, where generating specialized code upon instantiation is relatively straightforward. Furthermore, this specialized code instantiation is composable — components can be layered on top of each other without concern about performance.

There is also the issue of manageability. Does a component design rely on incidental scaffolding, tricks, exploits, and cautions? Is there a lot of boilerplate code surrounding the meat? At the very least, the use of the components should be straightforward enough that a relative newcomer to the language can get it to work, and get it to generate quality results. I've had enough of code that looks simple and elegant in a tutorial, only to find out it doesn't perform well, and that "experts don't write code that way," and I'm sure you have, too.

Finally, I should note that there's an enormous amount of history behind component programming and a large body of work written about software component programming, as well as a smorgasbord of ideas about what components should be. These include many good ideas not discussed here.

Acknowledgements

My thanks to Andrei Alexandrescu, who came up with many of the ideas and examples here. Additional thanks to David Held, Bartosz Milewski, Eric Niebler, and Brad Roberts for their most helpful comments on improving this article.


Walter Bright is the original designer of the D language and a blogger for Dr. Dobb's.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Comments:

ubm_techweb_disqus_sso_-c009665390508eb54a51a755eefc4bed
2014-07-17T10:21:44

A quote from Uncle Bob: "The granularity of reuse is the granularity of release" - i.e. the content of the code is not the only important factor. Code will not be reused unless developers can easily download and install it as a stand-alone library (or 'package' in Python parlance.) Tearing classes out of an old project, even if they are already split out into their own files, is a pain, and often uncovers unexpected dependencies that make the task harder than expected. The 'tearing out' needs to be done by the original author, and to prove they have done this they need to publish the code standalone, then others can practically reuse it.


Permalink
ubm_techweb_disqus_sso_-43e165b47cf5d0edfff7378e69ef5220
2012-12-01T05:01:39

"For example, the classic functional-language approach to component programming is to use a list as the main connection between components. The list is an example of a D forward range. "
This is only partially true. The easy solution is to use a list, but you could in Haskell at least, think ahead a bit and use a type-class. The type class then informally embodies the properties that the algorithm requires, via the functions of the type class. An improvement over Haskell would be languages that enable different algebras to be used which are then refined to the appropriate data structures.


Permalink
ubm_techweb_disqus_sso_-43e165b47cf5d0edfff7378e69ef5220
2012-12-01T04:55:14

"A forward range is a very bad fit for a variety of important algorithms, such as quicksort, binary search, median, and nth element. These need random access."
I'm sorry if I'm missing something obvious here but why does quicksort need random access?


Permalink
ubm_techweb_disqus_sso_-214b3dd075dff25745be8061031e5bfa
2012-11-13T19:06:46

I find the really odd in the struct StdinByChar that the property empty() is actually the one moving forward in the range. I would have expected the popFront to do that.


Permalink
ubm_techweb_disqus_sso_-fdff6b39dbc660ff00491589493b9525
2012-10-25T10:10:53

Are you trying to say that D does not support lazy evaluation? If you do, please check this article: http://dlang.org/lazy-evaluati... . If you do not, I apologise it is a misunderstanding then.


Permalink
nimrody
2012-10-23T19:05:40

Thanks for an interesting article.

This is similar to the sequence abstraction in clojure or streams in Haskell (or even generators in Python). Only difference is that both languages support lazy evaluation and therefore better composition than D.

Unless you support coroutines or lazy streams, you end up needing temporary buffers between "filters" (the algorithms you set up to reuse and compose).

Since D, much like C++, is targeted at performance sensitive applications, users will opt to rewrite instead of reuse.

And really, I think the root of the problem is users looking for optimized code and rewriting code so that it better fits the task. Dynamic languages enjoy better reuse since their users are willing to tolerate performance loss for easier programming.

I do agree that exceptions help in composing algorithms. Interesting observation given that Google's Go decided to go with functions returning error codes.


Permalink
ubm_techweb_disqus_sso_-5c471a50164a7b307706e8eba25b8488
2012-10-22T16:10:34

I was not promoting the other idea of a "component", merely pointing out the difference in definition. I would call the "component" as described here, as a source code class library. A component in other circumstances is often a binary that no one is particularly interested in learning how to dissect, they just want to get a job done quickly. COM is rarely used for commercial components these days.


Permalink
ubm_techweb_disqus_sso_-88fa23d875ddacf6c1b549a1b39e30c2
2012-10-18T03:09:42

The std.parallelism module is an equivalent of Intel's TBB and is made to work with D's concept of ranges.

For example, to iterate over a range using all available cores, all you need to do is:

foreach(myValue; parallel(myRange)) { ... }


Permalink
ubm_techweb_disqus_sso_-88fa23d875ddacf6c1b549a1b39e30c2
2012-10-18T02:55:25

But if it's Windows-only then it isn't properly reusable. If the code belongs to a class then it won't be the first-order function needed for composition. You can't derive from COM objects and composition is almost always a better design choice than inheritance.

D solves all of these problems brilliantly.


Permalink
ubm_techweb_disqus_sso_-69cccb9a3c6f8fb6d506ea96a0358917
2012-10-15T03:51:05

If the components were done as a D interface, then they would be virtual function calls, which can be relatively expensive. Doing them as a concept instead allows them to be direct calls, which are then inlinable and optimizable, resulting in significantly more efficient code.


Permalink
ubm_techweb_disqus_sso_-3bfeaa10f3fe3711742fc6956512ca69
2012-10-12T21:32:08

A nice article. I'm wondering though, why does Phobos not specify the various ranges as a set of interfaces rather than providing the isXXX functions?


Permalink
ubm_techweb_disqus_sso_-d48a3cf4aa43c6be37df43796094d876
2012-10-08T10:15:39

In the article (which I found very interesting!), it seems everything is performed by one thread, while it might make sense to split the steps in the pipeline on different threads for sake of multiprocessor performance. Here I guess TBB and FastFlow are two libraries that facilitate the same kind of declarative style functional programming that is proposed here.


Permalink
ubm_techweb_disqus_sso_-d48a3cf4aa43c6be37df43796094d876
2012-10-08T10:09:25

Similar functionality in C++ is achieved using Boost:Range if I'm not incorrect. With C++ lambdas this seems to holds great promise and should really be part of the standard (in a somewhat more refined form).


Permalink
ubm_techweb_disqus_sso_-5c471a50164a7b307706e8eba25b8488
2012-10-03T18:14:14

Your definition of "component" is interesting. On window systems, there are huge numbers of components used (com, .net controls), and the definition of "component" usually refers to binary resusability: reusing object code. In addition, there is often tool support, in the form of Visual Studio integration, (the "toolbox" and property grid), so that components can be instantiated and configured without writing code. Also usually excellent documentation, for commercial components.


Permalink

Video