Channels ▼
RSS

Design

Component Programming in D


Putting It Together

Let's nail these bad boys together and solve a straightforward problem. The problem is to read lines from stdin, sort them, and write the sorted result to stdout. The complete program is:

import std.stdio;
import std.array;
import std.algorithm;

    void main() {
        stdin.byLine(KeepTerminator.yes)    // 1
        map!(a => a.idup).                  // 2
        array.                              // 3
        sort.                               // 4
        copy(                               // 5
            stdout.lockingTextWriter());    // 6
    }

Notes:

  1. This creates an InputRange that reads from stdin line by line, keeping the line terminator at the end of the line.
  2. In order to read lines fast, ByLine reuses the same character buffer. In order to create an array of lines, we'll need to copy the lines returned by ByLine. The map function (from std.algorithm) handles that nicely, calling a lambda that maps each input line to a copy of that line.
  3. The lines need to be coalesced into an array of lines, which is what std.array.array does. An array is a RandomAccessRange, which is needed by the sort algorithm.
  4. What could be simpler than this call to the sort algorithm?
  5. copy() is the algorithm we wrote about earlier. It copies its first argument, an InputRange, to its second argument, an OutputRange.
  6. The lockingTextWriter locks the output stream, which is stdout, writes its input (which is an InputRange of lines of text) to that stream.

This code does an excellent job of meeting out criteria for components. The flow of control follows a logical progression from top to bottom, with no loops or conditional statements. It's easy to reason about (and hence, much more likely to be correct). The individual components are independent of each other, and each is reusable, and replaceable with others.

Features Needed for Components

Quite a few features of D come together to make this work:

  • Exception handling for errors rather than error codes. This design removes all the error-handling code from the use of the components, while fully supporting error detection. Maybe there is a clean way to do this with error codes, but I haven't thought of one.
  • Generic functions. Since ranges are concepts, not types, the straightforward way to write algorithms is as function templates that can adapt themselves to the final range types.
  • Template constraints. In order to stave off the user-hostile problem of inscrutable error messages, template constraints prevent algorithms from instantiating unless the range arguments match the corresponding range concepts.
  • UFCS (Uniform Function Call Syntax). This enables a natural left-to-right order of composition and evaluation of ranges and algorithms. At the same time, it allows algorithms to be implemented outside any particular type or family of types, fostering better modularity.
  • Language recognition of InputRanges for foreach loops. While not strictly required, this makes for convenient and robust iteration over InputRanges.
  • Ranges are concepts, not types. Ranges can be implemented as value or reference types, whichever is more appropriate for the specific case. It is not necessary to use virtual functions or other specific interface types.
  • Inlining, customization, and optimization. The compiler is fully empowered by the language to inline, customize, and optimize a generic implementation into one customized by the particular types involved. This is necessary in order to match the performance of hand-built, custom code.
  • Specialization. Although not shown in the examples, the implementor of the components can add specialized versions based on inquiring about the argument types at compile time.
  • Type Deduction. D is a statically typed language, but the file sorting program doesn't explicitly mention any types. The compiler can deduce the required types from the arguments.
  • Tuples. The example that computes both the sums and squares in a single pass relies on tuples to produce its return value(s). Being able to combine multiple operations into a single pass over the data is an important optimization.

Conclusion

I've suffered the frustration of non-reusable code for a long time. One solution to the problem is component programming. A successful component design is a combination of convention with some careful language support.

Many advanced features of D come together to enable development of generic components that can snap together in a simple, robust manner. While writing the components is slightly more work than the traditional method of custom loops, the ease of use of those components, and their reusability, more than balance the ledger towards the profit side.

D's component design builds on the earlier successes of the UNIX "files and filters" model, followed by C's file stream interface, followed by C++'s iterators and algorithms. While this capability is relatively new in D, early indications are that it is a robust and extremely promising design. An awful lot of code can be refactored into ranges and algorithms, and hence, can be easily made into reusable software components.

Further Thoughts

Clearly, this discussion invites comparison to how this is or may be done in other languages. It's beyond the scope of this article to do a thorough survey of this, and even if I tried, it's easy to miss best practices in a language one is not thoroughly used to programming in. The best I can do is discuss the rationale for certain design choices for D, and to speculate how D would have fared for component-based programming for other choices.

For example, the classic functional-language approach to component programming is to use a list as the main connection between components. The list is an example of a D forward range. A forward range is a very bad fit for a variety of important algorithms, such as quicksort, binary search, median, and nth element. These need random access. High performance code also can mandate very specific data layouts to maximize memory throughput. The D range concept allows appropriate data structures to be used, rather than trying to make lists fit everything.

Other languages implement parameterized containers and algorithms (such as higher-order functions) by using indirection (usually virtual functions). This constrains the options for data layout (again, data layout can be performance critical), and presumes the existence of a "sufficiently smart optimizer" that can unwind the indirections to generate code specialized to the instantiation types. Without that, one winds up with inefficient components that invite programmers to not use them.

he D approach is to use so-called "heterogeneous" generics, where generating specialized code upon instantiation is relatively straightforward. Furthermore, this specialized code instantiation is composable — components can be layered on top of each other without concern about performance.

There is also the issue of manageability. Does a component design rely on incidental scaffolding, tricks, exploits, and cautions? Is there a lot of boilerplate code surrounding the meat? At the very least, the use of the components should be straightforward enough that a relative newcomer to the language can get it to work, and get it to generate quality results. I've had enough of code that looks simple and elegant in a tutorial, only to find out it doesn't perform well, and that "experts don't write code that way," and I'm sure you have, too.

Finally, I should note that there's an enormous amount of history behind component programming and a large body of work written about software component programming, as well as a smorgasbord of ideas about what components should be. These include many good ideas not discussed here.

Acknowledgements

My thanks to Andrei Alexandrescu, who came up with many of the ideas and examples here. Additional thanks to David Held, Bartosz Milewski, Eric Niebler, and Brad Roberts for their most helpful comments on improving this article.


Walter Bright is the original designer of the D language and a blogger for Dr. Dobb's.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video