Putting It Together
Let's nail these bad boys together and solve a straightforward problem. The problem is to read lines from stdin
, sort them, and write the sorted result to stdout
. The complete program is:
import std.stdio; import std.array; import std.algorithm; void main() { stdin.byLine(KeepTerminator.yes) // 1 map!(a => a.idup). // 2 array. // 3 sort. // 4 copy( // 5 stdout.lockingTextWriter()); // 6 }
Notes:
- This creates an
InputRange
that reads fromstdin
line by line, keeping the line terminator at the end of the line. - In order to read lines fast,
ByLine
reuses the same character buffer. In order to create an array of lines, we'll need to copy the lines returned byByLine
. Themap
function (fromstd.algorithm
) handles that nicely, calling a lambda that maps each input line to a copy of that line. - The lines need to be coalesced into an array of lines, which is what
std.array.array
does. An array is aRandomAccessRange
, which is needed by the sort algorithm. - What could be simpler than this call to the sort algorithm?
copy()
is the algorithm we wrote about earlier. It copies its first argument, anInputRange
, to its second argument, anOutputRange
.- The
lockingTextWriter
locks the output stream, which isstdout
, writes its input (which is anInputRange
of lines of text) to that stream.
This code does an excellent job of meeting out criteria for components. The flow of control follows a logical progression from top to bottom, with no loops or conditional statements. It's easy to reason about (and hence, much more likely to be correct). The individual components are independent of each other, and each is reusable, and replaceable with others.
Features Needed for Components
Quite a few features of D come together to make this work:
- Exception handling for errors rather than error codes. This design removes all the error-handling code from the use of the components, while fully supporting error detection. Maybe there is a clean way to do this with error codes, but I haven't thought of one.
- Generic functions. Since ranges are concepts, not types, the straightforward way to write algorithms is as function templates that can adapt themselves to the final range types.
- Template constraints. In order to stave off the user-hostile problem of inscrutable error messages, template constraints prevent algorithms from instantiating unless the range arguments match the corresponding range concepts.
- UFCS (Uniform Function Call Syntax). This enables a natural left-to-right order of composition and evaluation of ranges and algorithms. At the same time, it allows algorithms to be implemented outside any particular type or family of types, fostering better modularity.
- Language recognition of
InputRanges
forforeach
loops. While not strictly required, this makes for convenient and robust iteration overInputRanges
. - Ranges are concepts, not types. Ranges can be implemented as value or reference types, whichever is more appropriate for the specific case. It is not necessary to use virtual functions or other specific interface types.
- Inlining, customization, and optimization. The compiler is fully empowered by the language to inline, customize, and optimize a generic implementation into one customized by the particular types involved. This is necessary in order to match the performance of hand-built, custom code.
- Specialization. Although not shown in the examples, the implementor of the components can add specialized versions based on inquiring about the argument types at compile time.
- Type Deduction. D is a statically typed language, but the file sorting program doesn't explicitly mention any types. The compiler can deduce the required types from the arguments.
- Tuples. The example that computes both the sums and squares in a single pass relies on tuples to produce its return value(s). Being able to combine multiple operations into a single pass over the data is an important optimization.
Conclusion
I've suffered the frustration of non-reusable code for a long time. One solution to the problem is component programming. A successful component design is a combination of convention with some careful language support.
Many advanced features of D come together to enable development of generic components that can snap together in a simple, robust manner. While writing the components is slightly more work than the traditional method of custom loops, the ease of use of those components, and their reusability, more than balance the ledger towards the profit side.
D's component design builds on the earlier successes of the UNIX "files and filters" model, followed by C's file stream interface, followed by C++'s iterators and algorithms. While this capability is relatively new in D, early indications are that it is a robust and extremely promising design. An awful lot of code can be refactored into ranges and algorithms, and hence, can be easily made into reusable software components.
Further Thoughts
Clearly, this discussion invites comparison to how this is or may be done in other languages. It's beyond the scope of this article to do a thorough survey of this, and even if I tried, it's easy to miss best practices in a language one is not thoroughly used to programming in. The best I can do is discuss the rationale for certain design choices for D, and to speculate how D would have fared for component-based programming for other choices.
For example, the classic functional-language approach to component programming is to use a list as the main connection between components. The list is an example of a D forward range. A forward range is a very bad fit for a variety of important algorithms, such as quicksort, binary search, median, and nth element. These need random access. High performance code also can mandate very specific data layouts to maximize memory throughput. The D range concept allows appropriate data structures to be used, rather than trying to make lists fit everything.
Other languages implement parameterized containers and algorithms (such as higher-order functions) by using indirection (usually virtual functions). This constrains the options for data layout (again, data layout can be performance critical), and presumes the existence of a "sufficiently smart optimizer" that can unwind the indirections to generate code specialized to the instantiation types. Without that, one winds up with inefficient components that invite programmers to not use them.
he D approach is to use so-called "heterogeneous" generics, where generating specialized code upon instantiation is relatively straightforward. Furthermore, this specialized code instantiation is composable components can be layered on top of each other without concern about performance.
There is also the issue of manageability. Does a component design rely on incidental scaffolding, tricks, exploits, and cautions? Is there a lot of boilerplate code surrounding the meat? At the very least, the use of the components should be straightforward enough that a relative newcomer to the language can get it to work, and get it to generate quality results. I've had enough of code that looks simple and elegant in a tutorial, only to find out it doesn't perform well, and that "experts don't write code that way," and I'm sure you have, too.
Finally, I should note that there's an enormous amount of history behind component programming and a large body of work written about software component programming, as well as a smorgasbord of ideas about what components should be. These include many good ideas not discussed here.
Acknowledgements
My thanks to Andrei Alexandrescu, who came up with many of the ideas and examples here. Additional thanks to David Held, Bartosz Milewski, Eric Niebler, and Brad Roberts for their most helpful comments on improving this article.
Walter Bright is the original designer of the D language and a blogger for Dr. Dobb's.