The Specifics of the Model
Going back to what we want from a component, let's get into specifics. What kinds of sources are there?
- Streams: Streams provide data sequentially, generally with no history of what came before, and no knowledge of what comes after. Streams live in the now. Files typically present themselves as streams, as is the input from your keyboard, data coming from a network socket, data flowing from a sensor, etc.
- Containers: Containers are fixed collections of data that have at last one method of iterating through all elements. Typical containers would include arrays, hash tables, trees, and linked lists.
- Generators: Generators are algorithms that produce a stream of data. Examples of generators would be random number generators, iotas, prime number generators, producing successive digits ofpi, and so on. Generators have a known starting point (a seed), and sometimes can produce an infinite succession of values.
- List comprehensions: A source passed through an algorithm (such as a filter for list comprehensions), is also a source.
What kinds of algorithms are there?
- Filter: A filter takes its input, and produces some subset of that for its output. Examples include selecting only odd values or prime values, searching for regex matches, validating the data, or passing only employees with a security clearance.
- Map: Maps take input data, transform it, and output the transformed data. Examples include capitalizing words, squaring numbers, and sorting the input.
- Reduce: Reduce consumes its input and produces as output a single value. Examples include summing all the data, finding the maximum, and so forth.
These are just a few examples of many. There are myriad algorithms, each with its own personality. Sorting, searching, pattern matching, encryption, compression, and more come to mind, not to mention a great number of numeric algorithms. Notably, algorithms have specific demands on the layout of the data they operate on. Filter, map, and reduce all operate with simple streams, but many algorithms need more sophisticated data access patterns.
And sinks:
- Streams: writing to files, the console, a network socket, etc.
- Containers: the container gets built from the data.
Forming a table with three columns, we get:
| Source | Algorithm | Sink |
file |
sort |
file |
tree |
filter |
tree |
array |
map |
array |
socket |
reduce |
socket |
list |
max |
list |
iota |
search |
|
random numbers |
odd |
word count |
A perfect component design should be able to take any item from the Source column, apply any Algorithm, and output to any Sink, even if those Sources, Algorithms, and Sinks were developed independently by groups that have never heard of each other. Furthermore, they need to be able to be snapped together (that is, be composable) like:
file → sort → odd → list
for any combination that makes sense.
Some data format transformation may be in order. For example, quicksort needs random access, something a streaming file doesn't offer reasonably well. So we'd need to insert an adapter component that transforms an input stream into an array, leading to:
file → make_array → sort → odd → list
And, of course, we won't accept performance compromises. C++ STL showed that component design can be just as efficient as hand-built non-component code, and often turns out to be even faster. Any component design based on indirection (that is, virtual dispatch), boxing, or dynamic typing is going to have a harder time delivering good performance. (After all, giving users motivation to replace the abstraction with hand-coded equivalents when they 'really mean it' is a sign that something went wrong in the abstracting process. Components must work for industrial code.)
Summing up what is needed for a modern, excellent component design:
- Ability to "snap together" independently developed sources, algorithms, and sinks; that is, composability.
- Strong support for encapsulation.
- Ability to generate code as efficient as hand-coded versions.
- A natural syntax that looks like source → algorithm → sink
- Ability to work with types that are not known in advance.
Being a more recent programming language, D enjoys the benefit of standing on the shoulders of giants. Fundamentally improving component programming has enormous potential to change and significantly improve the way we write software. Here is how the D standard library does it.
Sources
What are the minimum requirements for a source?
- Determine if there is input data available
- Read the current input datum
- Advance to the next input datum
To satisfy these requirements, an InputRange is defined as having the following:
- A property:
bool empty; which returns true if there is no more input, and false if there is. - A property:
E front; which returns the current input datum of typeE. Input ranges are parameterized by their element type (denoted here asE). - A method:
void popFront(); which advances to the next input datum.
Note that this implies a one-element buffer because client code may call front multiple times without moving to the next element. There exists an even simpler notion, the unbuffered range; it only enables a modicum of algorithms and currently it is not formally abstracted in D's standard library.
A very interesting thing about an InputRange is it isn't itself a type. A type is an InputRange if it has the primitives empty, front, and popFront. This makes an InputRange a concept, rather than a type.
Since we all know C file I/O so well, here's an InputRange that reads characters from stdin:
private import core.stdc.stdio;
struct StdinByChar {
@property bool empty() {
if (hasChar)
return false;
auto c = fgetc(stdin);
if (c == EOF)
return true;
ch = cast(char)c;
hasChar = true;
return false;
}
@property char front() {
return ch;
}
void popFront() {
hasChar = false;
}
private:
char ch;
bool hasChar;
}
It has the expected one element buffer and a flag that keeps track of it. The use of it is straightforward, here's code to print it to stdout:
for (auto r = StdinByChar(); !r.empty; r.popFront()) {
auto c = r.front;
fputc(c, stdout);
}
It's pretty clear that anything with those three primitives can be expressed in a for loop the same way. For example, the D foreach statement can automatically do the equivalent with:
foreach (c; StdinByChar())
fputc(c, stdout);
Look, Ma, no types! Already, we're on to something powerfully useful. The InputRange I just described is streaming only. One pass streams are hardly sufficient for complex processing. Let's extend it.
ForwardRange
ForwardRange. It adds a property
@property R save;
Where R is the ForwardRange type. save returns a copy of the ForwardRange in its current state. It does not make a copy of the data being iterated over, only the position. Hence, the original and the copy can now traverse the ForwardRange independently of each other. A singly linked list is a typical forward range.
A ForwardRange is useful when look-ahead is needed. Algorithms that require a ForwardRange are brute force subsequence searching, and generally any pattern matching that needs to do multiple passes over its inputs, are good examples. Merge sort and bring to front also come to mind.
Note that a file stream usually cannot efficiently be a ForwardRange; the way the buffering is done precludes it.
BidirectionalRange
The next extension is for a bidirectional range, and two additional primitives are required:
@property E back; void popBack();
These work analogously to front and popFront(), except they work their way backwards from the end of the range rather than the beginning. An example of a bidirectional range would be a doubly linked list. But doubly linked lists are not used all that often. What makes bidirectional ranges worth studying is that UTF8- and UTF16-encoded strings are, in fact, bidirectional ranges. Bidirectionality enables a few more efficient algorithms, such as finding the last occurrence of a sequence in another.
RandomAccessRange
RandomAccessRangeoverloads the [] operator so it can be indexed to refer to the nth element:
E opIndex(size_t n);
In addition to indexing, there are two options for the RandomAccessRange's functionality:
- A
BidirectionalRangethat offers the length property or is infinite:@property size_t length; - A
ForwardRangethat is infinite. (An infinite range always returns false for the empty property.) More details aboutInputRanges can be found at dlang.org.
Sinks
Sinks in D are called OutputRanges. An OutputRange is nothing more than a type that supports a method called put:
void put(E e);
The idea is that element e of type E gets "put" into the range. (Some other variations on an OutputRange are detailed at dlang.org.)


