C++ and format_iterator


Matthew Wilson is a software development consultant and trainer for Synesis Software who helps clients to build high-performance software that does not break, and an author of articles and books that attempt to do the same. He can be contacted at matthew@synesis.com.au.


One of the subjects covered in my book Extended STL, Volume 1 was the lack of flexibility and expressiveness in the C++ standard library's std::ostream_iterator, along with a (partial) resolution in the form of STLSoft's stlsoft::ostream_iterator. Chapter 34 addressed these issues in part and was adapted as a Dr. Dobb's article entitled An Enhanced ostream_iterator.

stlsoft::ostream_iterator improves on std::ostream_iterator in two ways:

  • It allows for the specification of a prefix to be emitted with each element in the streamed sequence; std::ostream_iterator provides only for a suffix.
  • It allows the prefix (and the suffix) to be of arbitrary string type; std::ostream_iterator's suffix must be a C-style string.

But issues of flexibility and expressiveness remain, for which neither component provides solutions:

  • Output must still be to an IOStreams output stream (or a structurally conformant class providing insertion operators).
  • The type of the element must be explicitly specified, which can be cumbersome when dealing with template specialisations and/or deep namespaces.
  • The only "formatting" supported is to specify a prefix and a suffix; anything more sophisticated requires hand-written loops.

In the last few years my research into flexibility has progressed, leading to the application of shims and type-tunneling into areas of C++ programming such as diagnostic logging and string formatting. It is the latter case I want to draw on for this article, specifically the FastFormat C++ formatting library.

FastFormat offers two APIs, Format and Write, supporting replacement-based and concatenation-based formatting of arguments of arbitrary type to output "sinks" of arbitrary type, as in:


int         i =   13;
CComVariant v(L"abcd");
std::string salutation = "Hello";
Person      person("John", "Smith");

// gives: "i=13, v=        abcd        ."
ff::fmtln(std::cout, "i={1}, v={0,20,,^}.", v, i);

// gives: "Hello, John Smith"
std::string result;
ff::write(result, salutation, ", ", person);
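
To make the distinction between the two styles concrete, here is a minimal standard-library sketch. It is not FastFormat: the names to_text, write_sketch and format_sketch are hypothetical, arguments are converted via stream insertion (an assumption; FastFormat uses its own conversion mechanisms), only plain "{N}" replacement parameters are understood (no width or alignment fields), and the sink is fixed as std::string.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: convert any stream-insertable value to text.
template <typename T>
std::string to_text(T const& value)
{
    std::ostringstream oss;
    oss << value;
    return oss.str();
}

// Concatenation-based: append each argument's text to the sink, in order.
template <typename... Args>
void write_sketch(std::string& sink, Args const&... args)
{
    std::vector<std::string> const parts{ to_text(args)... };
    for(std::string const& part : parts)
    {
        sink += part;
    }
}

// Replacement-based: each "{N}" in the format is replaced by the text of
// the N-th argument. Anything other than a plain "{N}" is not supported
// (std::stoul throws on a malformed index, at() on an out-of-range one).
template <typename... Args>
void format_sketch(std::string& sink, std::string const& format, Args const&... args)
{
    std::vector<std::string> const parts{ to_text(args)... };
    for(std::size_t i = 0; i < format.size(); )
    {
        std::size_t const close =
            (format[i] == '{') ? format.find('}', i) : std::string::npos;
        if(close != std::string::npos)
        {
            std::size_t const index = std::stoul(format.substr(i + 1, close - i - 1));
            sink += parts.at(index);
            i = close + 1;
        }
        else
        {
            sink += format[i++];
        }
    }
}

// Usage: the same arguments, formatted both ways.
std::string demo_format()
{
    std::string result;
    format_sketch(result, "i={1}, s={0}.", std::string("abcd"), 13);
    return result; // "i=13, s=abcd."
}

std::string demo_write()
{
    std::string result;
    write_sketch(result, "Hello", ", ", "John Smith");
    return result; // "Hello, John Smith"
}
```

Note that the replacement form lets the format string reorder the arguments, which the concatenation form cannot do.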

The high flexibility afforded by the library got me to thinking once again about formatting output iterators: Could it be possible to implement an iterator in terms of FastFormat that would substantially improve on stlsoft::ostream_iterator?

fastformat::format_iterator

Fairly obviously (since I'm writing this article), the answer is "Yes!" The resulting component -- fastformat::format_iterator -- allows algorithmic formatting:

  • Of sequences of arbitrary type; all types that are compatible with FastFormat's Format and Write APIs will be automatically compatible with the iterators
  • Without having to explicitly write the type in the expression
  • To sinks of arbitrary type
  • Using formats of arbitrary complexity, including use of other parameters from the expression context

Let's look at a scenario to illustrate. We'll consider again (a chopped down version of) the example used in Extended STL, Volume 1, but this time using the recently released version 1.9 of the recls library.

Suppose we want to list all the C/C++ header files under the current directory, and place each within XML tags. In each case we will create an instance of the STL Collection sequence type, recls::search_sequence, and write to a std::stringstream instance, as follows:


std::string const prefix("<file>");
char const* const suffix = "</file>";

std::stringstream      stm;
recls::search_sequence headers(".", "*.h|*.hpp|*.hxx", recls::RECURSIVE);

To output this list using std::ostream_iterator requires something like the following:


// via std::ostream_iterator

if(headers.begin() != headers.end())
{
  stm << prefix;
}
std::copy(
  headers.begin(), headers.end()
, std::ostream_iterator<recls::search_sequence::value_type>(
    stm
  , (suffix + prefix).c_str()));
if(headers.begin() != headers.end())
{
  stm << suffix;
}

Running this gives output like:


<file>H:\freelibs\b64\1.4\include\b64\b64.h</file><file>H:\freelibs\b64\1.4\include\b64\implicit_link.h</file><file>H:\freelibs\b64\1.4\include\b64\b64.hpp</file><file>H:\freelibs\b64\1.4\include\shwild\implicit_link.h</file><file>H:\freelibs\b64\1.4\include\shwild\shwild.h</file><file>H:\freelibs\b64\1.4\include\shwild\shwild.hpp</file><file>H:\freelibs\b64\1.4\include\xcontract\implicit_link.h</file><file>H:\freelibs\b64\1.4\include\xcontract\xcontract.h</file>. . . // more output

Contrast this with the (nearly) equivalent implementation using stlsoft::ostream_iterator:


// via stlsoft::ostream_iterator

std::copy(
  headers.begin(), headers.end()
, stlsoft::ostream_iterator<recls::search_sequence::value_type>(
    stm
  , prefix
  , suffix));

What in the former case required two conditionals and three statements is now encapsulated in a single statement. There's no longer any need to concatenate suffix and prefix in order to obtain a C-style string: stlsoft::ostream_iterator is implemented in terms of String Access Shims and so works with arbitrary string types. Furthermore, there's no longer any need to (conditionally) insert a leading prefix or to add a final suffix.

In fact, the version using std::ostream_iterator is actually defective. If we look at the end of the output, it's clear how:


. . . <file>H:\freelibs\b64\1.4\src\shwild\shwild_vector.hpp</file><file></file>

For any non-empty sequence of files, there's always an extra, empty <file></file> pair. It's possible a consumer may be able to ignore empty items, but this is not the way to write good software.
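
The defect is easy to reproduce without recls. The following minimal sketch substitutes a std::vector<std::string> for recls::search_sequence (an assumption made purely so that the example is self-contained; the function name list_with_std_iterator is hypothetical) and applies the same technique as the listing above:

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Emits each file name within <file> tags, using the same
// "suffix + prefix as delimiter" technique as the article's listing.
std::string list_with_std_iterator(std::vector<std::string> const& files)
{
    std::string const prefix("<file>");
    char const* const suffix = "</file>";

    std::ostringstream stm;

    if(!files.empty())
    {
        stm << prefix;                       // leading prefix
    }

    // std::ostream_iterator writes the delimiter after EVERY element,
    // including the last one.
    std::string const delimiter = suffix + prefix;
    std::copy(
        files.begin(), files.end()
      , std::ostream_iterator<std::string>(stm, delimiter.c_str()));

    if(!files.empty())
    {
        stm << suffix;                       // closes the dangling <file>
    }

    return stm.str();
}
```

Calling this with the two names a.h and b.h returns <file>a.h</file><file>b.h</file><file></file>: the delimiter written after the last element opens a tag that the final suffix then closes around nothing.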

With stlsoft::ostream_iterator this does not occur, because prefix and suffix are applied to each element directly (see An Enhanced ostream_iterator).
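
The idea of applying prefix and suffix per element can be sketched with the standard library alone. This is an illustration only, not the STLSoft source: the class name affix_ostream_iterator is hypothetical, and it stores the affixes as std::string rather than using String Access Shims.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical output iterator that writes prefix + element + suffix
// as a single unit on each assignment.
template <typename T>
class affix_ostream_iterator
{
public:
    using iterator_category = std::output_iterator_tag;
    using value_type        = void;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = void;

    affix_ostream_iterator(std::ostream& os, std::string prefix, std::string suffix)
        : m_os(&os)
        , m_prefix(std::move(prefix))
        , m_suffix(std::move(suffix))
    {}

    // The element type T must still be named, as with std::ostream_iterator.
    affix_ostream_iterator& operator =(T const& value)
    {
        *m_os << m_prefix << value << m_suffix;
        return *this;
    }

    // Dereference and increment are no-ops, as is usual for output iterators.
    affix_ostream_iterator& operator *()     { return *this; }
    affix_ostream_iterator& operator ++()    { return *this; }
    affix_ostream_iterator  operator ++(int) { return *this; }

private:
    std::ostream* m_os;
    std::string   m_prefix;
    std::string   m_suffix;
};

// Usage: no conditionals, and nothing is emitted for an empty sequence.
std::string list_with_affix_iterator(std::vector<std::string> const& files)
{
    std::ostringstream stm;
    std::copy(
        files.begin(), files.end()
      , affix_ostream_iterator<std::string>(stm, "<file>", "</file>"));
    return stm.str();
}
```

Because nothing is written between or after elements, an empty sequence produces an empty string and a non-empty one produces exactly one tag pair per element.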

It's also quite clearly more expressive and flexible. But it still has issues:

  • The (ungainly) name of the sequence's value type must be explicitly specified
  • The sequence can be written only to an IOStreams output stream (or an instance of any type that is structurally conformant with one)
  • The "formatting" can consist only of prefix and suffix

Let's look at examples that address the first two of these; the third we'll cover later as we look at the evolution of the component. Here's how we might use fastformat::format_iterator:


std::copy(
  headers.begin(), headers.end()
, fastformat::format_iterator(
    stm
  , prefix + "{0}" + suffix));

We do not need to specify the sequence value type: FastFormat's type inference mechanisms handle this for us. And the flexibility does not end there: We can skip the string stream and write directly to a string.


std::string result;
std::copy(
  headers.begin(), headers.end()
, fastformat::format_iterator(
    result
  , prefix + "{0}" + suffix));
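
How can an output iterator avoid naming the element type? One common technique, shown here as a hedged sketch rather than as FastFormat's actual mechanism, is to make the assignment operator a member template so that the type is deduced at each assignment. The class name string_format_iterator is hypothetical; the sketch converts elements via stream insertion (an assumption), understands only the "{0}" parameter, and writes straight into a std::string.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical iterator: formats each assigned element into a std::string sink.
class string_format_iterator
{
public:
    using iterator_category = std::output_iterator_tag;
    using value_type        = void;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = void;

    string_format_iterator(std::string& sink, std::string format)
        : m_sink(&sink)
        , m_format(std::move(format))
    {}

    // Member template: T is deduced from whatever the algorithm assigns,
    // so the caller never spells out the sequence's value type.
    template <typename T>
    string_format_iterator& operator =(T const& value)
    {
        std::ostringstream oss;
        oss << value;
        std::string const element = oss.str();
        std::string const placeholder("{0}");

        // Replace every "{0}" in a copy of the format with the element text,
        // resuming the search after the inserted text.
        std::string text(m_format);
        for(std::size_t pos = text.find(placeholder);
            pos != std::string::npos;
            pos = text.find(placeholder, pos + element.size()))
        {
            text.replace(pos, placeholder.size(), element);
        }

        *m_sink += text;
        return *this;
    }

    string_format_iterator& operator *()     { return *this; }
    string_format_iterator& operator ++()    { return *this; }
    string_format_iterator  operator ++(int) { return *this; }

private:
    std::string* m_sink;
    std::string  m_format;
};

// Usage: the same iterator type serves sequences of different element types.
std::string demo_string_format_iterator()
{
    std::vector<std::string> const names{"a.h", "b.hpp"};
    std::list<int> const           sizes{10, 20};

    std::string result;
    std::copy(names.begin(), names.end()
            , string_format_iterator(result, "<file>{0}</file>"));
    std::copy(sizes.begin(), sizes.end()
            , string_format_iterator(result, "<size>{0}</size>"));
    return result;
}
```

The usage function returns <file>a.h</file><file>b.hpp</file><size>10</size><size>20</size>, with neither std::string nor int appearing in the iterator expressions.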

With equal ease, we can output to Windows' OutputDebugString() API function, illustrating the capability to write to arbitrary sink types:


std::copy(
  headers.begin(), headers.end()
, fastformat::format_iterator(
    fastformat::to_sink(::OutputDebugString)
  , prefix + "{0}" + suffix));

In each of these cases, we're doing that for which we earlier criticised the version using std::ostream_iterator: explicitly preparing the format string. We'll see later how we can do better, after looking at the implementation.
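
The arbitrary-sink idea illustrated with OutputDebugString() can also be sketched portably. In this minimal example the sink is any callable taking a string, held in a std::function; that is an assumption for illustration and not how fastformat::to_sink is implemented. The class name callback_format_iterator is hypothetical, only the first "{0}" in the format is replaced, and a std::vector stands in for the debugger output.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical iterator that hands each formatted element to a callable sink.
class callback_format_iterator
{
public:
    using sink_type         = std::function<void(std::string const&)>;
    using iterator_category = std::output_iterator_tag;
    using value_type        = void;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = void;

    callback_format_iterator(sink_type sink, std::string format)
        : m_sink(std::move(sink))
        , m_format(std::move(format))
    {}

    template <typename T>
    callback_format_iterator& operator =(T const& value)
    {
        std::ostringstream oss;
        oss << value;

        // Replace only the first "{0}" (a simplification for brevity).
        std::string text(m_format);
        std::size_t const pos = text.find("{0}");
        if(pos != std::string::npos)
        {
            text.replace(pos, 3, oss.str());
        }

        m_sink(text);   // the iterator knows nothing about the destination
        return *this;
    }

    callback_format_iterator& operator *()     { return *this; }
    callback_format_iterator& operator ++()    { return *this; }
    callback_format_iterator  operator ++(int) { return *this; }

private:
    sink_type   m_sink;
    std::string m_format;
};

// Usage: collect the formatted entries in a vector standing in for a debug log.
std::vector<std::string> demo_callback_sink()
{
    std::vector<std::string> const headers{"a.h", "b.hpp"};
    std::vector<std::string>       debug_log;

    std::copy(
        headers.begin(), headers.end()
      , callback_format_iterator(
            [&debug_log](std::string const& s) { debug_log.push_back(s); }
          , "<file>{0}</file>"));

    return debug_log;
}
```

Swapping the lambda for one that calls a logging or debugging API changes the destination without touching the iterator or the algorithm.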

