
The Standard Librarian: Streambufs and Streambuf Iterators


March 2001/The Standard Librarian


For simple things, C++ I/O is simple. To send a value to an output stream out, you write

out << "The value is " << x
    << std::endl;

To read a value, you write

in >> x;

I haven’t said what x’s type is, and I haven’t said whether in and out are standard input and standard output or whether they connect to files. That’s intentional; this code works the same way regardless of those details.

If your I/O needs are modest, you may never need to know much more than that. Once you know about the standard streams cin and cout, the >> operator for input, the << operator for output, and the endl manipulator to terminate a line, you’re most of the way there. With a few more details (getline for line-oriented input, the setw, setprecision, and setbase manipulators to control how numbers get displayed, and the ifstream and ofstream classes to connect to files), you have something that begins to look like a complete I/O library. At this level, it’s easy to fit a description of C++ I/O into less than a page.

Of course, the C++ Standard takes many pages to describe the standard I/O library; sometimes you need to know more than the capsule summary in the last paragraph. For example, suppose you aren’t working with high-level types like strings and integers, but with individual characters. If you were working in C, you would read a character with getc or fgetc. In C++ your first thought might be that you should read a character by writing

char c;
in >> c;

but that’s subtly wrong, or maybe not so subtly. It’s the moral equivalent of scanf(" %c", &c), with a leading space in the format string, not of getc. The >> operator performs formatted input, so, if the input stream contains the characters "a b c", you’ll never see the spaces. (Similarly, you’ll never see tabs or newlines.) Formatted input skips whitespace.
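To make the whitespace skipping concrete, here is a minimal sketch. The names read_formatted and read_formatted_from are hypothetical, and an in-memory std::istringstream stands in for whatever stream in refers to.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Collects characters one at a time with formatted input.
// operator>> skips leading whitespace before each extraction.
std::string read_formatted(std::istream& in) {
    std::string result;
    char c;
    while (in >> c) {
        result += c;
    }
    return result;
}

// Hypothetical helper: runs read_formatted over in-memory text.
std::string read_formatted_from(const std::string& text) {
    std::istringstream in(text);
    return read_formatted(in);
}
```

Under these assumptions, read_formatted_from("a b c") yields "abc": the two spaces never reach the caller.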

If formatted input is the wrong choice, maybe you should try unformatted input: c = in.get();. That’s not exactly wrong, but if you’re expecting reasonable performance you’ll probably be disappointed. It certainly isn’t the equivalent of getc! While getc is a tiny function that does a little bit of pointer manipulation (it might even be a macro, rather than a function), istream::get is quite complicated — in the implementations I have looked at, it takes several dozen lines of code. If you write (say) a file-copying routine in terms of istream::get, you can expect it to be dreadfully slow. As a general rule, get does not belong in a tight loop.

But now there’s a problem. If you shouldn’t use formatted input and you shouldn’t use unformatted input, then what’s left? The istream class doesn’t have any member functions that are any simpler than get.

If performance is important, you have to go beyond istream and ostream and learn a bit more about how the C++ I/O library is put together. The lower level functions that get and getline and operator>> are built on top of aren’t part of istream, but part of streambuf. The closest equivalent of getc in the C++ library is a somewhat formidable looking construct: in.rdbuf()->sbumpc().
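As a rough sketch of that construct in a loop, the function below reads everything that remains in a stream, whitespace included. The names read_raw and read_raw_from are hypothetical, and the in-memory stream is only there to make the example self-contained.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Reads every remaining character, whitespace included, directly from the
// stream's buffer. sbumpc returns an int_type so that end of file can be
// told apart from every valid character, much as getc returns int.
std::string read_raw(std::istream& in) {
    using traits = std::char_traits<char>;
    std::string result;
    std::streambuf* buf = in.rdbuf();
    for (traits::int_type ch = buf->sbumpc();
         !traits::eq_int_type(ch, traits::eof());
         ch = buf->sbumpc()) {
        result += traits::to_char_type(ch);
    }
    return result;
}

// Hypothetical helper: runs read_raw over in-memory text.
std::string read_raw_from(const std::string& text) {
    std::istringstream in(text);
    return read_raw(in);
}
```

Here read_raw_from("a b c") returns "a b c" unchanged, spaces and all.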

Streambufs

The C++ I/O library is quite complicated, but the general ideas behind its architecture are simple. Formatting decisions are made by locale facets; character buffering, and the transport of characters to and from their ultimate source or destination, are handled by stream buffers. The stream classes themselves, istream and ostream, are surprisingly unimportant. They’re wrappers that tie locales to stream buffers; they contain user formatting flags and some rudimentary state information; and they provide a convenient syntax (the << and >> operators) for simple kinds of I/O.

The most innovative aspect of the C++ I/O library is that data formatting has been decoupled from character manipulation; understanding that decoupling is crucial for anything but basic use of the library.

A stream buffer class is a class that inherits from std::streambuf. It understands only one data type: the character [1]. Interpreting those characters is someone else’s job. For example, a streambuf might tell you that the next three characters are '7', 'F', and ' '. If you want to interpret those characters as the hexadecimal spelling of the number 127, terminated by whitespace, you have to call another function. The stream classes do that automatically: when you write

int n;
in >> n;

the istream gets characters from a stream buffer and passes those characters to a locale facet so that they can be interpreted as an int.
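The division of labor can be sketched with the characters '7', 'F', and ' ' from the example above. This is only an illustration: the function names are hypothetical, and reading the digits as hexadecimal is my assumption about how they become 127.

```cpp
#include <ios>
#include <sstream>
#include <string>

// The stream buffer's view: a sequence of characters with no meaning attached.
std::string raw_characters(const std::string& text) {
    using traits = std::char_traits<char>;
    std::stringbuf buf(text, std::ios_base::in);
    std::string seen;
    for (traits::int_type ch = buf.sbumpc();
         !traits::eq_int_type(ch, traits::eof());
         ch = buf.sbumpc()) {
        seen += traits::to_char_type(ch);
    }
    return seen;
}

// The stream's view: the same characters interpreted as a hexadecimal int.
// std::hex only sets a formatting flag; the conversion itself is carried out
// by the locale's num_get facet on the stream's behalf.
int parsed_as_hex(const std::string& text) {
    std::istringstream in(text);
    int n = 0;
    in >> std::hex >> n;
    return n;
}
```

With the text "7F ", raw_characters returns the three characters as they are, while parsed_as_hex returns 127.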

Every stream buffer has the same interface, but manages a different kind of I/O or buffering. Thus std::istream contains a pointer of type streambuf* (you write in.rdbuf() to obtain the pointer), but, as usual with pointers to base classes, istream doesn’t have to know the exact type of the object that it points to. For example, the rdbuf pointer might point to a std::filebuf, a stream buffer for file I/O; or to a std::stringbuf, a stream buffer that manages characters in a string; or to a user-defined networking stream buffer.
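A small sketch of that arrangement follows. The function names are hypothetical, and a std::stringbuf is used as the concrete buffer only because it needs no file or network connection.

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Works with any stream buffer: the istream only ever sees a streambuf*.
std::string first_word(std::streambuf* buf) {
    std::istream in(buf);
    std::string word;
    in >> word;
    return word;
}

// Hypothetical helper: uses a stringbuf as the concrete buffer.
std::string first_word_of(const std::string& text) {
    std::stringbuf buf(text, std::ios_base::in);
    return first_word(&buf);
}

// rdbuf() hands back exactly the pointer the stream was given.
bool rdbuf_returns_attached_buffer() {
    std::stringbuf buf;
    std::istream in(&buf);
    return in.rdbuf() == &buf;
}
```

The same first_word function should accept the address of an open std::filebuf, or of a user-defined buffer, without any change.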

This may seem complicated and inefficient. If we’re using an interface that’s based on reading and writing single characters, and if we’re accessing that interface by a pointer to a polymorphic base class, does this mean we have to call a virtual function for every character? Fortunately, no. While std::streambuf is a base class, it’s not an abstract base class: its functionality is cleverly divided between virtual and nonvirtual functions, so that streambuf’s interface is simple (much simpler than the words in the Standard make it appear!) and also efficient.

If you’ve ever used the low-level functions in the C I/O library, the stream buffer interface should look familiar to you: you’ll be working with member functions instead of global functions, and the names are slightly different, but the ideas are the same. If p is a streambuf*, then p->sputc(c) writes the single character c; it returns EOF if the write fails, and something else if it succeeds. Input is slightly more complicated, because there are more choices: p->sbumpc() returns the current character (again, it returns EOF if the read fails) and moves to the next read position, p->sgetc() returns the current character without moving to the next read position, and p->snextc() increments the read position and returns the next character. Finally, p->sputbackc(c) "unreads" a character and pushes it back onto the input.
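The calls described above can be traced on a small buffer. This is a sketch with hypothetical function names; the buffer is a std::stringbuf holding "abc", and the comments give the read position after each call.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Walks through the input functions on a buffer holding "abc" and records
// what each call returns.
std::string trace_reads() {
    using traits = std::char_traits<char>;
    std::stringbuf buf(std::string("abc"), std::ios_base::in);
    std::string seen;
    seen += traits::to_char_type(buf.sgetc());         // 'a'; still at 'a'
    seen += traits::to_char_type(buf.sbumpc());        // 'a'; now at 'b'
    seen += traits::to_char_type(buf.snextc());        // 'c'; now at 'c'
    seen += traits::to_char_type(buf.sputbackc('b'));  // 'b'; back at 'b'
    seen += traits::to_char_type(buf.sbumpc());        // 'b'; now at 'c'
    return seen;
}

// Writes two characters with sputc and returns what the buffer collected.
std::string write_two() {
    std::stringbuf buf(std::ios_base::out);
    buf.sputc('h');
    buf.sputc('i');
    return buf.str();
}
```

trace_reads returns "aacbb" and write_two returns "hi". Note that snextc skips past 'b' entirely: it advances first and only then reports the character it lands on.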

The same type, streambuf, is used both for reading and writing. If a stream buffer is read-only, then writes will always fail; if a stream buffer is write-only, then reads will always fail.
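A quick way to see this is with std::stringbuf, which can be opened for input only or for output only. The sketch below uses hypothetical function names and checks for failure by comparing against the traits class’s end-of-file value.

```cpp
#include <ios>
#include <sstream>
#include <string>

// A buffer opened for input only rejects writes.
bool write_to_read_only_fails() {
    using traits = std::char_traits<char>;
    std::stringbuf buf(std::string("abc"), std::ios_base::in);
    return traits::eq_int_type(buf.sputc('x'), traits::eof());
}

// A buffer opened for output only rejects reads, even after a write.
bool read_from_write_only_fails() {
    using traits = std::char_traits<char>;
    std::stringbuf buf(std::ios_base::out);
    buf.sputc('x');  // succeeds, but the character cannot be read back
    return traits::eq_int_type(buf.sgetc(), traits::eof());
}
```

Both functions return true.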

The key point is that all of those functions are non-virtual; in fact, they’re probably just a line or two long and they’re probably declared to be inline. The read functions get characters from an internal buffer declared in the streambuf base class; only when all the characters in the buffer have been consumed does streambuf need to invoke any virtual functions. Similarly, sputc puts a character in an internal buffer and invokes the appropriate virtual function whenever that buffer needs to be flushed. We thus have both flexibility and efficiency.

All of streambuf’s virtual member functions are protected. You need to know about them if you’re planning to write your own stream buffer class, but not if all you’re planning to do is use preexisting stream buffers — I’m not even going to mention those virtual functions’ names in this column! To use stream buffers, you just need to know about the member functions I’ve already mentioned: sputc, sbumpc, sgetc, snextc, and sputbackc. For input, you have the choice of reading a character with sbumpc or else reading it with sgetc and then moving to the next with snextc. Which version you use is partly a matter of taste; I tend to find sbumpc more convenient, but the differences are small.
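The two input styles can be compared side by side in a character-copying loop. This is a sketch rather than a tuned routine: the function names are hypothetical, and each function simply gives up if a write fails.

```cpp
#include <ios>
#include <sstream>
#include <streambuf>
#include <string>

// Style 1: sbumpc reads the current character and advances in one step.
bool copy_with_sbumpc(std::streambuf* src, std::streambuf* dst) {
    using traits = std::char_traits<char>;
    for (traits::int_type ch = src->sbumpc();
         !traits::eq_int_type(ch, traits::eof());
         ch = src->sbumpc()) {
        if (traits::eq_int_type(dst->sputc(traits::to_char_type(ch)),
                                traits::eof())) {
            return false;  // write failed
        }
    }
    return true;
}

// Style 2: sgetc looks at the current character; snextc advances, then looks.
bool copy_with_snextc(std::streambuf* src, std::streambuf* dst) {
    using traits = std::char_traits<char>;
    for (traits::int_type ch = src->sgetc();
         !traits::eq_int_type(ch, traits::eof());
         ch = src->snextc()) {
        if (traits::eq_int_type(dst->sputc(traits::to_char_type(ch)),
                                traits::eof())) {
            return false;  // write failed
        }
    }
    return true;
}

// Hypothetical helper: copies in-memory text with the chosen style.
std::string copy_text(const std::string& text, bool use_sbumpc) {
    std::stringbuf src(text, std::ios_base::in);
    std::stringbuf dst(std::ios_base::out);
    if (use_sbumpc) {
        copy_with_sbumpc(&src, &dst);
    } else {
        copy_with_snextc(&src, &dst);
    }
    return dst.str();
}
```

Both styles produce the same copy. To copy between streams, pass in.rdbuf() and out.rdbuf() as the two arguments.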

Are there any gotchas to watch out for when you’re mixing low-level streambuf I/O with the high-level stream functions? Not really. You don’t have to worry about buffers or positioning information getting out of sync, because streams have no such information on their own. The biggest issues deal with error reporting, and even those issues are smaller than they might appear. You might worry, for example, that if you work directly with a stream’s underlying buffer, errors won’t be reported back to the stream. Here there is a real concern — streams do keep track of error state — but it’s a minor one. Suppose that in is an istream, and suppose that you keep calling in.rdbuf()->sbumpc() until you encounter EOF. It’s certainly true that in won’t have its end-of-file marker set, but that doesn’t really matter: it’ll get set anyway the next time you try reading from in again.
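The end-of-file behavior just described can be observed directly. In this sketch the struct and function names are hypothetical, and a std::istringstream plays the part of in.

```cpp
#include <sstream>
#include <streambuf>
#include <string>

struct EofObservation {
    bool after_drain;      // in.eof() right after emptying the buffer directly
    bool after_next_read;  // in.eof() after one more read through the stream
};

EofObservation drain_then_read(const std::string& text) {
    using traits = std::char_traits<char>;
    std::istringstream in(text);
    std::streambuf* buf = in.rdbuf();
    while (!traits::eq_int_type(buf->sbumpc(), traits::eof())) {
        // Discard characters; the stream object is not involved.
    }
    EofObservation seen;
    seen.after_drain = in.eof();
    in.get();  // the stream now notices end of file for itself
    seen.after_next_read = in.eof();
    return seen;
}
```

After draining the buffer, after_drain is false; after_next_read is true once the stream has attempted a read of its own.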

A slightly more serious concern is that the virtual functions in a class that’s derived from streambuf might throw exceptions; perhaps, for example, a user-defined class for network I/O might throw an exception when the underlying connection has been lost. High-level istream and ostream functions will catch those exceptions and translate them into an error state within the stream (that’s one of the reasons that seemingly innocent functions like istream::get are so complicated), but if you’re working with stream buffers directly there’s nobody to catch exceptions for you. If you work with stream buffers, you should make sure that your code is exception safe.
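The difference shows up in a small experiment. The sketch has to name one of the protected virtual functions, underflow, which this column otherwise leaves alone. FailingBuf is a hypothetical class standing in for a buffer whose connection has been lost.

```cpp
#include <istream>
#include <stdexcept>
#include <streambuf>

// Hypothetical buffer: every attempt to fetch more characters throws.
class FailingBuf : public std::streambuf {
protected:
    int_type underflow() override {
        throw std::runtime_error("connection lost");
    }
};

// Through istream::get the exception is caught and recorded as badbit
// (with the stream's default exception mask, it is not rethrown).
bool stream_reports_badbit() {
    FailingBuf buf;
    std::istream in(&buf);
    in.get();
    return in.bad();
}

// Through the buffer itself the exception reaches the caller.
bool direct_call_throws() {
    FailingBuf buf;
    try {
        buf.sbumpc();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Both functions return true: the stream turns the exception into an error state, while the direct call leaves it to the caller to handle.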

Streambuf Iterators

There’s very little new to say about code that uses streambufs directly: you can use sputc and sbumpc in just the same way as you use putc and getc. There is, however, one last library component that I haven’t yet described and that, in many real cases, is the easiest way of working with streambufs.

If you’re reading a character, you’re probably going to read more than one character: character input is mostly important in loops. But a loop where you read one value after another, doing things with each of those values, is a pattern that’s dealt with elsewhere in the C++ Standard library: it’s just what iterators are for. Accordingly, the standard library defines the types istreambuf_iterator and ostreambuf_iterator, which use streambufs to read and write characters. If i is an istreambuf_iterator, then *i returns the current character (just like sgetc), and ++i moves to the next character (just like sbumpc). Upon reaching end of file, an istreambuf_iterator becomes equal to a special end-of-stream iterator that you create with the default constructor. So, for example, you can process all of the characters that you read from a stream by writing a loop that looks something like this:

std::streambuf* p = in.rdbuf();
std::istreambuf_iterator<char> i(p);
std::istreambuf_iterator<char> eos;
while (i != eos) {
   char c = *i; // Read the current character
   ++i;         // Move to the next one; without this the loop never ends
   // Do something with c
}

(Actually, this is just slightly more verbose than necessary. We could pass in to istreambuf_iterator directly; there’s a constructor that will call in.rdbuf() for us.)
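As a sketch of the shorter form, the function below constructs the iterator from the stream itself and uses the loop to find the length of the longest line. The task and the function names are hypothetical, chosen only to give the loop something to do.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Returns the length of the longest line, not counting the newline.
std::size_t longest_line(std::istream& in) {
    std::istreambuf_iterator<char> i(in);  // calls in.rdbuf() for us
    std::istreambuf_iterator<char> eos;    // end-of-stream iterator
    std::size_t longest = 0;
    std::size_t current = 0;
    while (i != eos) {
        char c = *i;
        ++i;
        if (c == '\n') {
            current = 0;
        } else {
            ++current;
            if (current > longest) {
                longest = current;
            }
        }
    }
    return longest;
}

// Hypothetical helper: runs longest_line over in-memory text.
std::size_t longest_line_of(const std::string& text) {
    std::istringstream in(text);
    return longest_line(in);
}
```

For the text "ab\ncdef\ng" the result is 4.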

But, of course, the real value of iterators isn’t that you can use them in loops; it’s that you can combine them with generic algorithms that operate on arbitrary iterator types. You can pass an istreambuf_iterator to any algorithm that accepts input iterators, and you can pass an ostreambuf_iterator to any algorithm that accepts output iterators. If you’re lucky, you may not have to write any loops on your own: someone may already have written a generic algorithm that does what you want. Even if you’re not quite that lucky, you can write your own generic algorithm, using the iterator formalism to separate out the data processing from the mechanics of I/O.
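Two small sketches of that idea follow, one for each iterator type. The function names are hypothetical; std::count and std::transform are the standard algorithms.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// std::count accepts input iterators, so it can consume a stream directly.
long count_char(std::istream& in, char target) {
    return static_cast<long>(
        std::count(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>(), target));
}

// std::transform reads through one streambuf iterator and writes through
// another; no hand-written loop is needed.
void to_upper_copy(std::istream& in, std::ostream& out) {
    std::transform(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>(),
                   std::ostreambuf_iterator<char>(out),
                   [](char c) {
                       return static_cast<char>(
                           std::toupper(static_cast<unsigned char>(c)));
                   });
}

// Hypothetical helper: upper-cases in-memory text.
std::string to_upper_text(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    to_upper_copy(in, out);
    return out.str();
}
```

The cast to unsigned char before calling std::toupper avoids undefined behavior for characters with negative values.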

Combining streambuf iterators with other parts of the standard library makes it possible to do some surprisingly sophisticated things in just a few lines. For example, suppose that you need to read the entire contents of a file into memory. If you don’t know the length of the file ahead of time, this can be messy: you have to use dynamic allocation, but you can’t know how much memory to allocate until after you’ve read everything. The best strategy is to allocate a buffer of some arbitrary size, and then expand its size when necessary. But we shouldn’t have to do that explicitly: vector can handle the tedium of memory management for us. We also don’t need to write a loop to read characters: in terms of iterators, the characters we want begin with an iterator that points to the beginning of the file and continue up until the end-of-stream iterator. This is a range of iterators, and vector has a constructor that takes a range of iterators [2]. The code is shorter than the explanation:

std::ifstream in(fname);
std::istreambuf_iterator<char> i(in);
std::istreambuf_iterator<char> eos;
std::vector<char> v(i, eos);

Or, if the vector v already exists and you want to append the contents of a file to it, that’s equally easy: just replace the last line with

v.insert(v.end(), i, eos);

or with

std::copy(i, eos, std::back_inserter(v));
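Wrapped up as functions, the two variants might look like the sketch below. The function names are hypothetical, and the functions take a std::istream& rather than a file name so that they work with any stream.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads everything that remains in the stream into a new vector.
std::vector<char> read_all(std::istream& in) {
    std::istreambuf_iterator<char> first(in);
    std::istreambuf_iterator<char> eos;
    return std::vector<char>(first, eos);
}

// Appends everything that remains in the stream to an existing vector.
void append_all(std::istream& in, std::vector<char>& v) {
    std::istreambuf_iterator<char> first(in);
    std::istreambuf_iterator<char> eos;
    v.insert(v.end(), first, eos);
}

// Hypothetical helper: reads in-memory text instead of a file.
std::vector<char> read_all_from(const std::string& text) {
    std::istringstream in(text);
    return read_all(in);
}
```

The iterators are given names on purpose: passing two temporaries straight to the vector constructor is parsed as a function declaration. When reading a file whose exact bytes matter, it is probably wise to open the ifstream with std::ios_base::binary, since text mode may translate line endings on some platforms.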

Similarly, it’s easy to copy one file to another, performing character-level transformations along the way: just combine istreambuf_iterator, ostreambuf_iterator, and the appropriate generic algorithm. This snippet, for example, creates a copy of a file in which every space character is replaced with a newline:

std::ifstream in_file(in_fname);
std::ofstream out_file(out_fname);
std::istreambuf_iterator<char> in(in_file);
std::istreambuf_iterator<char> eos;
std::ostreambuf_iterator<char> out(out_file);
std::replace_copy(in, eos, out, ' ', '\n');
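The same transformation can be written against arbitrary streams and made to report write failures. This is a sketch with hypothetical function names; it relies on ostreambuf_iterator’s failed() member, which reports whether an earlier write through the iterator hit end of file.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Copies in_stream to out_stream, replacing each space with a newline.
// Returns false if a write to the output buffer failed.
bool spaces_to_newlines(std::istream& in_stream, std::ostream& out_stream) {
    std::istreambuf_iterator<char> in(in_stream);
    std::istreambuf_iterator<char> eos;
    std::ostreambuf_iterator<char> out(out_stream);
    // replace_copy returns the output iterator as it stands after the copy.
    return !std::replace_copy(in, eos, out, ' ', '\n').failed();
}

// Hypothetical helper: applies the transformation to in-memory text.
std::string spaces_to_newlines_in(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    spaces_to_newlines(in, out);
    return out.str();
}
```

Because the iterators bypass the stream classes, a failed write does not set any error flag on out_stream; checking failed() is one way to notice it.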

Conclusions

The C++ Standard library segregates high-level and low-level I/O operations into different classes: high-level operations in the stream classes, low-level operations in streambuf and the classes that inherit from it. The low-level operations have sometimes had a tendency to get lost, since it’s easy to think of streambuf as just an implementation detail. That’s unfortunate: sputc and sbumpc are no more obscure or complicated than the C library functions putc and getc are. If you would ever consider writing part of your program in terms of putc and getc, you should also consider writing it in terms of stream buffers and streambuf iterators.

You should use stream classes, with the << and >> operators, when you’re working with high-level data types. You should use stream buffers, either directly or through streambuf iterators, when you’re performing character-by-character I/O and when performance matters. Discussions of C++ I/O rightly begin with istream and ostream, but ought not to end there.

Notes

[1] A character isn’t necessarily the same as char. The C++ I/O library is templatized; streambuf isn’t a class, but just a typedef for basic_streambuf<char, char_traits<char> >. The library also provides wstreambuf, which is a typedef for basic_streambuf<wchar_t, char_traits<wchar_t> >, and in principle you can use basic_streambuf for your own character type. For the purposes of this column, parameterization by character type doesn’t matter.

[2] This constructor uses member templates. It’s part of the C++ Standard, but you may find that member templates are poorly supported on some older compilers.

Matt Austern is the author of Generic Programming and the STL and the chair of the C++ standardization committee’s library working group. He works at AT&T Labs — Research and can be contacted at [email protected].

