Dr. Dobb's | The Standard Librarian: IOStreams and Stdio

The Standard Librarian: IOStreams and Stdio

You can mix C and C++ I/O operations, but you have to be careful if you don't want garbled streams.

November 01, 2000
URL:http://www.drdobbs.com/the-standard-librarian-iostreams-and-std/184401305


I/O is inherently complicated, and the C++ Standard library reflects that complexity. The C++ Standard defines ten I/O headers, dozens of I/O classes, and hundreds of member and nonmember functions. Some of those functions have obvious uses, but others are more obscure. Sometimes even the obscure member functions turn out to be important; one of those is std::ios_base::sync_with_stdio. What does sync_with_stdio do, and why would you ever want to use it?

Synchronization

The C++ Standard (§27.4.2.4) says that std::ios_base::sync_with_stdio, which takes a single argument of type bool, controls whether "the standard iostream objects are synchronized," that "called with a false argument, it allows the standard streams to operate independently of the standard C streams," and that its return value is the previous synchronization state. That's not very informative; what does it mean for iostream objects to be synchronized?

First, a brief review. The C Standard library has one mechanism for I/O: stdio streams. There are three predefined streams in the C Standard library, stdin, stdout, and stderr, and you manipulate them with functions like getc and fprintf. The C++ Standard library has two mechanisms for I/O: in addition to stdio, it also defines a newer mechanism called iostreams. There are eleven (!) predefined streams in the C++ Standard library: the three streams from the C library; three new-style streams, cin, cout, cerr, and a fourth stream, clog, that's just like cerr except for its buffering policy; and four streams for wide characters, wcin, wcout, wcerr, and wclog.

Stdio streams and iostreams are both part of the C++ library, and it's entirely reasonable for both kinds of I/O to be found in a single program. There's no problem if you open one file with fopen and open a different file as an fstream. The predefined streams are more complicated, though, because stdin and cin connect to the same thing (the program's standard input), as do stdout and cout (standard output) and stderr and cerr (standard error). What happens if you try to mix these two methods? What happens, for example, if you read one line from standard input using scanf and then read the next line using std::getline?

In general, you should expect there to be a problem. I/O is nearly always buffered — performance would be dreadful if you had to make a separate operating system call for every character! — and there's no reason to expect two different I/O systems to use the same buffers. (The Perl documentation, for example, warns against mixing read and sysread.) There's a special exception, though: the predefined streams in C++ can be synchronized with their stdio counterparts, meaning that you can freely intermix reads from cin and stdin or writes to cout and stdout.

How freely? The C++ Standard doesn't really say. The intent of the people who wrote this part of the Standard, though, was that they could be mixed very freely indeed: you can, for example, read a character c with getc(stdin), put it back in standard input with std::cin.rdbuf()->sputbackc(c), and then read the same character again either from cin or from stdin. This requires a tight coupling between the stdio and iostream buffers.

The standard library sometimes looks very different to a library implementer than to a user! The description of sync_with_stdio is just four inconspicuous sentences in the C++ Standard, but, internally, a library implementation that provides this feature (especially an implementation that provides it without too great a performance penalty) looks quite different from what it would look like if it didn't have to support synchronization. There are several ways of implementing iostreams so that synchronized mode is possible; all of them boil down to sharing a single buffer between an stdio stream and its iostream counterpart, and all of them come at a cost.

In the "classic" iostreams that came with AT&T cfront, unsynchronized mode was the default. That has changed. According to the C++ Standard, synchronized mode is the default. If you've found that your program suddenly became slower when you switched from a compiler that provided "classic" iostreams to one that provides a standard-conforming library, what you're seeing might be nothing more than the change in defaults. Unsynchronized mode is always at least as fast as synchronized, and in some implementations, it's much faster. If you don't actually need to mix stdio and iostream I/O on the predefined streams, you should always put the line

std::ios_base::sync_with_stdio(false);

at the beginning of your program. (And in many cases you should also put the line

std::cin.tie(0);

at the beginning of your program. By default, the I/O library flushes cout every time you read from cin. That's convenient if you write things like this,

std::cout << "Enter a number: ";
std::cin >> x;

because it ensures that your program will display the output before trying to read the input, without your having to flush the output stream explicitly. Just like synchronization, however, the convenience of automatic flushing also comes at a cost. If you don't need automatic flushing, your program will run faster without it.)
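
As a minimal sketch, the two calls can be wrapped in one helper that also reports the settings it replaced. The names detach_standard_streams and previous_io_state are hypothetical, chosen only for this illustration; the library calls themselves are the two shown above.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical record of the settings in effect before the call.
struct previous_io_state {
    bool was_synced;         // value returned by sync_with_stdio
    std::ostream* was_tied;  // stream that cin was tied to, or null
};

// Turn off stdio synchronization and the cin/cout tie.
// Call this before any I/O on the standard streams: the effect of
// sync_with_stdio after I/O has taken place is implementation-defined.
inline previous_io_state detach_standard_streams() {
    previous_io_state prev;
    prev.was_synced = std::ios_base::sync_with_stdio(false);
    prev.was_tied = std::cin.tie(nullptr);
    // Postcondition: reading from cin no longer flushes cout first.
    assert(std::cin.tie() == nullptr);
    return prev;
}
```

Once this has run, a prompt written to cout is no longer flushed automatically before a read from cin, so an interactive program has to flush it explicitly.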

Stdio Filebufs

You probably aren't going to be writing your own C++ library implementation, so you might think that you don't need to know anything about how synchronization is implemented. What happens, though, if you want to do something slightly different? Suppose, for example, that you open a file for reading both with iostreams and with stdio:

std::ifstream f1("file.txt");
FILE* f2 = fopen("file.txt", "r");

What happens if you try to read from both f1 and f2?

The C++ Standard doesn't say what happens; it doesn't give any guarantee that this sort of thing will work. In fact, it probably won't. Probably you'll get inconsistent results, seeing some parts of the file twice and others not at all. If you want to view a file with iostreams and stdio at the same time, you'll have to extend the standard library.

If you're unfamiliar with the I/O architecture of the C++ Standard library, you might think that the answer is to create a class that inherits from fstream. It isn't.

Conceptually, I/O has two parts: interpreting a stream of characters in terms of high-level data types, and managing and buffering the physical stream of characters itself. In the C++ I/O library, those tasks are divided into separate classes. Formatting and interpretation is the role of classes like std::istream and std::ostream (with help from std::locale and its facets), and managing the stream of raw characters is the role of streambufs.

The C++ I/O library was designed to be extensible, and streambufs were introduced so it would be easy to solve just the sort of problem we're looking at now: we want to manage that physical stream buffering differently (so that it's synchronized with an stdio file), but we don't want to worry about any of the formatting. The base streambuf class is std::streambuf (a typedef for std::basic_streambuf<char>, but you don't have to worry about the template unless you're using files whose character type is something other than char), and the concrete buffer classes, such as the one that manages file I/O (std::filebuf), are subclasses.
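
The division of labor is easy to see with a buffer class the library already provides. In the sketch below, the helper parse_record is a hypothetical name used only for illustration: it does all of its formatting through std::istream and accepts any streambuf as the source of characters.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Parse "<int> <word>" from any streambuf. The istream supplies the
// formatting; it neither knows nor cares where the characters come from.
inline bool parse_record(std::streambuf& source, int& number,
                         std::string& word) {
    std::istream in(&source);
    return static_cast<bool>(in >> number >> word);
}
```

Passing a std::stringbuf reads from memory and passing a std::filebuf reads from a file; the parsing code is the same in both cases.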

The answer to synchronized I/O with an arbitrary FILE*, then, is to create a new streambuf class that inherits from std::streambuf. Once we have that streambuf (let's call it syncbuf), it will easily fit into the existing library. Each of the classes std::istream and std::ostream has a constructor that takes a streambuf*, so we can use this new buffer class as follows:

syncbuf buf(fptr);
std::istream in(&buf);

Or, if we'd like, we can write a tiny wrapper class to make this usage slightly more convenient:

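class isyncstream
   : public std::istream {
public:
   // The base class only stores the pointer, so it is safe to
   // pass &buf before buf itself has been constructed.
   isyncstream()
      : std::istream(&buf), buf(0) {}
   isyncstream(FILE* fptr)
      : std::istream(&buf), buf(fptr) {}
   // In a const member function &buf is a pointer to const,
   // so the constness has to be cast away explicitly.
   syncbuf* rdbuf() const {
      return const_cast<syncbuf*>(&buf);
   }
private:
   syncbuf buf;
};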

Classes like std::istringstream and std::ifstream are just this sort of wrapper.

Writing the syncbuf class itself isn't quite so trivial, but it still isn't very difficult. The std::streambuf class may look formidable, but fundamentally its model of I/O is quite simple. Writing a useful derived class just requires overriding a few virtual functions.

The public interface of std::streambuf consists of member functions like sbumpc (get the current character and move to the next position) and sputc (write one character). Derived classes, however, see a different interface of protected member functions.

The protected interface exposes two arrays of characters: a get area, used for input, and a put area, used for output. Each array is characterized by three pointers: a pointer to the beginning, a pointer to the current position, and a past-the-end pointer. (As with FILE*, we don't use the type system to distinguish between input and output streambufs. If you try to read from a write-only streambuf, or write to a read-only streambuf, you'll get a run-time failure.) The base class initializes those six pointers to NULL; derived classes are responsible for setting them up in whatever way is sensible.
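
Here is a small sketch of the get area in action. The class array_buf is hypothetical and read-only: it points the three get area pointers at a character array owned by the caller, using the protected member function setg, and leaves the put area pointers null.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>

// Hypothetical read-only streambuf over a caller-owned character array.
class array_buf : public std::streambuf {
public:
    // Leaves all six pointers null, exactly as the base class set them.
    array_buf() {}

    array_buf(char* first, std::size_t count) {
        assert(first != nullptr);
        // Get area: beginning, current position, past-the-end.
        setg(first, first, first + count);
    }

    // True when neither a get area nor a put area has been installed.
    bool has_no_buffers() const {
        return eback() == nullptr && gptr() == nullptr && egptr() == nullptr
            && pbase() == nullptr && pptr() == nullptr && epptr() == nullptr;
    }

    // Characters left between the current position and the end.
    std::ptrdiff_t remaining() const { return egptr() - gptr(); }
};
```

Because array_buf overrides none of the virtual functions, reading past the end of the array simply reports end of file.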

The public member functions (all of which are nonvirtual) operate on those two arrays; sbumpc, for example, returns the current character in the get area and increments the current-position pointer; sgetc returns the current character without incrementing the current-position pointer; and sputc writes a character to the current position in the put area and increments the put area's current-position pointer.
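
A short sketch of those three functions, written against the public interface only. The helper names put_all and take_all are hypothetical; they work with any streambuf, and std::stringbuf is a convenient one to try them on.

```cpp
#include <cassert>
#include <sstream>
#include <streambuf>
#include <string>

// Write every character of `text` with sputc.
// Returns false as soon as the buffer refuses a character.
inline bool put_all(std::streambuf& sink, const std::string& text) {
    for (char ch : text) {
        if (sink.sputc(ch) == std::streambuf::traits_type::eof())
            return false;
    }
    return true;
}

// Consume the rest of the input with sbumpc, using sgetc to look ahead.
inline std::string take_all(std::streambuf& source) {
    const int eof = std::streambuf::traits_type::eof();
    std::string out;
    while (source.sgetc() != eof) {
        int c = source.sbumpc();
        assert(c != eof);  // sgetc has just reported a character
        out += std::streambuf::traits_type::to_char_type(c);
    }
    return out;
}
```

Calling sgetc any number of times in a row returns the same character; only sbumpc moves the current position forward.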

What happens if there is no "current position"? What happens, for example, if the three get area pointers are all null, or if the current position is past the end? That's where the protected interface comes in: in the absence of an available read or write position, the public member functions defer to protected member functions (all of which are virtual) that manage the characters' ultimate source or destination. To create a streambuf subclass that you can use for both input and output, you need to override the virtual member functions overflow, underflow, uflow, and pbackfail.

The C++ Standard has a detailed (and confusing) description of those four member functions, but the basic idea is quite simple: overflow writes one character and empties the output buffer, underflow and uflow read one character and replenish the input buffer, and pbackfail puts a character back onto the input stream. Why two different functions for underflow? The difference is that uflow reads a character and advances the input position, while underflow reads a character without advancing the input position. (This corresponds to the distinction between the public member functions sgetc and sbumpc.)
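
To watch that correspondence at work, here is a sketch of a streambuf that keeps no get area at all and counts how often each virtual is called. The class counting_buf and its public counters are hypothetical, and it reads from a std::string rather than from a real device.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical unbuffered streambuf that records calls to its virtuals.
class counting_buf : public std::streambuf {
public:
    explicit counting_buf(std::string s) : data_(std::move(s)), pos_(0) {}

    int underflow_calls = 0;
    int uflow_calls = 0;

protected:
    // Report the current character without consuming it.
    int_type underflow() override {
        ++underflow_calls;
        return pos_ < data_.size() ? traits_type::to_int_type(data_[pos_])
                                   : traits_type::eof();
    }

    // Report the current character and move past it.
    int_type uflow() override {
        ++uflow_calls;
        return pos_ < data_.size() ? traits_type::to_int_type(data_[pos_++])
                                   : traits_type::eof();
    }

private:
    std::string data_;
    std::size_t pos_;
};
```

With no get area, every sgetc lands in underflow and every sbumpc lands in uflow, so the counters track the public calls one for one.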

In our case this is even simpler, because we don't want syncbuf to have its own input and output buffers; we want to use an underlying FILE* for all our buffering. The std::streambuf base class initializes the get area and put area pointers to NULL, and we'll let them stay that way. The public member functions will never find an available read or write position, so they will always fall back on our overridden virtuals. Those, in turn, just delegate to stdio functions. The only slightly tricky part is underflow: there's no stdio function for reading a character without advancing the read position. We can solve that problem, however, by combining getc with ungetc.

The complete syncbuf class is shown in Listing 1.

That's it! There's one more refinement we can make (overriding seekpos and seekoff so that syncbuf will support file repositioning), but the class is already useful even without support for seeking. It's not very fast — it makes a virtual function call for every character — but sometimes, if you're dealing with legacy code, stdio/iostream interoperability is more important than performance.
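
One way the seeking refinement might look is sketched below. This is not part of the original listing: seekable_syncbuf and word_at are hypothetical names, the class handles input only, and it assumes every offset fits in the long that fseek and ftell use.

```cpp
#include <cassert>
#include <cstdio>
#include <ios>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical input-only, unbuffered FILE* streambuf with repositioning.
class seekable_syncbuf : public std::streambuf {
public:
    explicit seekable_syncbuf(std::FILE* f) : fptr(f) { assert(f != nullptr); }

protected:
    // Peek: read a character and push it straight back.
    int_type underflow() override {
        int c = std::getc(fptr);
        if (c != EOF)
            std::ungetc(c, fptr);
        return c;
    }

    int_type uflow() override { return std::getc(fptr); }

    // Relative seek: map the iostream direction onto fseek's whence.
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode) override {
        int whence = dir == std::ios_base::beg ? SEEK_SET
                   : dir == std::ios_base::cur ? SEEK_CUR
                                               : SEEK_END;
        if (std::fseek(fptr, static_cast<long>(off), whence) != 0)
            return pos_type(off_type(-1));  // report failure
        return pos_type(off_type(std::ftell(fptr)));
    }

    // Absolute seek: the same as seeking from the beginning.
    pos_type seekpos(pos_type pos, std::ios_base::openmode mode) override {
        return seekoff(off_type(pos), std::ios_base::beg, mode);
    }

private:
    std::FILE* fptr;
};

// Read the whitespace-delimited word that starts at byte `offset`.
inline std::string word_at(std::FILE* fp, long offset) {
    seekable_syncbuf buf(fp);
    std::istream in(&buf);
    std::string word;
    in.seekg(offset, std::ios_base::beg);
    in >> word;
    return word;
}
```

Since the class has no buffer of its own, the FILE* position is the only position there is: after word_at returns, a plain getc on the same FILE* picks up at the character that follows the word.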

Summary

For the predefined streams, it's safe to mix stdio and iostreams. For example, you can safely use stdin and cin in the same program; the C++ Standard guarantees that it will work the way you would naively expect it to.
If you don't need to mix stdio and iostreams, you can get better performance by turning this feature off.
If you need to open a file with iostreams, and also access the same file as a FILE*, there's nothing predefined in the C++ Standard library to support that usage. The Standard library is extensible, however, and a class that allows this is just a few dozen lines of code. You shouldn't be afraid of writing a new streambuf when it's appropriate.

Matt Austern is the author of Generic Programming and the STL and the chair of the C++ standardization committee's library working group. He works at AT&T Labs — Research and can be contacted at [email protected].


Listing 1: A streambuf class that enables simultaneous use of a stream with both stdio and iostreams functions

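#include <stdio.h>    // FILE, EOF, getc, ungetc, fputc, fflush
#include <streambuf>  // std::streambuf

// Unbuffered streambuf that forwards every operation to a FILE*.
// It never sets up a get area or a put area, so each public
// streambuf call falls through to one of the virtuals below.
class syncbuf : public std::streambuf {
public:
   syncbuf(FILE*);

protected:
   virtual int overflow(int c = EOF);   // write one character
   virtual int underflow();             // peek at the next character
   virtual int uflow();                 // read and consume one character
   virtual int pbackfail(int c = EOF);  // put one character back
   virtual int sync();                  // flush pending output

private:
   FILE* fptr;
};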

syncbuf::syncbuf(FILE* f)
   : std::streambuf(), fptr(f) {}

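int syncbuf::overflow(int c) {
   // EOF means there is nothing to write; that counts as
   // success, which is reported with any value other than EOF.
   if (c == EOF)
      return 0;
   return fputc(c, fptr);
}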

int syncbuf::underflow() {
   int c = getc(fptr);
   if (c != EOF)
      ungetc(c, fptr);
   return c;
}

int syncbuf::uflow() {
   return getc(fptr);
}

int syncbuf::pbackfail(int c) {
   return c != EOF ? ungetc(c, fptr) : EOF;
}

int syncbuf::sync() {
   return fflush(fptr);
}