Three [mu]Ideas

By Andrei Alexandrescu, February 01, 2005

From mandatory error codes to dynamic handling of temporaries, Andrei mulls over all kinds of ideas.

February, 2005: Three <img src="http://twimgs.com/ddj/cuj/images/cuj0502i/mu.gif" width="8" height="13">Ideas

Andrei Alexandrescu is a graduate student in Computer Science at the University of Washington and author of Modern C++ Design. He can be contacted at [email protected].

The production time lag of printed publications makes it hard to be a good columnist. For readers' sake, a good columnist needs to fake the joyous mood of December some time in October, and detach from that joyous mood when writing in December for February's column. So if this article seems overly happy for the dreary end of January, you now know why. Anyhow, please allow me to wish only the best in the year that just started: Be good, be happy, and be paid for doing what you love doing anyway.

Mailcontainer

Following this column's duo of articles dedicated to lock-free data structures [4, 5], there have been a couple of interesting e-mails. Marco Dalla Gasperina writes:

do {
   ptr = pMap_;
   pRec->pHazard_ = ptr;
} while (pMap_ != ptr);

int * p1 = new int(), * p2 = new int();
if (p1 != p2) { ... } // fine
foo(p1);              // fine
int* pp1 = &p1;       // fine
delete p1;          // p1 is write-only
if (p1 != p2) { ... } // no!
foo(p1);              // no!
pp1 = &p1;            // still fine

Here's the relevant quote from the Standard (3.7.3. para 4):

If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined. (Footnote: On some implementations, it causes a system generated runtime fault.)

How could that happen? On some architectures, the assumption that pointers are really integers breaks. Pointers can be different enough (in size and structure) from integers to need special registers for manipulation (not only dereferencing but also comparison, computing difference, indexing...). Those registers specialized in holding pointers might be automatically checked by the hardware for validity, as part of the memory-management unit design. As soon as an invalid pointer is loaded into such a specialized register, the hardware checker generates an access violation interrupt. Hence the corresponding rule in the C++ Standard.

In fact, things are even fuzzier since it's unclear what "using" a pointer means in the aforementioned quote. For example, would taking the address (pp1 = &p1) count as a "use" of p1? Intuitively it shouldn't, but when it comes to language definitions, intuition is a bad device. The Standard should define "use" and didn't. Indeed, in 2001 Martin von Loewis submitted a defect report about exactly this problem [8]. Isn't it funny how people sometimes say: "Ah, it's just semantics." Semantics are important!

I don't have an entirely satisfactory solution yet to the problem noticed by Marco. On hardware with dedicated pointer registers, there must be alternate ways to compare pointers for equality. If you see ways in which this problem might be solved portably, send me an e-mail.

Another problem was pointed out by Igor Tandetnik, who writes:

In "Tying WRRMMap's Loose Ends" section, I believe the Update method should perform the same hazard pointer dance as Lookup does. After all, while copying the map, a thread calling Update is clearly a reader, and should take the same measures of protection as the regular reader.

Otherwise, suppose you have a writer thread W1. It grabs a pMap_ pointer and proceeds to copy the map's data into the new map. In the middle of this copying, it's suspended. Writer thread W2 comes along, grabs the map, makes a copy, installs a new map, sees that there are no hazard pointers referencing the old map, and deletes it. Now W1 wakes up and proceeds to copy happily from a destroyed object.

Oh-là-là! Both Maged and yours truly agree that Igor is absolutely right. What writers actually do is read-modify-write, and as such, they count as readers as well. So each writer needs one hazard pointer. (If they did only writing without reading, there would have been no need.) I know, I know, in retrospect it's obvious. Sigh.

Last but not least, I got a stimulating (actually, "whipping" would be a better term) piece of e-mail from no one else than the writer, friend, and mentor extraordinaire Scott Meyers:

Having made several passes through your October CUJ column, I felt that I was ready to tackle your December column. I found the Detective Bullet stuff intrusive, but I'll write that off as your hanging out with Herb too much. However, I was surprised to see that HPRecType::active_ is int instead of bool, that HPRecType::pHazard_ is public, that HPRecType::pHazard_ isn't type-safe (because it's of type void*), and that the discussion of calculating the difference between two sets never mentions std::set_difference. On the face of it, this screams "Bad C++!," but I haven't actually READ the column yet, I've just skimmed over it. I hope that after I've read it, these seeming sins of style will turn out to be technically justified...

Well, yes and no. There's justification for everything:

int is used instead of bool to operate on one word at a time. Otherwise, the compiler might start doing crazy stuff like packing the bool with some adjacent data and get into word tearing and such nasty things. No, sir. We need to use a word-sized flag, so int is it.
HPRecType::pHazard is not type-safe because it doesn't need to be. The only information that hazard pointers carry is that of opaque memory addresses. Therefore, there is no need to adorn hazard pointers with any type information. Proof? There is no cast from void* to some other type. So, by the principle of least commitment, void* is the best type for the job.
HPRecType::pHazard_ is also public to check the attention of our readership, thank you, and to racketeer new reviewers.

Speaking of which, from now on, Scott will be spammed with first drafts of all of my articles. Given Scott's busy schedule, that's sort of a Denial of Service (DoS) attack. And if I convince Herb to do the same—hey, that's pure Distributed DoS (especially considering Herb's writing throughput). Finally, Alessio Marchetti writes:

Reading your Shakespearian novels is always interesting. The December edition of CUJ made me wonder whether implementers of the Standard C++ Library should start thinking about implementing a wait-free version of the containers (at least). What is your opinion?

Heck, I'll publish an e-mail comparing me with Shakespeare no matter what. To answer the actual question, yes, lock-free structures are an obvious addition to the standard containers...once, of course, the appropriate changes to the C++ Standard are made to allow implementing multithreaded anything. Ho-hum.

Ideas

I keep ideas for new articles in a little text file. Some of those ideas are simply too small to account for a full-fledged article. That doesn't make them any less useful, however, for which reason I would like to bundle three of them here. The following Ideas relate to error handling, temporary variables, and implementing constructors.

Idea #1: Mandatory Error Codes

We all know just how easy it is to ignore an error code returned by a function:

enum UsefulErrorCode { ... };
UsefulErrorCode FallibleFunction();
FallibleFunction(); // ignore error code

By a principle so old it's hard to assign paternity to, the default is backwards, because writing the wrong thing is easy, and writing the right thing (testing for the error code and so on) is hard. In this code, the compiler is just a silent accomplice to wrongdoing because it doesn't force the programmer to check FallibleFunction's result. In a way, exceptions are attractive because they really blow things up, thus making error handling mandatory instead of optional. That being said, however, exceptions do come with their share of problems. Let's face it, with exceptions, correct code is not easier to write. In fact, exception-correct code is notoriously difficult to write and our community is undergoing a slow process towards embracing idioms that are simple and effective. Besides, testing code that can throw exceptions is problematic because, almost by definition, exceptions do not occur under normal circumstances, so you'd need to inject them somehow to make the test suites cover exceptional paths.

Mandatory error codes fit in between the too-low-key-to-be-useful error codes and the sometimes-too-radical exceptions. Sometimes, as you are designing a library, you do want to return an error code from a function, and you do want to make sure that user code checks it, but you don't want to go as extreme as throwing an exception. In that case, you can do this:

template <class T>
class ErrorCode {
   mutable bool read_;
   T code_;
public: 
   ErrorCode(const T& code)
     : read_(false), code_(code) {}
   ErrorCode(const ErrorCode& rhs) {
     : read_(rhs.read_), code_(rhs.code_) {
       rhs_.read = true;
      }
   }
   operator T() {
     read_ = true;
     return code_;
   }
   ~ErrorCode() {
      assert(read_);
   }
};

The whole idea of ErrorCode is to embed some information (T) and track whether that information has been accessed. If the ErrorCode object goes out of existence without its information having ever been queried, it signals that as an anomaly and raises an assertion in its destructor. Very simple. The only subtlety is to figure out that copying out an ErrorCode<T> means reading it, so the copy constructor sets the read_ flag of the source to true. For that exact reason, ErrorCode<T>::read_ must be mutable-qualified.

Let's see ErrorCode at work. Assume for now that we use a bare int to convey error information.

ErrorCode<int> FallibleFunction();
int result = FallibleFunction();	// good
if (FallibleFunction()) {		// good
   ...
}
FallibleFunction();		// assert!

So you can use FallibleFunction with no problem, as long as you read the code it returned. If you don't, the runtime assertion will remind you. Unlike an exception-based approach, the assert will fail even on a normal, error-free execution. This makes it easier for unit tests to check for intended usage—all they have to do is to call the function, not make it fail. Of course, it would be even nicer if intended usage could be detected at compile time. I couldn't find a way yet; if you do, let me know.

While using this approach, there have been cases in which I really truly honestly didn't care about an error. This is the case, for example, when shutting down an application that has some connections open. It's courteous to try to close the connections properly (instead of leaving that to the system or to some timeout mechanism on the other end), but nobody cares whether closing fails—when the tree falls, who cares for the bushes underneath? We're not lacking solutions; for example:

!FallibleFunction();
(int)FallibleFunction();
int dummy = FallibleFunction();

What we're lacking is a nice, expressive solution that also works with other types than int—and let's not forget, some other types might be more expensive to copy around, so we really should avoid a "copy-and-forget" solution. That's why it's useful to define an ancillary type that simply shuts off the result:

class IgnoreError {};
template <class T>
class ErrorCode {
   ... as before ...
public:
   operator IgnoreError() {
      read_ = true;
      return IgnoreError();
   }
};

Now, code that just wants to call a function without caring about its result can do that with the expressive syntax:

(IgnoreError) FallibleFunction(); // ok
IgnoreError(FallibleFunction());  // ok

Idea #2: Dynamic Detection of Temporaries

One notorious inefficiency of C++ is the way it manipulates temporary values of user-defined types. You know the drill: Temporaries come, are copied, and go at the drop of a hat, and the class implementer doesn't have a good handle on how to optimize away unnecessary duplication of expensive resources.

There has been much work to address these problems: expression templates [7], MOJO [3], Boost [1], and most interestingly, proposals to change the Standard [2, 6]. Changing the language, of course, is the most powerful solution, but at the same time, is the slowest in terms of availability and adoption. All other solutions (that yours truly is aware of) have limited portability, caveats, and require some level of programmer cooperation.

Most existing solutions focus on a compile-time solution. However, very likely many people have thought of a runtime solution for branding variables as temporaries. The runtime solution would abandon the idea of using various type adornments to transport the semantic bit "this is a temporary variable," but instead will rely on an actual bit in the temporary object that is set and queried at runtime. Consider, for example, the eternal String class, in which we can afford using a bit to store the temporary information:

class String {
   char * data_;
   unsigned int capacity_;
   unsigned int length_:31;
   unsigned int isTemporary_:1;
   ...
public:
   String& MakeTemporary() {
      isTemporary_ = true;
      return *this;
   }
};

For exposition purposes, this article uses bit fields; in an actual implementation, bit fields are not guaranteed to pack data properly, so maybe an implementation relying on explicit bit manipulation of the length_ field would yield better portable performance.

Now, it's quite easy to use such a miniframework. The copy constructor needs to look at the is_temporary bit and, if that bit is set, cast the constness away from the source and treat it as a disposable object:

String::String(const String& rhs)
    : isTemporary_(false) {
  if (rhs.isTemporary_) {
    String& moveMe = const_cast <String&>(rhs);
    ... move moveMe into *this ...
  } else {
    ... copy rhs into *this ...
  }
}

Conversely, a function that needs to return a value needs to explicitly signal that that value is a temporary:

String SomeFunction() {
   String result;
   ...
   return result.MakeTemporary();
}

There are a number of adornments that could be added on this simple skeleton. The cost of testing that extra bit shouldn't be worrisome, especially if the savings brought by avoiding copying are high. The solution scales to larger objects: If, say, you define a class Employee that has a few strings, you can have it obey the same protocol and define Employee::MakeTemporary that calls String::MakeTemporary on all of its members. Better yet, Employee needn't allocate its own isTemporary_ bit—it can piggyback on the temporary bit of any of its String members.

But, I agree. Perfect? No. Pretty? Not even. Works portably? It does.

Idea #3: Exception-Safe Constructors

Imagine you have a DatabaseConnection class and a Log class, and you want to make a Session class that uses both—for example, Session lets you connect to a database, use it, and close it, while logging all activity by using a Log object. So Session would aggregate a DatabaseConnection class and a Log class. Its destructor might look like this:

class Session {
   Log log_;
   DatabaseConnection conn_;
public:
   ...
   ~Session() {
     if (!conn_.IsOpen()) return;
     log_.write("Closing_connection...");
     log_.write(conn_.Close()
        ? "succeeded!\n"
        : "failed!\n");
   }
}

Such a sophisticated destructor creates quite a conundrum when trying to implement the constructor sensibly. You see, as soon as the DatabaseConnection and the Log are alive and kicking inside the constructor, you'd want the destructor to be invoked such that the database activity is logged correctly. So then you'd need to write code like this:

Session::Session(const char* connName) {
   conn_.Open(connName); // might throw
   try {
      conn_.Prefetch();    // might throw
   } catch (std::exception&) {
     ... duplicate destructor code ...
     throw; 
   }
}

If conn_.Open throws an exception, then the constructor can fail immediately: The database was never opened, so who cares. But as soon as opening succeeded, we'd like to log the closing properly. So the destructor body needs to be executed as soon as conn_.Open succeeded, but that's not the case if conn_.Prefetch fails. See the destructor duplication problem? To summarize: Sometimes you want to make sure the destructor will be executed even before the constructor fails at some point. The language, however, commits to not executing the destructor unless the entire constructor has finished successfully. What to do?

The perfect solution to this problem would be a postconstructor, a language feature that comes up every now and then on C++-related newsgroups. A postconstructor would be a member function that is mandatorily called right after an object's construction. Were that feature available, conn_.Prefetch would belong to the postconstruction stage.

A less glorious solution is to simply define a private function Destroy that you then invoke from both the destructor and the constructor's catch handler. A nicer and more elegant solution—and the gist of Idea #3—is to simply delegate from one constructor to another. You can do all the work on the side by using a surrogate object, and then transfer the state of the surrogate object into the object under construction:

Session::Session(const char* connName) {
   Session surrogate;
   surrogate.conn_.Open(connName); // might throw
   surrogate.conn_.Prefetch();     // might throw
   surrogate_.MoveTo(*this);       // transfer state
}

Done! If anything bad happens during surrogate.conn_.Prefetch, then the closing of the connection will still be logged properly because surrogate's destructor will be called. All you need is a function that allows destructively moving the state of one Session object to another, which is a very useful function to have anyway.

It doesn't take more than a few seconds to realize that this is a variant of the ages-old technique of implementing operator=: Build an object on the side and then move it into *this. The difference is that you're in the constructor and you're using another constructor, which has an interesting chicken-and-egg feel. You conveniently use objects to ensure proper release of resources, and you transfer ownership only when it is convenient to do so. That way, you don't need to duplicate or even factor out destruction code, nor write try statements inside a constructor.

Conclusion

Hopefully, you found ideas #1 (mandatory error codes) and #3 (use of surrogate objects during construction) useful for your daily coding. I have used them both. I haven't used dynamic handling of temporaries (Idea #2), but I believe there is something there, and perhaps it can be refined with your ideas. Let me know, and now turn your soul's clock back for a second and allow me to wish you a wonderful holiday season.

References

[1] Abrahams, David. Boost Move Library (http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/move.hpp).
[2] Abrahams, David and Gary Powell. "Clarification of Initialization of Class Objects by rvalues," JTC1/SC22/WG21, February 2004 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1610.html).
[3] Alexandrescu, Andrei. "Generic <Programming>: Move Constructors," C++ Experts Online, February 2003 (http://moderncppdesign.com/publications/cuj-02-2003.html).
[4] Alexandrescu, Andrei. "Generic <Programming>: Lock-Free Data Structures," C/C++ Users Journal, October 2004.
[5] Alexandrescu, Andrei and Maged Michael. "Generic <Programming>: Lock-Free Data Structures II," C/C++ Users Journal, December 2004.
[6] Hinnant, Howard, Peter Dimov, and David Abrahams. "A Proposal to Add Move Semantics Support to the C++ Language," JTC1/SC22/WG21, September 2002 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n1377.htm).
[7] Veldhuizen, Todd L. "Expression Templates," C++ Report, 7(5):26-31, June 1995. CODEN CRPTE7. ISSN 1040-6042. Reprinted in C++ Gems, by Stanley Lippman, Editor.
[8] Von Loewis, Martin. "'use' of invalid pointer value not defined," C++ Standard Core Language Active Issues, Revision 32, November 2004 (http://www2.open-std.org/jtc1/sc22/wg21/docs/cwgactive.html#312).

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Three [mu]Ideas

Mailcontainer

Ideas

Idea #1: Mandatory Error Codes

Idea #2: Dynamic Detection of Temporaries

Idea #3: Exception-Safe Constructors

Conclusion

References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Three [mu]Ideas

Mailcontainer

Ideas

Idea #1: Mandatory Error Codes

Idea #2: Dynamic Detection of Temporaries

Idea #3: Exception-Safe Constructors

Conclusion

References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content