
Argument-Dependent Return-Type Variance


November, 2005: Argument-Dependent Return-Type Variance

Matthew Wilson is a software-development consultant for Synesis Software, creator of the STLSoft libraries, and author of Imperfect C++ (Addison-Wesley, 2004). He can be contacted at http://imperfectcplusplus.com/.


[This month's column is, in part, an extract from Matthew's forthcoming book on STL extension, called Extended STL, which will be published by Addison-Wesley in 2006.]

Q: How do you double a function's potential return value semantics?

A: ARV it!

This month's installment takes the column on a sideways tack from its major remit, the investigation of integrating different languages and technologies with C and C++, into looking at how other languages—in this case, Python and Ruby—might be seen to influence the idioms of C++. I'll be looking at how a type may act as both an array and an associative array/dictionary/map, which requires that different functions of the same name return different types based on the type(s) of their argument(s). Now, you're probably thinking at this point, Well Duh! That's just overloading. And so it is. But there's more to it than that, and that's our story.

Borrowing a Jewel from Ruby

Consider the following Ruby code, which uses the Open-RJ/Ruby mapping:

# Open the database in the given file
db = OpenRJ::FileDatabase(
  'pets.orj', OpenRJ::ELIDE_BLANK_RECORDS)

# Access the first record
rec = db[0]

# Print the fields in this record
(0 ... rec.numFields).each \
{ |i|
    fld = rec[i]
    puts "Field#{i}: name=#{fld.name}; value=#{fld.value}"
}

That's pretty regular Ruby, and quite typical of code using the Open-RJ/Ruby mapping. (I could have more easily used each_with_index, but this way suits my purposes a little better.) Using the pets sample database that comes with the Open-RJ distribution, this prints out:

Field0: name=Name; value=Barney
Field1: name=Species; value=Dog
Field2: name=Breed; value=Bijon Frieze 

Consider now that rather than accessing the fields by index, we access them by name:

. . . # as before

# Print the fields in this record
puts "Name="    + name=rec["Name"]
puts "Species=" + name=rec["Species"]
puts "Breed="   + name=rec["Breed"] if 
                   rec.include?("Breed")

This style is more useful when you have an expectation as to the contents of the record, since there's no explicit testing involved, and it serves as a verification of the structure, throwing an exception if a given field does not exist [1]. This prints out:

Name=Barney
Species=Dog
Breed=Bijon Frieze

Look again carefully at the two uses of the subscript operator. In the first case, the argument to the operator is an integer, and in the second case it is a string. Note the return types associated with these different calls. In the case of an integral argument, the return value is a Field instance. Thus, the record has acted like an array. In the case of a string argument, the return value is a string (representing the value member of the named Field). Thus, the record has acted like an associative array (also known as map, hash, or dictionary). When appropriate, this duality is a remarkably useful facility. It's appropriate in the case of Open-RJ records because Open-RJ database contents are immutable, fields are represented as name+value string pairs, and arrays of pointers to field structures are maintained in each record structure. Indeed, any database/recordset analogy lends itself to this technique to improve data accessibility.

This functionality is implemented in the Open-RJ/Ruby mapping (written in C) via the Record_subscript() function, and its two worker functions Record_subscript_string() and Record_subscript_fixnum() (see Listing 1). If the index argument is a string (T_STRING), then Record_subscript_string() is invoked and either returns a string representing the value of the named field, if found, or fails. If index is an integer (T_FIXNUM), then Record_subscript_fixnum() is invoked and either returns an instance of the field at the given index, if within range, or fails. If index is of any other type, then a TypeError is raised to the caller.

Dual-Semantic Subscripting in C++

I wanted to emulate this in C++. A simplistic form of this would be as follows:

class Record
{
public:
  . . .
  const Field  operator [](size_t index) const;
  const String operator [](char const *name) const;
};

(The String, Field, and Record types are all lightweight C++ wrappers for the underlying Open-RJ C-API types ORJStringA, ORJFieldA, and ORJRecordA, shown in Listing 2.)

This works well, up to a point:

Record r;

r["Species"]; // Returns value (a String) of field named "Species"
r[1];         // Returns second field instance (a Field)

Alas, there are several drawbacks to this approach. First, consider what happens in the following case:

r[0]; // Compile error

The problem is that the literal 0, which is an int, converts just as readily to a non-int integral type such as size_t as it does to a pointer type, so the compiler cannot choose between the two overloads. We might solve this by changing the integral form to use int, but then we have the possibility of negative indices, which are not meaningful with an Open-RJ record. (I'm not going to get into the debate about the notion that use of signed integers is always preferable, as it avoids C/C++'s occasionally surprising integral conversions; I just use high warning levels.)

class Record
{
public:
  . . .
  const Field  operator [](int index) const;
  const String operator [](char const *name) const;
};
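
The difference between the two forms can be seen in isolation with a small sketch. The Chosen tag and the two RecordWith... classes are invented here purely for illustration (they are not part of Open-RJ); each overload simply reports which one was picked.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two Record variants discussed above;
// each overload returns a tag so we can see which one was chosen.
enum Chosen { ByIndex, ByName };

struct RecordWithSizeT
{
  Chosen operator [](std::size_t) const  { return ByIndex; }
  Chosen operator [](char const *) const { return ByName; }
};

struct RecordWithInt
{
  Chosen operator [](int) const          { return ByIndex; }
  Chosen operator [](char const *) const { return ByName; }
};

inline void literal_zero_demo()
{
  RecordWithSizeT rs;
  RecordWithInt   ri;

  // rs[0];  // would not compile: int -> size_t and 0 -> null pointer
  //         // are both conversions, so neither overload is better

  assert(ByIndex == rs[1]);              // nonzero literal is never a pointer
  assert(ByIndex == rs[std::size_t(0)]); // exact match on size_t
  assert(ByIndex == ri[0]);              // exact match on int
  assert(ByName  == ri["Species"]);
}
```

The int form resolves the literal-0 case only because int is an exact match for the literal; the question of which string types are accepted is untouched.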

In any case, there's a much bigger issue here. The only string type with which the other overload is compatible is a C-style string (char const*). Those of you familiar with my predilection for generalized programming by manipulation of types by what they do, rather than what they strictly are, will not be surprised that I don't find that at all satisfying. Maximally flexible classes should work with all string types, not just char const* or std::string const&.

Generalized Compatibility Via String Access Shims

The Shims concept (described in my article "Generalized String Manipulation: Access Shims and Type Tunneling," from the August 2003 issue of CUJ, and in Chapter 20 of Imperfect C++) defines a mechanism for generalized manipulation of types with incompatible interfaces but that have the same, or similar, logical types, or may be meaningfully converted into the same type. The most obvious, and widely used, are the string access shims, which are a suite of five related shims: c_str_data, c_str_len, c_str_ptr, c_str_size, and c_str_ptr_null. In this case, we are concerned only about the c_str_ptr shim, which is an unbounded set of overloaded functions, named c_str_ptr() and in the stlsoft namespace, that return char const* or an instance of a type implicitly convertible to char const*. Members of the c_str_ptr shim from the STLSoft libraries include the following:

// From stlsoft/string_access.hpp
char const *c_str_ptr(char const *);
char const *c_str_ptr(std::string const &);
char const *c_str_ptr(stlsoft::basic_simple_string<char> const &);
char const *c_str_ptr(stlsoft::basic_string_view<char> const &);

// From winstl/time_string_access.hpp
shim_string<char> c_str_ptr(FILETIME const &t);
shim_string<char> c_str_ptr(SYSTEMTIME const &t);

// From unixstl/string_access.hpp
char const *c_str_ptr(struct dirent const *d);
char const *c_str_ptr(struct dirent const &d);

// From mfcstl/string_access.hpp
c_str_ptr_CWnd_proxy c_str_ptr(CWnd const &w);

Other libraries, including Open-RJ, also define string access shims for their types that can be meaningfully represented as strings (and "export" them, via using declarations, to the stlsoft namespace), such as:

// From openrj/openrj.h (which does not include any STLSoft headers!)
char const *c_str_ptr(ORJStringA const &s);
char const *c_str_ptr(ORJRC rc);
char const *c_str_ptr(ORJ_PARSE_ERROR pe);

We can rewrite the named subscript operator of Record to work with any type for which the c_str_ptr string access shim is defined:

class Record
{
public:
  . . .
  const Field  operator [](size_t index) const;
  const String operator [](char const *name) const;
  template <typename S>
  const String operator [](S const &name) const
  {
    return operator [](c_str_ptr(name));
  }
};

and access named field values with a multitude of types, as in:

std::string s("Species");
CWnd const  &wnd = get_some_window_or_other();
FILETIME    ft = . . .;

r[s];
r[wnd];
r[ft];

(C++ is sometimes accused of having hidden costs. Note that shims do not have any costs over and above what would be required to elicit the string form in handwritten code, even in cases where a conversion is applied. But being efficient and powerful does have its costs: I couldn't imagine programming without them now!)
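
To make the mechanism concrete, here is a minimal sketch of the idea, assuming nothing from STLSoft itself: the sketch namespace, the Label type, and the name_length() function are all made up for illustration, and only the name c_str_ptr is borrowed from the real shim.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Simplified, hypothetical shim overloads in the spirit of c_str_ptr();
// the real ones live in the STLSoft headers listed above.
namespace sketch
{
  inline char const *c_str_ptr(char const *s)        { return s; }
  inline char const *c_str_ptr(std::string const &s) { return s.c_str(); }

  // A made-up user type that can be meaningfully viewed as a string
  struct Label
  {
    char text[16];
  };
  inline char const *c_str_ptr(Label const &l)       { return l.text; }

  // Generic code written once against the shim, not against any one type
  template <typename S>
  std::size_t name_length(S const &name)
  {
    return std::strlen(c_str_ptr(name));
  }

  inline void shim_demo()
  {
    Label l = { "Breed" };

    assert(7u == name_length("Species"));
    assert(4u == name_length(std::string("Name")));
    assert(5u == name_length(l));
  }
}
```

Supporting a new string-like type means adding one more overload; name_length() and any other code written against the shim need no changes.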

A Fly In the int-ment

The picture's not quite complete. Looking again at the definition of Record, we see that we have three overloads of the subscript operator. If the argument is char const* or size_t (or int, if we'd elected to use that form), then the requisite nontemplate overload is selected. If the argument is of any other type, then the template overload, which uses the string access shim, is selected. This causes a problem if the argument is a different integer type, as in:

long l = 1;

r[l]; // Error: no c_str_ptr() overload matches 'long'

Naturally, having a long interpreted as something convertible to a field name string is quite against the intent of the Record subscript operators. One way of fixing this is to define nontemplate overloads for all integral types (see Listing 3).
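
As a rough idea of what that brute-force approach involves, consider the following sketch. It is not the Open-RJ code: the brute_force namespace, the Chosen tag, and the at_() helper are invented for illustration, and only six integral types are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace brute_force
{
  // Minimal hypothetical shim, enough for the string side to compile
  inline char const *c_str_ptr(char const *s)        { return s; }
  inline char const *c_str_ptr(std::string const &s) { return s.c_str(); }

  enum Chosen { ByIndex, ByName };

  class Record
  {
  public:
    // One hand-written forwarding overload per integral type ...
    Chosen operator [](short i) const          { return at_(static_cast<std::size_t>(i)); }
    Chosen operator [](unsigned short i) const { return at_(static_cast<std::size_t>(i)); }
    Chosen operator [](int i) const            { return at_(static_cast<std::size_t>(i)); }
    Chosen operator [](unsigned int i) const   { return at_(static_cast<std::size_t>(i)); }
    Chosen operator [](long i) const           { return at_(static_cast<std::size_t>(i)); }
    Chosen operator [](unsigned long i) const  { return at_(static_cast<std::size_t>(i)); }
    // ... and more still for 64-bit or compiler-specific integer types,
    // which is where preprocessor checks start to creep in.

    Chosen operator [](char const *) const     { return ByName; }

    template <typename S>
    Chosen operator [](S const &name) const
    {
      return operator [](c_str_ptr(name));
    }

  private:
    Chosen at_(std::size_t) const { return ByIndex; }
  };

  inline void brute_force_demo()
  {
    Record r;
    long   l = 1;

    assert(ByIndex == r[l]);   // now an exact match on the long overload
    assert(ByIndex == r[0]);   // exact match on the int overload
    assert(ByName  == r[std::string("Species")]);
  }
}
```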

Not exactly a pretty picture is it? Lots of repeated code, and ugly preprocessor discrimination to boot [2]. There has to be a better way, and so there is. What's required is a way for the compiler to "react" to an integral argument type by selecting the integral indexing operator, and to use the string lookup operator for everything else.

We can't simply add a member function template for handling integer types:

  template <typename I>
  const Field operator [](I const &index) const
  {
    return operator [](static_cast<unsigned int>(index));
  }

This is because we already have one for the strings, and the compiler would be understandably confused. We need to join the behavior of the two into one. This would be straightforward if they had the same return type, but since they don't, a dash of TMP (Template Meta Programming) is called for.

Selecting Return Type and Overload

We need to select the right overload and select the right return type, which is achieved by combining two TMP techniques: type detection and type selection. The type detection—is it an integer?—is performed by the is_integral_type template, whose member constant value is nonzero when the template is specialized with an integral type, or zero otherwise:

assert(0 != is_integral_type<int>::value);
assert(0 == is_integral_type<char const*>::value);

Specializations of is_integral_type also define a member type, named type, as one of the meta-Boolean types yes_type or no_type.
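
A cut-down sketch of such a trait might look like the following. This is an assumption-laden illustration rather than the STLSoft definition: only three integral types are specialized, and the selects_index_overload() helper is invented to show the member type being used to pick an overload.

```cpp
#include <cassert>

// Hypothetical, simplified detection trait; a real one would provide a
// specialization for every integral type.
struct yes_type {};
struct no_type  {};

template <typename T>
struct is_integral_type           // primary template: "not integral"
{
  enum { value = 0 };
  typedef no_type type;
};

template <>
struct is_integral_type<int>
{
  enum { value = 1 };
  typedef yes_type type;
};

template <>
struct is_integral_type<long>
{
  enum { value = 1 };
  typedef yes_type type;
};

template <>
struct is_integral_type<unsigned long>
{
  enum { value = 1 };
  typedef yes_type type;
};

// The member type lets ordinary overloading act on the detection result
inline bool selects_index_overload_(yes_type) { return true; }
inline bool selects_index_overload_(no_type)  { return false; }

template <typename T>
bool selects_index_overload()
{
  return selects_index_overload_(typename is_integral_type<T>::type());
}

inline void detection_demo()
{
  assert(0 != is_integral_type<int>::value);
  assert(0 == is_integral_type<char const*>::value);
  assert(selects_index_overload<long>());
  assert(!selects_index_overload<char const*>());
}
```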

The type selection is performed using the select_first_type_if template, whose member type type is the first type parameter when its (third) Boolean parameter is nonzero, or is the second type parameter when zero:

template< typename T1
        , typename T2
        , bool     B //!< Selects T1
        >
struct select_first_type_if
{
  typedef T1  type;   
};

template< typename T1
        , typename T2
        >
struct select_first_type_if<T1, T2, false>
{
  typedef T2  type;
};
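
The behavior of the selector can be checked entirely at compile time. The sketch below repeats the template so that it stands alone, and assumes a C++11 compiler for static_assert and std::is_same (neither of which was available when this column was written); the at_least_32_bits typedef is a made-up example.

```cpp
#include <type_traits>  // std::is_same (C++11), used only for checking

template <typename T1, typename T2, bool B>
struct select_first_type_if { typedef T1 type; };

template <typename T1, typename T2>
struct select_first_type_if<T1, T2, false> { typedef T2 type; };

// true selects the first type, false the second
static_assert(std::is_same<select_first_type_if<int, double, true>::type,
                           int>::value, "true selects T1");
static_assert(std::is_same<select_first_type_if<int, double, false>::type,
                           double>::value, "false selects T2");

// The Boolean can be any compile-time constant, such as a size comparison:
// short if it occupies at least four bytes, otherwise long
typedef select_first_type_if<short, long,
                             (sizeof(short) >= 4)>::type at_least_32_bits;
```

Because the third parameter is an ordinary compile-time constant, the value member of a detection trait can be plugged straight into it.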

Hence, the return selection looks like:

template <typename T>
typename select_first_type_if<Field
                            , String
                            , is_integral_type<T>::value
                            >::type

Overloads of a private worker method, subscript_operator_(), are defined as follows:

  template <typename S>
  String subscript_operator_(S const &name, no_type) const
  {
    return operator [](c_str_ptr(name));
  }
  template <typename I>
  Field subscript_operator_(I const &index, yes_type) const
  {
    return operator [](static_cast<size_t>(index));
  }

They are selected, by overload, within the implementation of the subscript member function template via a temporary instance of the type member type of is_integral_type:

  template <typename T>
  typename select_first_type_if<Field
            , String
            , is_integral_type<T>::value
            >::type operator [](T const &name) const
  {
    typedef typename is_integral_type<T>::type   yesno_type;
    return subscript_operator_(name, yesno_type());
  }

Note the unavoidable duplication of the is_integral_type specialization: in the method signature for deducing return type and in the method body for selecting the worker function overload.
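
Putting the pieces together, the following self-contained sketch shows the whole technique at work. It rests on several assumptions that are not part of Open-RJ or STLSoft: Field and String are simple stand-ins, the Record stores its fields in a std::vector and has an invented add() method, the shim has only two overloads, and the C++11 traits std::is_integral and std::conditional (with std::true_type and std::false_type as the tags) substitute for is_integral_type, select_first_type_if, yes_type, and no_type.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

namespace arv_sketch
{
  // Stand-ins for the Open-RJ wrappers: a Field is a name+value pair
  typedef std::string String;
  struct Field { String name; String value; };

  // Minimal hypothetical string access shim (two overloads only)
  inline char const *c_str_ptr(char const *s)        { return s; }
  inline char const *c_str_ptr(std::string const &s) { return s.c_str(); }

  class Record
  {
  public:
    void add(String const &name, String const &value)
    {
      Field f = { name, value };
      m_fields.push_back(f);
    }

    // The two nontemplate operators that do the real work
    Field operator [](std::size_t index) const
    {
      return m_fields.at(index);        // throws std::out_of_range
    }
    String operator [](char const *name) const
    {
      for(std::size_t i = 0; i != m_fields.size(); ++i)
      {
        if(m_fields[i].name == name) { return m_fields[i].value; }
      }
      throw std::out_of_range("field not found");
    }

    // ARV: both the return type and the worker are chosen from T
    template <typename T>
    typename std::conditional<std::is_integral<T>::value, Field, String>::type
    operator [](T const &arg) const
    {
      typedef typename std::is_integral<T>::type yesno_type;
      return subscript_operator_(arg, yesno_type());
    }

  private:
    template <typename S>
    String subscript_operator_(S const &name, std::false_type) const
    {
      return operator [](c_str_ptr(name));
    }
    template <typename I>
    Field subscript_operator_(I const &index, std::true_type) const
    {
      return operator [](static_cast<std::size_t>(index));
    }

    std::vector<Field> m_fields;
  };
}
```

With this in place, subscripting with a long (or any other integral type) yields a Field, while subscripting with a std::string yields a String. A string literal still goes directly to the nontemplate char const* overload, and the literal 0 is no longer ambiguous because the template is an exact match for int.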

Conclusion

And that's it! I call the technique Argument-dependent Return-type Variance (ARV) because the return type of a function is not selected by the author of the code, but rather by the compiler on behalf of the user. The features of ARV are: avoiding fatuous language and warty ambiguities; being able to truly overload on concept without having to write a large number of identical overload method bodies; and being able to deduce a return type of a member function template based on argument type. To be sure, this is a nontrivial amount of TMP to write or digest, and is not to all tastes (including those of this author). But it leads to extremely efficient, flexible, and capable library code, supporting a minimum of fuss in client code, and so is worth the complexity. There are no runtime costs. This technique was introduced into the Open-RJ/C++ mapping in Version 1.3.2 of Open-RJ.

Note that we could add constraints to the integral overload if we wanted to proscribe the use of certain types:

  template <typename I>
  Field subscript_operator_(I const &index, yes_type) const
  {
    STLSOFT_STATIC_ASSERT(0 == is_signed_type<I>::value);
    return operator [](static_cast<size_t>(index));
  }

Such constraints are applied after the type has been detected to be integral, so there's no chance of it falling through into the string-side of things. Additional discrimination and/or constraints may be added to refine the selection of return-type and/or overload as required.

Three final points to bear in mind: First, there are no failure modes for the subscript method selection in and of itself; everything that is not an integral type should be directed to the shim-using string overload. Whether or not shims are defined for the parameterizing type is a separate matter, part of the "idiomology" of the shims concept. Second, there's nothing to prevent the technique being used to support three or more return type "groups." Finally, it's possible to use void as a return type, to proscribe a particular set of return types from compatibility.

Recls Efficiency Recap

Since last time, I've enhanced the WinSTL basic_findfile_sequence facade class template, so that it, too, performs multipart pattern searching within its iterator class. This has resulted in a further drop of about 5 KB from the size of the binary code, and an updated version of recls (1.6.4) is available from http://recls.org/downloads.html. Notwithstanding, the much improved recls2 project will still be coming out and nipping at its heels in a few months' time, once I've got my next book, Extended STL, out of the way.

Acknowledgments

Thanks to Bjorn Karlsson for helping me avoid nomenclatural ignominy by suggesting a meaningful name for the technique, and for telling me it's "nice." Thanks to Garth Lancaster for pointing out where my vague and meandering tale meandered vaguely beyond mortal tolerance.

References

  1. If you wanted to supply a default value for the optional field Breed, that's easily done with the following:

     puts "Breed="   + name=rec["Breed", "Mongrel"]

  2. If you want to understand the full mess involved with integral type conversions and overloads over a broad range of C++ compilers, check out Chapter 14 of Imperfect C++ (Addison-Wesley, 2004).

CUJ

