Dr. Dobb's is part of the Informa Tech Division of Informa PLC

Syntactic Aspartame: Recreational Operator Overloading


February, 2006: Syntactic Aspartame: Recreational Operator Overloading

Sander Stoks is a software engineer who writes image acquisition, processing, and display software for electron microscopes. He can be contacted at [email protected].


Operator overloading is a controversial subject. There are programmers who feel it is merely syntactic sugar with a potential for abuse, and should never have made it into the language. However, you could object that pObject->Foo(x) is merely syntactic sugar for Foo(pObject, x). And while many people would agree that event += observer is less clear than event.Register(observer), many others (especially programmers dealing with scientific applications) would certainly not like to go back to a = add(mul(p, q), r).

In fact, most programmers I know who learned C++ as a "second language" after C, Pascal, or Fortran point to operator overloading as one of the most exciting features of the language. In some cases (including my own), this initially leads to overzealous overloading. The first thing most of us have tried is defining a "power" operator (haven't you?). It quickly turns out you can't get away with defining operator** (some other languages have the ** operator for "raised to the power of...") because this is parsed differently from what we expect. The next thing to try is operator^, which works but leads to unexpected results because of operator precedence rules.
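To make the precedence problem concrete, here is a minimal sketch. The wrapper type Real and the two helper functions are hypothetical names introduced only for this illustration. Because + binds more tightly than ^ in C++, an expression that reads like "1 plus 2 cubed" is actually grouped as "(1 plus 2) cubed":

```cpp
#include <cassert>
#include <cmath>

// Hypothetical wrapper type, used only for this illustration.
struct Real
{
    explicit Real(double v_) : v(v_) {}
    double v;
};

Real operator+(const Real& lhs, const Real& rhs)
{
    return Real(lhs.v + rhs.v);
}

// operator^ pressed into service as "raised to the power of".
Real operator^(const Real& lhs, const Real& rhs)
{
    return Real(std::pow(lhs.v, rhs.v));
}

// Reads like 1 + 2^3, but + has higher precedence than ^,
// so the compiler groups it as (1 + 2) ^ 3.
double surprising()
{
    Real one(1), two(2), three(3);
    return (one + two ^ three).v;
}

// Explicit parentheses are needed to get the mathematical meaning.
double intended()
{
    Real one(1), two(2), three(3);
    return (one + (two ^ three)).v;
}
```

Overloading changes what an operator does, never how it is parsed, so the only remedy is to parenthesize every use.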

So maybe the initial enthusiasm has cooled a bit, but operator overloading is still nice, especially since a matrix-times-vector multiplication that looks much like what you would write on paper appeals to programmers dealing with numerical algorithms. Still, they inevitably run into a problem with vectors.

Vectors have two kinds of multiplication, known as the "dot (or inner) product," and the "cross (or outer) product." For three-dimensional vectors, these are defined as:

a . b = a.x*b.x + a.y*b.y + a.z*b.z

a x b = ( a.y*b.z - a.z*b.y,
          a.z*b.x - a.x*b.z,
          a.x*b.y - a.y*b.x )

Unfortunately, we don't have "centered dot" or "cross" symbols at our disposal, and C++ doesn't let you define new infix operators. You're stuck with only one operator* for vectors, and you have to pick either the dot product or cross product for your implementation.

To mathematicians, the operator symbol used for the vector product is often redundant. It is clear from the context which one of the two is meant: If the result appears in a scalar expression, it was the dot product; if it is in a vector expression, it was the cross product. This observation lands us on the slippery slope of Return Type Overloading (RTO), which is even more controversial than operator overloading, and which C++ doesn't support anyway. Or does it?

As the basis of my examples, I use a simple Vector3 type, along with inner() and outer() functions for both types of products:

struct Vector3
{
    Vector3() {}
    Vector3(double x_,
            double y_,
            double z_)
    : x(x_), y(y_), z(z_) {}
    double x;
    double y;
    double z;
};
double inner(const Vector3& lhs,
             const Vector3& rhs)
{
    return lhs.x*rhs.x + lhs.y*rhs.y + lhs.z*rhs.z;
}
Vector3 outer(const Vector3& lhs,
              const Vector3& rhs)
{
    return Vector3(lhs.y*rhs.z - lhs.z*rhs.y,
                   lhs.z*rhs.x - lhs.x*rhs.z,
                   lhs.x*rhs.y - lhs.y*rhs.x);
}

Since my focus is on multiplication here, I have elided everything else. If you could have code like:

Vector3 a(1, 2, 3);
Vector3 b(7, 5, 6);
	
double dp1 = a*b;
Vector3 cp1 = a*b;

it would be apparent that the former multiplication is meant to be a dot product, and the latter a cross product. The first thought would be to simply define:

Vector3 operator* (const Vector3& lhs,
                   const Vector3& rhs)
{
    return outer(lhs, rhs);
}
double operator* (const Vector3& lhs,
                  const Vector3& rhs)
{
    return inner(lhs, rhs);
}

but since this is RTO, you can't. What you can do, however, is use Coplien's Proxy Class pattern. You do not return a Vector3 or a double from your multiplication immediately, but a special "intermediate result":

VectorProduct operator*(const Vector3& lhs,
                        const Vector3& rhs)
{
    return VectorProduct(lhs, rhs);
}

You define this VectorProduct class to be convertible into a Vector3 or a double, depending on how it is further processed:

class VectorProduct
{
    friend VectorProduct
        operator*(const Vector3& lhs,
                  const Vector3& rhs);
    VectorProduct(const Vector3& lhs_,
                  const Vector3& rhs_)
    : lhs(lhs_), rhs(rhs_) {}
    const Vector3& lhs;
    const Vector3& rhs;
public:
    operator double() const
    {
        return inner(lhs, rhs);
    }
    operator Vector3() const
    {
        return outer(lhs, rhs);
    }
};

Note that its constructor is private and only accessible from the operator* because that's the only context it makes sense in. With only these two additions, you have a poor man's RTO. Of course, one of the main problems with RTO is the ambiguity when an expression is used where more than one of its resulting types is valid. For example, if a stream operator<< exists for Vector3 as well as for double, cout << a*b makes your compiler complain about ambiguity, and the intent has to be specified with either cout << double(a*b) or cout << Vector3(a*b).
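Assembled into one translation unit, the pieces so far might look like the following sketch. The helper functions dotOf and crossOf are hypothetical names added only to show that the declared target type selects the conversion. The default constructor here zero-initializes the members, which is a small departure from the listing above.

```cpp
#include <cassert>

struct Vector3
{
    Vector3() : x(0), y(0), z(0) {}
    Vector3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    double x, y, z;
};

double inner(const Vector3& lhs, const Vector3& rhs)
{
    return lhs.x*rhs.x + lhs.y*rhs.y + lhs.z*rhs.z;
}

Vector3 outer(const Vector3& lhs, const Vector3& rhs)
{
    return Vector3(lhs.y*rhs.z - lhs.z*rhs.y,
                   lhs.z*rhs.x - lhs.x*rhs.z,
                   lhs.x*rhs.y - lhs.y*rhs.x);
}

// Proxy returned by operator*; it only remembers its operands.
class VectorProduct
{
    friend VectorProduct operator*(const Vector3& lhs, const Vector3& rhs);
    VectorProduct(const Vector3& lhs_, const Vector3& rhs_)
    : lhs(lhs_), rhs(rhs_) {}
    const Vector3& lhs;
    const Vector3& rhs;
public:
    operator double() const { return inner(lhs, rhs); }   // dot product
    operator Vector3() const { return outer(lhs, rhs); }  // cross product
};

VectorProduct operator*(const Vector3& lhs, const Vector3& rhs)
{
    return VectorProduct(lhs, rhs);
}

// Hypothetical helpers: the return type decides which conversion runs.
double dotOf(const Vector3& a, const Vector3& b) { return a * b; }
Vector3 crossOf(const Vector3& a, const Vector3& b) { return a * b; }
```

Because the proxy stores references, it should be converted within the same full expression that created it. Keeping a VectorProduct around after its operands have gone out of scope would leave those references dangling.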

It turns out there is also a way to define new infix operators. So as not to spoil the surprise, I first show the resulting client code:

Vector3 a(1, 2, 3);
Vector3 b(7, 5, 6);

double dp2 = a dot b;
Vector3 cp2 = a cross b;

This looks rather un-C++. It uses Coplien's trick twice, with a bit of preprocessor added. When shown the definitions of dot and cross, you probably see how it works:

#define dot *Dot*
#define cross *Cross*

(Of course, the convention is to write preprocessor definitions in BIG_UGLY_CAPS, but that would have "given it away" instantly.) Dot and Cross are defined as follows:

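// The "= {}" is there because some compilers reject a const object
// of class type that has no initializer.
const struct Inner {} Dot = {};
const struct Outer {} Cross = {};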

The idea is that a dot b is parsed into (a*Dot)*b, where you can make up your own definition of a*Dot; in this case, resulting in an intermediate result—sort of a "halfway dot product." I've thrown in some gratuitous template code because this makes it reusable for the cross product later on:

template <typename T>
struct VecOp
{
    VecOp(const Vector3& v_) : v(v_) {}
    const Vector3& v;
};
VecOp<Inner> operator* (const Vector3& lhs,
                        const Inner& op)
{
    return VecOp<Inner>(lhs);
}

This intermediate result is then multiplied by b, and this multiplication is again overloaded to yield the expected dot product:

double operator* (const VecOp<Inner>& lhs,
                  const Vector3& rhs)
{
    return inner(lhs.v, rhs);
}

The version for the cross product is similar:

VecOp<Outer> operator* (const Vector3& lhs,
                        const Outer& rhs)
{
    return VecOp<Outer>(lhs);
}
Vector3 operator* (const VecOp<Outer>& lhs,
                   const Vector3& rhs)
{
    return outer(lhs.v, rhs);
}
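Put together, the dot and cross "operators" can be exercised with a sketch like the one below. The functions dotDemo and crossDemo are hypothetical wrappers added for illustration, the tag objects carry an explicit "= {}" initializer, and the first-stage operator* is written out separately for each tag type.

```cpp
#include <cassert>

struct Vector3
{
    Vector3() : x(0), y(0), z(0) {}
    Vector3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    double x, y, z;
};

double inner(const Vector3& lhs, const Vector3& rhs)
{
    return lhs.x*rhs.x + lhs.y*rhs.y + lhs.z*rhs.z;
}

Vector3 outer(const Vector3& lhs, const Vector3& rhs)
{
    return Vector3(lhs.y*rhs.z - lhs.z*rhs.y,
                   lhs.z*rhs.x - lhs.x*rhs.z,
                   lhs.x*rhs.y - lhs.y*rhs.x);
}

// Empty tag types; the objects exist only to steer overload resolution.
const struct Inner {} Dot = {};
const struct Outer {} Cross = {};

// "Half-finished" product: remembers the left operand and the operation.
template <typename T>
struct VecOp
{
    explicit VecOp(const Vector3& v_) : v(v_) {}
    const Vector3& v;
};

// First stage: a * Dot or a * Cross yields the intermediate result.
VecOp<Inner> operator*(const Vector3& lhs, const Inner&) { return VecOp<Inner>(lhs); }
VecOp<Outer> operator*(const Vector3& lhs, const Outer&) { return VecOp<Outer>(lhs); }

// Second stage: multiplying the intermediate by b finishes the job.
double operator*(const VecOp<Inner>& lhs, const Vector3& rhs) { return inner(lhs.v, rhs); }
Vector3 operator*(const VecOp<Outer>& lhs, const Vector3& rhs) { return outer(lhs.v, rhs); }

#define dot *Dot*
#define cross *Cross*

// Hypothetical demo functions.
double dotDemo()
{
    Vector3 a(1, 2, 3), b(7, 5, 6);
    return a dot b;      // expands to a *Dot* b, i.e. (a * Dot) * b
}

Vector3 crossDemo()
{
    Vector3 a(1, 2, 3), b(7, 5, 6);
    return a cross b;    // expands to a *Cross* b, i.e. (a * Cross) * b
}
```

The lowercase macro names are the weak point of this design: any later identifier spelled dot or cross, including members of third-party classes, is rewritten by the preprocessor as well.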

You may be tempted to use a similar trick to give you an infix ** operator, but this has some serious drawbacks. First, observe that dereferencing a double has no meaning. This is exactly what the compiler tells you if you accidentally type:

double a = 3;
double b = 2;
double c = a ** b;

My compiler says "Illegal indirection," because a ** b is parsed as a*(*b). If you could somehow make *b return a proxy type, you would be ready. You are free to overload the "unary *" operator for your own types, but you cannot overload operator*(double) because it's a builtin type (even though this particular operator isn't provided). You therefore need one change in the spelling to make the above code work:

double a = 3;
Double b = 2;
double c = a ** b;

Note that b is now of type Double, a little wrapper around the builtin double:

struct Double
{
    Double() {}
    Double(double d_): d(d_) {}
    operator double() const { return d; }
    double d;
};

so that you can define:

class Exponent
{
    friend Exponent operator*(const Double& d);
    Exponent(const Double& d_): d(d_.d) {}
public:
    const double& d;
};
Exponent operator*(const Double& d)
{
    return Exponent(d);
}
double operator*(double lhs,
                 const Exponent& rhs)
{
    return pow(lhs, rhs.d);   // pow is declared in <cmath>
}

Unfortunately, this can never work for literals, which is a nuisance because literal exponents are what a "power" operator would most often be used with. With a little help from the preprocessor:

#define POW **(Double)

you can even type:

double a = 3;
double c = a POW 2;

But note that this operator still gets its precedence wrong, since you would expect that 2*a**b is parsed as 2*(a**b), yet it is instead parsed as (2*a)*(*b), or (2*a) to-the-power-of b. Therefore, it is best to just stick to using the regular pow(a, b) function. Note that for the previous vector product examples, precedence was not a problem because of the linearity of the products.
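The precedence trap is easy to reproduce. In the sketch below, the two functions at the end are hypothetical names added for illustration, and Double omits its default constructor for brevity. The first function looks like "two times a squared" but computes "(two times a) squared":

```cpp
#include <cassert>
#include <cmath>

struct Double
{
    Double(double d_) : d(d_) {}
    operator double() const { return d; }
    double d;
};

class Exponent
{
    friend Exponent operator*(const Double& d);
    Exponent(const Double& d_) : d(d_.d) {}
public:
    const double& d;
};

// Unary *: turns a Double into an Exponent proxy.
Exponent operator*(const Double& d)
{
    return Exponent(d);
}

// Binary *: double "times" Exponent means "raised to the power of".
double operator*(double lhs, const Exponent& rhs)
{
    return std::pow(lhs, rhs.d);
}

#define POW **(Double)

// Expands to 2 * a * (*(Double)2), which groups as (2 * a) to the power 2.
double looksLikeTwoTimesASquared(double a)
{
    return 2 * a POW 2;
}

// Parentheses restore the intended meaning: 2 * (a to the power 2).
double parenthesized(double a)
{
    return 2 * (a POW 2);
}
```

The temporary Double created by the cast lives until the end of the full expression, so the reference held by Exponent stays valid for as long as it is needed here.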

Next, I will show a trick that makes range checks look more "mathematical." Beginning C programmers often type in something like this:

    if (lower < x < upper) { ... }

instead of

    if (lower < x && x < upper) { ... }

Unfortunately, the Microsoft Visual C++ 6 and 7.1 compilers only emit a warning if you try the former syntax: "unsafe use of type 'bool' in operation." Even worse, many compilers (I tried GCC 2.95 and GCC 3.3.3, and even Comeau 4.3.3 with the -remarks option) don't feel there's anything wrong with it at all, yet the meaning is quite different from what you would expect. The expression is parsed as (lower < x) < upper, so the Boolean result of the first comparison, converted to 0 or 1, is what gets compared against upper. It may or may not evaluate to the correct answer: If lower < x, the first part of the expression is True, the expression becomes if (1 < upper), and if upper > 1, the entire expression evaluates to True as well (which may be correct). However, if lower >= x, then the first part is False; the expression becomes if (0 < upper), and if upper is then positive, the entire expression also evaluates to True (which is definitely incorrect)!
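The following sketch spells out what the compiler effectively does with the chained comparison on builtin types. The function names are hypothetical, and the intermediate variable is there only to make the implicit bool-to-int step visible:

```cpp
#include <cassert>

// What "lower < x < upper" means to the compiler for builtin types.
bool naiveInRange(int lower, int x, int upper)
{
    int first = lower < x;   // the comparison result becomes 0 or 1
    return first < upper;    // ...and that 0 or 1 is compared with upper
}

// What the programmer meant.
bool inRange(int lower, int x, int upper)
{
    return lower < x && x < upper;
}
```

For lower = 10, x = 5, upper = 20 the naive version compares 0 with 20 and reports that 5 lies between 10 and 20.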

However, if you are prepared to make at least lower be of a custom type (Double, for instance), you can again apply the proxy trick:

class Comparison
{
    friend Comparison operator<(const Double& lhs,
                                const double& rhs);
    Comparison(const double& lhs_,
               const double& rhs_)
    : lhs(lhs_), rhs(rhs_) {}
public:
    const double& lhs;
    const double& rhs;
    operator bool() const
    {
        return lhs < rhs;
    }
};
Comparison operator< (const Double& lhs,
                      const double& rhs)
{
    return Comparison(lhs.d, rhs);
}
bool operator< (const Comparison& lhs,
                const double& rhs)
{
    return lhs && lhs.rhs < rhs;
}

This makes the former if statement work. Of course, this is dangerous code, especially if your compiler doesn't emit warnings if you accidentally leave the lower bound a builtin type.
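As a self-contained sketch, the proxy version can be checked against the cases that fool the builtin comparison. The function inRange is a hypothetical wrapper, and Double omits its default constructor for brevity:

```cpp
#include <cassert>

struct Double
{
    Double(double d_) : d(d_) {}
    operator double() const { return d; }
    double d;
};

// Result of "lower < x": remembers both operands so that a second
// "< upper" can still see x.
class Comparison
{
    friend Comparison operator<(const Double& lhs, const double& rhs);
    Comparison(const double& lhs_, const double& rhs_)
    : lhs(lhs_), rhs(rhs_) {}
public:
    const double& lhs;
    const double& rhs;
    operator bool() const { return lhs < rhs; }
};

Comparison operator<(const Double& lhs, const double& rhs)
{
    return Comparison(lhs.d, rhs);
}

// Second link of the chain: (lower < x) must hold and x < upper.
bool operator<(const Comparison& lhs, const double& rhs)
{
    return lhs && lhs.rhs < rhs;
}

// Hypothetical wrapper: only the lower bound has the custom type.
bool inRange(const Double& lower, double x, double upper)
{
    return lower < x < upper;
}
```

A lone comparison such as if (lower < x) still works because Comparison converts to bool. Some compilers may warn about the chained comparison even though it now has the mathematical meaning.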

The observation that comparison operators need not return a bool is quite interesting though, and I will conclude with a way to define new infix operators using a syntax that looks slightly more C++-ish by maximizing the use of angle brackets. First, let me set the stage. Given a simple type representing rectangles:

    typedef struct
    {
        int left;
        int top;
        int right;
        int bottom;
    } Rect;

you want to write a function that tells you whether a certain rectangle is fully contained within another.

One way of doing this would be a simple function:

bool contains(const Rect& lhs,
              const Rect& rhs)
{
    return lhs.left <= rhs.left &&
           lhs.top <= rhs.top &&
           lhs.right >= rhs.right &&
           lhs.bottom >= rhs.bottom;
}

but when you see the function being used, you can't immediately guess which rectangle is supposed to be contained within which:

if (contains(a, b)) 

is at least slightly ambiguous (to the human reader, that is).

If you make it a member function of the struct, the ambiguity goes away:

if (a.contains(b)) 

but we've been taught not to add too many member functions (or member operators, for that matter) if these functions can be written as external functions just as well. It looks like this "containment" function would be an ideal candidate for a new infix operator. Let's reuse the trick this article started out with using a slightly different syntax:

// The "= {}" is there because some compilers reject a const object
// of class type that has no initializer.
const struct contains_ {} contains = {};

template <typename T>
struct ContainsProxy
{
    ContainsProxy(const T& t): t_(t) {}
    const T& t_;
};

template <typename T>
ContainsProxy<T> operator<(const T& lhs,
                           const contains_& rhs)
{
    return ContainsProxy<T>(lhs);
}
bool operator>(const ContainsProxy<Rect>& lhs,
               const Rect& rhs)
{
    return lhs.t_.left <= rhs.left &&
           lhs.t_.top <= rhs.top &&
           lhs.t_.right >= rhs.right &&
           lhs.t_.bottom >= rhs.bottom;
}

With this bit of code, you can write

if (a <contains> b) ...

which may be surprising to a human reader, but not ambiguous. Also, given a simple Point type with x and y members, it is easily extended:

bool operator>(const ContainsProxy<Rect>& lhs,
               const Point& rhs)
{
    return rhs.x >= lhs.t_.left &&
           rhs.x <= lhs.t_.right &&
           rhs.y >= lhs.t_.top &&
           rhs.y <= lhs.t_.bottom;
}

Finally, let's reveal why ContainsProxy was made a template. When you define

template <typename U, typename V>
bool operator>(const ContainsProxy<V>& lhs,
               const U& rhs)
{
    // std::find is declared in <algorithm>
    return std::find(lhs.t_.begin(),
                     lhs.t_.end(), rhs) !=
           lhs.t_.end();
}

you can even do

std::vector<int> v;
if (v <contains> 42) ...
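For completeness, here is a sketch that combines the rectangle and container versions in one file. The functions rectDemo and vectorDemo are hypothetical, Rect is declared as a plain struct, and the tag object carries an explicit "= {}" initializer:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect
{
    int left;
    int top;
    int right;
    int bottom;
};

// Tag object that sits between the angle brackets.
const struct contains_ {} contains = {};

template <typename T>
struct ContainsProxy
{
    explicit ContainsProxy(const T& t) : t_(t) {}
    const T& t_;
};

// "a < contains" captures the left operand.
template <typename T>
ContainsProxy<T> operator<(const T& lhs, const contains_&)
{
    return ContainsProxy<T>(lhs);
}

// "proxy > b" for rectangles: is b fully inside the captured Rect?
bool operator>(const ContainsProxy<Rect>& lhs, const Rect& rhs)
{
    return lhs.t_.left <= rhs.left &&
           lhs.t_.top <= rhs.top &&
           lhs.t_.right >= rhs.right &&
           lhs.t_.bottom >= rhs.bottom;
}

// "proxy > value" for anything with begin() and end(): linear search.
template <typename U, typename V>
bool operator>(const ContainsProxy<V>& lhs, const U& rhs)
{
    return std::find(lhs.t_.begin(), lhs.t_.end(), rhs) != lhs.t_.end();
}

// Hypothetical demo functions.
bool rectDemo()
{
    Rect big = { 0, 0, 10, 10 };
    Rect small = { 2, 2, 5, 5 };
    return big <contains> small;
}

bool vectorDemo(int needle)
{
    std::vector<int> v;
    v.push_back(7);
    v.push_back(42);
    return v <contains> needle;
}
```

When both operator> overloads are viable, as they are for two Rect operands, overload resolution prefers the non-template function, so the generic container version is never instantiated for Rect.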

Of course, the opponents of operator overloading will have turned away in disgust by now, but the fact that you shouldn't do something doesn't mean you can't.

