The Difference between Equality and Identity Operators (in D)
D provides two ways of comparing objects: the equality operator and identity operator (actually we should say operators – plural, since each comes with its negated counterpart). Testing objects for equality is done with the familiar (at least to C/C++ and C# programmers) “==” and “!=” operators. Testing for identity is the realm of the “is” and “!is” operators.
As you may suspect already, equality and identity are not the same.
To better understand the difference, let’s consider an example:
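import std.stdio;
class Pet
{
    string name;
    public override bool opEquals(Object other)
    {
        // The cast yields null if other is not a Pet.
        auto pet = cast(Pet) other;
        return pet !is null && name == pet.name;
    }
}
void main()
{
    Pet dog = new Pet();
    Pet cat = new Pet();
    writeln(dog is cat); // identity: are these the same object?
    writeln(dog == cat); // equality: calls opEquals
}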
Puzzlingly enough, this code prints:
false
true
Naturally, a dog is not a cat and there is no surprise that the result of the “dog is cat” expression is false. But why is the dog equal to a cat (at least according to our code)? The short answer: identity tests if the object references are the same; equality results in a call to operator opEquals. In the code above, the name (the string member datum of the Pet class) is null in both the cat and the dog instances and opEquals correctly sees them as equal.
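To make the distinction concrete, here is a minimal sketch that gives the pets actual names. The constructor and the variable names (rex, alsoRex, impostor) are assumptions added for illustration and are not part of the example above.

```d
import std.stdio;

// Variation on the Pet class above; the constructor is an
// addition made for this sketch only.
class Pet
{
    string name;

    this(string name) { this.name = name; }

    override bool opEquals(Object other)
    {
        // cast(Pet) yields null when other is not a Pet
        auto p = cast(Pet) other;
        return p !is null && name == p.name;
    }
}

void main()
{
    auto rex = new Pet("Rex");
    auto alsoRex = rex;             // second reference to the same object
    auto impostor = new Pet("Rex"); // distinct object with the same name

    writeln(rex is alsoRex);   // true: same object
    writeln(rex is impostor);  // false: different objects
    writeln(rex == impostor);  // true: opEquals compares the names
}
```

Two references are identical only when they point at the same object; they are equal whenever opEquals says so.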
As in C# (and unlike C++), D class objects are really references under the hood. Occasionally, D programmers may need to compare such references (very much like they would compare pointers in C, without comparing the entities that are pointed to). In the “dog is cat” expression above, dog and cat refer to different class objects, so they are not identical. That explains the need for the “is” operator (aka the identity operator). In Java, identity versus equality is an either-or deal, as suggested by the online docs at http://java.sun.com/docs/books/tutorial/java/IandI/objectclass.html:
"The equals() method provided in the Object class uses the identity operator (==) to determine whether two objects are equal [...] To test whether two objects are equal in the sense of equivalency (containing the same information), you must override the equals() method."
The == operator is simply a shorthand for invoking opEquals; the expression dog == cat, although not textually identical to dog.opEquals(cat), generates the same syntax tree. D source code looks more intuitive and natural when “==” is used instead of the verbose opEquals function call.
Please note that because all classes in D inherit from a root Object class, which provides a base implementation of opEquals, every class will always have opEquals defined (whether it does what the programmer wants or should be overridden is a different question).
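As an illustration of the inherited behavior, the sketch below uses a hypothetical class, Tag, that does not override opEquals. The base implementation in Object compares the references themselves, so two distinct objects with the same contents do not compare equal.

```d
import std.stdio;

// Hypothetical class for illustration; it relies on the
// opEquals inherited from Object.
class Tag
{
    string label;

    this(string label) { this.label = label; }
}

void main()
{
    auto a = new Tag("x");
    auto b = new Tag("x"); // same contents, different object
    auto c = a;            // same object

    writeln(a == b); // false: the inherited opEquals compares references
    writeln(a == c); // true: both refer to the same object
}
```

If comparing by contents is what you want, this is the case where opEquals should be overridden.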
In addition to class objects (i.e. instances of classes), D has other data type families: scalars (such as integers and floating point numbers), structs and arrays. For scalars, equality and identity are the same, as demonstrated by this program:
    int i = 2;
    int j = 1;
    writeln(i is ++j); // prints true
    writeln(i == j); // also prints true
For struct objects, identity is defined as the bits in the struct being identical. The same as equality then (you may conclude); well, not quite. Structs in D have value type semantics. If in the example above we re-declare Pet as a struct rather than a class, like this:
struct Pet
{
    string name;
}
The equality and identity tests will yield the same result (that is, the dog and cat will bizarrely appear as both equal and identical, because their respective name fields are null). The compiler generates code to compare the bits in the structs: the null name fields compare as equal, bit by bit.
Structs, however, are more interesting than that: they have value type semantics, like scalars, but are a bit like classes too, in the sense that some operators can be re-defined by the programmer. Note that we did not say “overridden”, to avoid confusion with the polymorphic behavior of classes. Some textbooks on object-oriented languages call this “operator overloading”.
Classes have a slot in their virtual tables for opEquals: when a programmer writes her own, the base class operator is overridden. Structs do not have virtual tables, and hence no polymorphic behavior is possible. But programmers can re-define the behavior of certain operators, such as opEquals. The compiler will statically determine if such an operator should be called in lieu of the default behavior.
For example, we can decide (in a counter-Orwellian spirit) that no pets are equal, ever:
struct Pet
{
    string name;
    public bool opEquals(Pet other)
    {
        return false;
    }
}
Note that the argument type is Pet (and not Object, as in the case of the overridden class operator); there is no override attribute, either.
The code
    writeln(dog is cat);
    writeln(dog == cat);
now prints:
true
false
This is because the identity operation for structs always results in a bit-by-bit comparison, and cannot be overridden, or re-defined, by the user; the equality operation has been re-defined to always return false. And by the way, the identity operator cannot be overridden for classes either: for class instances, the identity operator always compares the two object references.
Now that we have seen how the identity and equality operators work for structs, let's see how they apply to arrays. Let's assume we have two arrays of Pet objects:
    Pet[] myPets;
    Pet[] herPets;
    // … populate the arrays…
    if (myPets == herPets) { // calls opEquals for each pair of elements
        // …
    }
    if (myPets is herPets) { // tests if both arrays refer to the same elements
        // …
    }
We can say that testing arrays for equality is a deep operation (opEquals is called on each pair of array elements), whereas an identity test simply checks that 1) the number of elements in the two arrays is the same, and 2) the two arrays refer to the same elements in memory; it is a shallow operation.
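The deep versus shallow distinction can be seen in a short sketch. It uses int arrays rather than Pet arrays to keep things small, and the variable names are made up for illustration.

```d
import std.stdio;

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a.dup;     // separate copy with the same contents
    int[] c = a;         // refers to the same elements as a
    int[] d = a[0 .. 2]; // same starting element, but shorter

    writeln(a == b); // true: element-by-element comparison
    writeln(a is b); // false: different memory
    writeln(a is c); // true: same elements, same length
    writeln(a is d); // false: lengths differ
}
```

Identity never looks at the elements themselves, which is why a copy made with dup is equal to the original but not identical to it.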
In conclusion, equality and identity operators work the same for scalars, but not for classes, structs, and arrays. The identity operator cannot be overloaded. In the cases where we need to know if two object references (or the references to objects in two arrays) are the same, the identity operator should be used. The identity operator is faster than the equality operator because it does not do a “deep” comparison of the two objects.
There is a difference between the signature of opEquals for class objects versus struct objects. For structs, we have the straightforward:
    public bool opEquals(StructName other);
For class objects we have two choices: declare opEquals as above:
    public bool opEquals(ClassName other);
or (the more verbose):
    public override bool opEquals(Object other);
The difference is in how operator == is invoked. If it needs to be used in a polymorphic fashion (say you have an array of Objects) then the signature must be:
    public override bool opEquals(Object);
For use cases where the final type of the class objects being compared is known, the first form may be used. The first form hides the base class' opEquals rather than overriding it. There's no need to worry about incorrect uses though: the compiler will catch them for you and either issue a compile-time error (if the "-w" flag is passed on the command line) or insert code that detects the error at runtime (and throws an error):
version (D_NET)
{
    import System;
    alias Console.WriteLine print;
}
else
{
    import std.stdio;
    alias writeln print;
}
class Test
{
    int i;
    // Takes a Test, so it hides Object.opEquals rather than overriding it.
    public bool opEquals(Test t)
    {
        print("opEquals");
        return i == t.i;
    }
    this(int j)
    {
        i = j;
    }
}
void main()
{
    // This causes either a compile-time error or a runtime exception:
    // core.exception.HiddenFuncError: Hidden method called for test.Test
    /*
    Object t1 = new Test(1);
    Object t2 = new Test(1);
    */
    // This works (non-polymorphic behavior):
    Test t1 = new Test(1);
    Test t2 = new Test(1);
    try
    {
        print(t1 == t2);
    }
    catch (Exception e)
    {
        print(e);
    }
}
One last note: unlike in C#, where operators == and != can be overloaded separately, in D you only need to worry about the == (opEquals) operator: when the compiler sees != it simply calls opEquals, then logically negates the result.
Frankly, I never understood why C++ and C# allow the equals operator and its negated counterpart to be overloaded separately, only for textbooks to recommend that you should always overload them together: it seems like a lot of unnecessary work that the designers of D avoided altogether.
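The negation rule is easy to observe with a small sketch. The struct Release and its static counter calls are hypothetical names introduced here only to count how often opEquals runs.

```d
import std.stdio;

// Hypothetical struct for illustration; calls counts opEquals invocations.
struct Release
{
    int number;
    static int calls;

    bool opEquals(const Release other) const
    {
        ++calls;
        return number == other.number;
    }
}

void main()
{
    auto a = Release(1);
    auto b = Release(2);

    writeln(a != b);        // true: opEquals returned false, then negated
    writeln(Release.calls); // 1
    writeln(a == b);        // false
    writeln(Release.calls); // 2
}
```

There is no separate operator to write for !=, so the two comparisons cannot disagree with each other.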
Thanks to Walter Bright, Bartosz Milewski and Andrei Alexandrescu for reviewing this article.

