It's often nice to have storage management that's a bit smarter than you get with auto_ptr but rather less complex than automatic garbage collection. Here's one style of smart pointers that strikes such a careful balance.
September 01, 1998
URL:http://www.drdobbs.com/extending-the-reference-counting-pattern/184403543
Talks and heated discussions about conventional C/C++ pointers go far beyond the C/C++ programming community. Some people love them, some hate them. This situation is understandable, as pointers are difficult creatures to handle. Too many things can go wrong:
The situation is aggravated by a false sense that the pointer itself is the object. However, it is a separate object that stores the memory address of an object (or maybe not). Marginally different syntax adds to the confusion:
The above problems lead to memory corruption and memory leaks:
Object* p1; Object* p2 = new Object; Object* p3 = new Object; p3 = p2; //Original p3 has been lost delete p1; // Memory corruption. delete p2; // p3 is invalid now. delete p3; // Memory corruption
The STL template class auto_ptr, originally proposed by Greg Colvin, helps to address several of these problems. The template class provides a safer alternative to conventional pointers. Code using auto_ptr is often cleaner, shorter, and less prone to the memory management problems shown above:
auto_ptr<Object> p1; auto_ptr<Object> p2 = new Object; auto_ptr<Object> p3 = new Object; p3 = p2; // Original p3 has been // deleted. // p3 owns the object // pointed to by p2. // data will be automatically // deleted when auto_ptrs // go out of visibility scope.
auto_ptr introduces the notion of exclusive data ownership. Only one auto_ptr instance owns the data even if there are several instances pointing at the same object. The auto_ptr instance that owns the data deletes the data when it gets reassigned, or goes out of visibility scope.
Unfortunately, the exclusive ownership concept of the auto_ptr class differs in a subtle way from the shared ownership concept of conventional pointers. The code fragment shown in Figure 1 illustrates a case where the pointers work as expected but the auto_ptr objects do not.
The problem arises because the ownership of the object is first transferred to a copy of obj created on the stack and then to the object tmp within the function foo. When tmp goes out of scope it deletes the object! Therefore, when we return to func from foo, obj no longer owns the controlled object. Worse, it does not even point at valid data. The use of template class auto_ptr creates a false sense of security that it will handle memory-related problems for us. But in order to be used safely, the auto_ptr concept has to be modified to support the notion of shared ownership, as conventional pointers do.
The technique for implementing shared ownership here is called the Reference Counting Pattern. The idea behind the technique is to create "smart" pointers that keep track of all instances referring to (owning) an object. No single instance of a smart pointer is solely responsible for object management. You might say that the object manages itself. When no references to the object are left, among all the smart pointers that can refer to it, it will be destroyed automatically.
There are several benefits to this approach:
Greg Colvin submitted his version of smart pointers (template class counted_ptr) to the C++ Standards Committee along with auto_ptr. But counted_pointer was eventually removed from the draft C++ Standard. So I present here my version of smart pointers, using the reference-counting concept. It differs from the usual way of implementing the pattern (see [1-3]) in several ways:
Listings 1 and 2 show my implementation of smart pointers using the reference-counting pattern. It uses a two-tiered design composed of template classes Handle and Counted. This approach decouples two fundamentally different tasks and allows them to be implemented separately and cleanly. Reference counting is managed by Counted and lightweight presentation and data sharing are managed by Handle.
The specialization Handle<T> requires that the class T define a default constructor T::T() and a copy constructor T::T(const T&). I consider this not a restriction but good programming style part of Coplien's requirements for the "canonical" class implementation (see [5]). The internal structure is a little unconventional in that an object of class T is itself embedded in class Counted<T> instead of a pointer to the instance:
class Counted {..... T _instance; uint _num_references; }; class ConventionalCounted {..... T* _instance; uint _num_references; };
This approach has the following advantages:
1. Only one block of memory is allocated to hold a T instance together with a reference counter, instead of two separate blocks for a T instance and a ConventionalCounted instance. The conventional approach incurs at least double the memory overhead of the Counted solution.
2. Counted does not introduce an additional level of indirection as the ConventionalCounted does, so it is at least marginally quicker.
3. A Handle<Data> instance is a natural pointer to a Data instance. Therefore, using a Handle<Data> object instead of Data* pointer is safe in mixed or legacy environments, and in situations such as:
extern void foo(int, ...); Data* d = new Data; Handle<Data> h; foo(5, o); foo(5, h); // No Handle-to-Data // conversion invoked
4. A Data instance is created and destroyed along with each Handle<Data> instance. The existence of a Handle<Data> object thus guarantees the existence of a Data object. The ConventionalCounted approach inherits the usual uncertainty that comes with conventional pointers. It might represent no data at all. For example:
Data* p; auto_ptr<Data> a; Handle<Data> h; p->data_method(); // Disaster a->data_method(); // Disaster h->data_method(); // OK
For the sake of safe and reliable Data resource management, the Handle<Data> object is solely responsible for the creation, accessing, and deletion of a Data object. Such a strong association gives the (accurate) impression that a Handle<Data> object is a Data object, eliminating the need to manipulate and address Data objects directly.
One drawback to this approach, however, is that the internal Data objects are being created with a default constructor only. If a Data object requires special initialization, it has to be done as follows:
Data* p = new Data(name, value); auto_ptr<Data> a = new Data(name, value); Handle<Data> h = Data(name, value);
or
Handle<Data> h; h.initialize(name, value);
From my experience, this limitation is rarely an issue. The overwhelming majority of classes enjoy the benefits of safe and reliable management that the Handle class provides. There are methods of providing customized data initialization (other than default constructors) but such a discussion is beyond the scope of this article.
The reference-counting mechanism is encapsulated within template class Counted, so the Handle<T> class interface is simple and easy to understand. However, I should still say a few words about the two constructors:
Handle(const T&); Handle(const T*);
Their existence is very unfortunate as they are T-to-Handle converters and copy an externally provided T instance to an internally stored T instance. Ideally they are not needed, since T instances are created within Handle<T>. But in real life, where T instances may be created explicitly, bypassing the Handle<T> mechanism, the constructors have to be used as a bridge. For example, third-party libraries that we are not able to modify may use T directly instead of Handle<T>. Quite a few old C library functions return pointers to statically allocated internal storage. The constructors take care of such unfortunately designed cases:
extern T* lib_func(); T* t = lib_func(); // Unusable without // immediate copying Handle<T> h = lib_func(); // OK to use
From my experience, the use of the constructors is limited to such areas where new meets old. Such situations can be very well encapsulated and localized.
Being a smart pointer, Handle<Data> is transparent in its interactions with Data. It works very much the same way conventional pointers do, thanks to a family of converters and dereference operators:
operator T& (); operator T* (); operator const T& () const; operator const T* () const; const T* operator-> () const; T* operator-> ();
The operators properly extend the Handle<Data> transparency into the area of const objects. A const Data* pointer accepts no changes to the data it points at. const Handle<Data> behaves the same. Unfortunately, by comparison, auto_ptr uses a single dereference operator:
T* operator->() const;
Therefore, it does not propagate its constness onto the data it represents and cannot represent const data at all:
class Data { ... void change_data(); }; const Data* d = new Data; const Handle<Data> h; const auto_ptr<Data> p(d); // Does not compile const auto_ptr<Data> p(new Data); d->change_data(); // Correctly does not compile h->change_data(); // Correctly does not compile p->change_data(); // Wrongly compiles
The converters always return pointers or references to an internal instance of a Data class. Even non-const converters do that. Here they differ from Kenneth Ngai's implementation published previously in CUJ [2] where non-const converters duplicate the data. In order to coexist with conventional pointers, a Handle instance has to behave like conventional pointers when used in their place. If a non-const pointer to an object is passed to a function, it is expected that the function can change the content of the object. That is the way Handle works when used in the pointer-oriented environment (functions handling Data* directly). It simulates a Data object being passed by reference:
void change0(Data* data) { data->change(); // changes the Data instance. } void change1(Handle<Data> data) { data->change(); // changes the Data instance } void function() { Data* d = new Data; Handle<Data> h; change0(d); // changes 'd' change0(h); // changes 'h' change1(h); // changes 'h' }
Unfortunately, the converters violate the encapsulation of an internal Data instance within Handle<Data> and expose the instance to the cruelty of the world outside Handle's care. The whole discussion above about the Handle(const Data*) and Handle(const Data&) constructors applies as well to the converters. They serve as the second unfortunate but necessary part of the bridge to legacy libraries that deal with Data* pointers directly:
extern void old_func1(Data*); extern void old_func1(const Data*); Handle<Data> h; old_func1(h); // OK old_func2(h); // OK
The converters remain largely invisible to the programmer. Sadly, their existence itself is misleading especially if someone has not grown accustomed to having a Handle<Data> representing a Data instance and tries to access the instance directly:
void unfortunate(Handle<T> h) { extern void do_something(const T*); T* p = h; // Completely unnecessary, // but guaranteed by converters. do_something(p); // Unnecessary. // do_something(h); // Much better and safer as p does // not hang around // and "Handle-to-T" conversion is // implicit. delete p; // Trouble. }
Therefore, converters are there, but please do not use them. Luckily, developers of a T class can prepare their class for the integrated use within Handle<T> and prevent such an unfortunate usage by prohibiting public creation and deletion of T objects and letting Handle<T> manage resources instead:
class Foo { ... friend class Handle<Foo>; private: ~Foo (); Foo (); };
or
class Foo { ... private: operator new (); // Not implemented. operator delete (); // ditto. };
Handle<T> keeps the tradition of mimicking conventional pointers in assignment operators. The pointers just forget the data they were pointing at before assignment and certainly do not modify the data. Handle<T> does the same. Additionally, if there are no Handle instances left referencing the data, the data will be deleted. This is often a quick, safe, and clean memory-management solution that does not need special garbage-collection mechanisms:
void func() { Data* p0 = new Data; Data* p1 = new Data; Handle<Data> h0; Handle<Data> h1; p1 = p0; // Original p1 data lost unmodified. h1 = h0; // Original h1 data deleted unmodified. }
There are no comparison operators. Implicit conversions are applied, so, comparisons work as expected:
Handle<Data> h0; Handle<Data> h1 = h0; Handle<Data> h2 = h0.duplicate(); Handle<Data> h3; if (h0 == h1) ... // true if (h0 == h2) ... // false if (h0 == h3) ... // false
There is no implementation of the address-of operator:
Handle<T>* operator& ();
to prevent its being used. The program should have no access to the address of a Handle<T> instance, since such access bypasses the reference-counting mechanism. Handle<T> instances are passed by value (after all, they are only the size of a single pointer). Well, to be more precise, Handle<T> can be passed by reference. But doing so requires a lot of care because it disables the reference-counting mechanism. Consequently, it is theoretically quicker (++_num_references and --_num_references do not cost much). Therefore, it is not advisable to try to gain the questionable advantage of passing Handle<T> instances by-reference. However, such an operation is needed occasionally:
void by_value (Handle<Data> h) { Handle<Data> brand_new; // h is a stack-based copy // The copy(!) is reassigned // The original is unchanged h = brand_new; } void by_reference (Handle<Data>& h) { Handle<Data> brand_new; // h is the original from the 'function' // The original is changed. h = brand_new; } void function() { Handle<Data> h; by_value(h); // Does not change h by_reference(h); // Changes h }
A convenient extension to the standard reference counting has been implemented in the area of error processing. Conventional pointers have a nice feature they have a built-in singular or error value (0, the null pointer). Every time something goes wrong, 0 or an error pointer value can be returned instead of a pointer to data. I find this feature extremely convenient for handling "soft" errors, as opposed to "hard" errors that are best handled by the exception mechanism. Therefore, I have introduced an _error variable within the Handle<T> class (at the expense of the _num_references variable), and a HandleError class for a friendlier error-processing interface.
Two levels of error processing are available, error indication and error reporting. With the former, we do not bother specifying what happened and just indicate an error:
Handle<T> do_something() { ... if (bad) return Handle<T>::error(); ... } void func() { Handle<T> h = do_something(); if (h.is_error()) ... // do_something() failed }
With the latter, we can return a sophisticated report of what actually went wrong:
HandleError bad_value; HandleError bad_format; Handle<T> do_something() { ... if (bad-value) return Handle<T>.error(bad_value); if (bad-format) return Handle<T>.error(bad_format); ... } void func() { Handle<T> h = do_something(); if (h.is_error()) ... // An error { if (h.get_error() == bad_value) ... // Process bad-value error. if (h.get_error() == bad_format) ... // Process bad-format error. }
or:
if (h.get_error() == no_error) ... // Everything is OK else if (h.get_error() == bad_value) ... // Process bad-value error else if (h.get_error() == bad_format) ... // Process bad-format error }
The area where the Handle<T> class might be used is potentially very large. Obvious examples are situations where functions have to return something more complex than just an integer, like dynamically allocated strings, structures, etc. An example of a simple String class is shown in Listing 3 together with the way of handling various error conditions using Handle.
[1] Scott Meyers. More effective C++ (Addison-Wesley, 1996).
[2] Kenneth Ngai. "A Template for Reference Counting," CUJ, August 1997.
[3] R. G. White. "Advantages and Disadvantages of Unique Representation Patterns," C++ Report, September 1996.
[4] Jonathan S. Shapiro. C++ Toolkit (Prentice Hall, 1991).
[5] James O. Coplien. Advanced C++ (Addison-Wesley, 1992).
[6] Vladimir Batov. "A Quick And Simple Memory Allocator," CUJ, January 1998.
Vladimir Batov is a senior software engineer working for Raytheon Systems Company (Marlborough, MA) and specializing in Air Traffic Control software development. He has 15 years of experience in software development using C/C++. He can be reached at [email protected].
void foo(Object* obj) { Object* tmp = obj; tmp->something(); } void func() { Object* obj = new Object; foo(obj); obj->something(); } void foo(auto_ptr<Object> obj) { auto_ptr<Object> tmp = obj; tmp->something(); // tmp owns the Object data // and deletes them when goes // out of visibility scope. } void func() { auto_ptr<Object> obj = new Object; foo(obj); // Object data has been deleted // within foo(). // obj points nowhere! obj->something(); // A problem. } //End of File
// Copyright (c) 1996-1997 Vladimir Batov // // Permission to use, copy, modify, distribute and sell this // software and its documentation for any purpose is hereby // granted without fee, provided that the above copyright notice // appear in all copies and that both that copyright notice and // this permission notice appear in supporting documentation. // I make no representations about the suitability of this software // for any purpose. It is provided "as is" without express or // implied warranty. // Handle<T> requires T class to have T::T() and // T::T(const T&) constructors. #ifndef BATOV_HANDLE_H #define BATOV_HANDLE_H typedef unsigned int uint; class HandleError { public: ~HandleError () {} HandleError () : _error(_allocate()) {} HandleError (uint err) : _error(err) {} operator uint () const { return _error; } bool operator== (const HandleError& e) const { return _error == e._error; } static HandleError no_error; static HandleError unknown; private: uint _error; uint _allocate(); }; template<class T/*, class Allocator = MemoryAllocator When supported*/> class Handle { public: typedef HandleError Error; private: class Counted { public: ~Counted () {} Counted (); Counted (const T&); void dismiss () { if (!--_num_references) delete this; } void use () { ++_num_references; } operator T& () { return _instance; } operator const T& () const { return _instance; } operator T* () { return &_instance; } operator const T* () const { return &_instance; } T* operator-> () { return &_instance; } const T* operator-> () const { return &_instance; } // Possible memory optimization. See [6]. // void* operator new (size_t) // { return _allocator.allocate(); } // void operator delete (void* storage) // { _allocator.deallocate(storage); } bool is_shared () const { return 1 < _num_references ? true : false; } bool is_error () const { return _error ? true : false; } Error get_error () const { return _error; } void set_error (Error); private: T _instance; uint _num_references : 16; // Reference counter. // (<=65535 looks enough) uint _error : 16; // Error conditions flag. // (<=65535) // Possible memory optimization. See [6]. // static MemoryAllocator _allocator; }; public: ~Handle (); Handle (); Handle (const T&); Handle (const T*); Handle (const Handle<T>&); operator T& () { return _counted->operator T&(); } operator T* () { return _counted->operator T*(); } operator const T& () const { return _counted->operator T&(); } operator const T* () const { return _counted->operator T*(); } const T* operator-> () const { return _counted->operator ->(); } T* operator-> () { return _counted->operator ->(); } Handle<T>& operator = (const T&); Handle<T>& operator = (const T*); Handle<T>& operator = (const Handle<T>&); static Handle<T> error (Error =Error::unknown); void set_error (Error =Error::unknown); Error get_error () const { return _counted->get_error(); } bool is_error () const { return _counted-> is_error(); } bool is_shared () const { return _counted->is_shared(); } Handle<T> duplicate () const; // Convenience method to create // a copy of a "T-Handle<T>" pair. private: Counted* _counted; // User has NO ACCESS TO THE ADDRESS OF A HANDLE instance // as such access bypasses reference counter mechanism. // User is forced to pass it by value (Handle) that is // supervised by the mechanism. Handle<T>* operator& (); // Not defined to prevent usage. }; template<class T> inline Handle<T>::Counted::Counted() : _num_references(0), _error(Error::no_error), _instance() { // Preferred form of T instance creation. } template<class T> inline Handle<T>::Counted::Counted(const T& from) : _num_references(0), _error(Error::no_error), _instance(from) { // Should be avoided as an internal copy of 'from' // is created. } template<class T> inline void Handle<T>::Counted::set_error(Error err) { _error = err < Error::unknown ? err : Error::unknown; } template<class T> inline Handle<T> Handle<T>::error(Error num) { Handle<T> h; h.set_error(num); return h; } template<class T> inline void Handle<T>::set_error(Error err) { _counted->set_error(err); } template<class T> inline Handle<T>::~Handle() { _counted->dismiss(); } template<class T> inline Handle<T>::Handle() : _counted(new Counted()) { _counted->use(); } template<class T> inline Handle<T>::Handle(const T& copy) : _counted(new Counted(copy)) { _counted->use(); } template<class T> inline Handle<T>::Handle(const T* copy) : _counted(copy ? new Counted(*copy) : new Counted()) { _counted->use(); } template<class T> inline Handle<T>::Handle(const Handle<T>& ref) : _counted(ref._counted) { _counted->use(); } template<class T> inline Handle<T>& Handle<T>::operator=(const T& src) //8. { if (*this != &src) { _counted->dismiss(); _counted = new Counted(src); _counted->use(); } return *this; } template<class T> inline Handle<T>& Handle<T>::operator=(const T* src) //8. { if (*this != src) { _counted->dismiss(); _counted = src ? new Counted(*src) : new Counted(); _counted->use(); } return *this; } template<class T> inline Handle<T>& Handle<T>::operator=(const Handle<T>& src) { if (this->_counted != src._counted) { _counted->dismiss(); _counted = src._counted; _counted->use(); } return *this; } template<class T> inline Handle<T> Handle<T>::duplicate() const { return Handle<T>(operator->()); } #endif // BATOV_HANDLE_H // End of File
#include "include/handle.h" HandleError HandleError::no_error(0); // max of 16 bits HandleError HandleError:: unknown(65535); uint HandleError::_allocate() { static uint next = 0; return ++next; } //End of File
#include <stdio.h> #include <string.h> #include <iostream.h> #include <vbtools/handle.h> class _String { public: typedef int Size; typedef char* Address; ~_String () { delete _text; } _String () : _text(0), _size(0) {} _String (Address, Size =0); Address text () { return _text; } Size size () { return _size; } void text (Address, Size); // private: Address _text; Size _size; }; _String::_String(Address addr, Size sz) : _text(0), _size(0) { text(addr, sz ? sz : addr ? strlen(addr) : 0); } void _String::text(Address addr, Size size) { delete _text; _text = (Address) memcpy(new char[_size = size], addr, size); } // String instances are of 'void*' size. Easy to pass around. // _String resources are handled automatically. typedef Handle<_String> String; HandleError string_is_empty; HandleError string_too_long; String foo(String input) { String s; if (!input->size()) { String err = String::error(string_is_empty); // A value can be set to error instance as well. err->text("Usage: foo argument.\n"); return err; } if (4 < input->size()) return s.error(string_too_long); s->text("Changed"); return s; } int main(int argc, char** argv) { String input(1 < argc ? argv[1] : (char*) 0); String orig = input; String output = foo(input); if (output.is_error()) { if (output.get_error() == string_is_empty) { // A value and an error condition can be // passed at the same time. cout << output->text() << endl; } else if (output.get_error() == string_too_long) { cout << "Argument is too long.\n"; } } else { cout << orig-> text() << endl; // argv[1] printed. cout << input->text() << endl; // "Changed" printed. } } //End of File
Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.