Proposing a C++ String Class Standard


OCT91: PROPOSING A C++ STRING CLASS STANDARD

Steve is a member of the Zortech development team specializing in class libraries (C++ Tools, the Database, and the IOStreams implementation). He also teaches C++ language training courses in the UK and can be reached at 201-691-8203.


As C++ progresses through the ANSI standardization process, the standard is likely to define library elements (much as the C Standard does) which come as component parts of any conforming C++ implementation. The library elements specified will differ from those of the C standard in that they come in the form of class libraries. Some libraries, such as iostreams, are fairly well defined by existing implementations. Others are less well defined.

This article describes one such library: the String class. The String class is a good illustration of what C++ library components might be like. But more importantly, it could provoke feedback from potential users of standard C++ as to what facilities should be provided in a string package. In the case of String, such input can still influence the standard. In fact, if you'd like to comment on the package presented here, contact me at the number listed at the bottom of this page; I will collate the responses and forward them to the ANSI committee.

A Class Specification

The backbone of the specification for a C++ class library consists of header files that describe the public interface of the classes, that is, the properties of the new data types which the classes describe.

The draft ANSI specification for the String class is similar to that in Listing One (page 114). I say similar because the current ANSI draft does not include the member functions shown in Figure 1(a). Nevertheless, they are included in Listing One. This is where the feedback begins! Two assignments have been added to match the argument types of the constructors. String manipulation has always been a strong point of Basic, and this area has been addressed by including the insert and remove functions. The NOT operator has been added to provide a succinct test for empty Strings. If the operator char*() function is made to return a null pointer in the case of an empty string, then both notations shown in Figure 1(b) can be used. The operator[]( ) function is discussed later.

Figure 1: (a) Functions added to the ANSI draft; (b) notations to test for an empty String; (c) explicit friend functions; (d) avoiding constructor calls in Boolean operations.

  (a)

  String &operator=(char);
  String &operator+=(char);

  String &insert (int, const String&);
  String &insert (int, const char *);
  String &insert (int, char);
  String &remove (int, int);

  int operator!( );
  char &operator[](int);

  (b)

  String a;

  if (!a) ...;      // if a is empty string
  if (a) ...;    // if a is not empty - uses operator char *()

  (c)

  friend int operator==(const String&, const String&);
  friend int operator==(const String&, const char *);
  friend int operator==(const char *, const String&);

  (d)

  String a;
  ...
  if (a == "whatever")
  ...;

  will cause a sequence something like:

  String temp = "whatever";    // constructor from const char *
  if (operator==(a, temp))
      ...;
  // temp destructor call

Finally, many C programmers are comfortable with the ANSI C string functions, so it seems courteous to include analogues of these in a new string package for a related language. Not doing so is much like throwing the baby out with the bathwater -- it wastes existing expertise.

The use of explicit friend functions for the Boolean tests and concatenation operations -- see Figure 1(c) -- is in line with the ANSI draft. It is possible, however, to omit the last two functions shown in Figure 1(c) because the constructor String::String(const char *) can automatically convert these argument patterns; see Figure 1(d). On the other hand, the explicit versions are more efficient because they avoid the overhead of constructor calls.
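The cost difference can be made visible with a small experiment. What follows is a minimal sketch, not part of the draft: Text is a hypothetical stand-in for String, and its conversions counter is an illustrative addition that records every conversion from a C-style string.

```cpp
#include <string>

// Hypothetical stand-in for the article's String class.
// Text and conversions are illustrative names, not part of the ANSI draft.
class Text {
public:
    static int conversions;              // counts const char * -> Text conversions

    Text(const char *s) : data_(s) { ++conversions; }

    const std::string &data() const { return data_; }

private:
    std::string data_;
};

int Text::conversions = 0;

// General form: both operands are already Text objects.
inline bool operator==(const Text &a, const Text &b)
{
    return a.data() == b.data();
}

// Explicit overload: compares directly against the C-style string,
// so no temporary Text is constructed for the right-hand operand.
inline bool operator==(const Text &a, const char *b)
{
    return a.data().compare(b) == 0;
}
```

With the second overload present, a comparison such as a == "whatever" leaves the counter untouched. Remove that overload and the same expression still compiles, but it goes through the converting constructor first.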

A public class interface specification says a lot about a new type, but not everything that a programmer might need to know. Concerning implementation, the ANSI draft says that no terminator character is reserved for String. It does not, however, state whether String objects with the same value (that is, the same number and sequence of characters) may actually occupy the same storage. It would be more memory efficient, and in some cases faster, if such common representations were shared. This has been the practice in many implementations of string classes, and one which will be followed here.

Taking such considerations into account, the actual String class header file expands beyond the public interface to include something like that shown in Listing Two (page 114). The extra class srep (string representation) is entirely private except for its friendship with the String class. It provides for the shared element of String objects having the same value. The private constructor effectively prohibits derivation, so the dirty trick of the nominal-sized body can be used and the new operator overloaded; thus srep objects of exactly the right size are always allocated. Each srep object keeps a count of the number of String objects currently sharing it and notes the length of its contents, thereby substantially speeding up many string operations.

The private part of the String class is just a pointer to an srep object. The private function body provides access to the actual storage, and the static character variable (only one of these for all instances of String in an application) provides a safe address for uses of the array access operator which happen to be out of bounds.
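The sharing scheme can be sketched in a few lines. This is an illustration under stated assumptions rather than the code of Listing Two: Rep and SharedText are hypothetical names, and the character buffer is allocated separately instead of through an overloaded operator new.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical shared representation, loosely modeled on srep.
struct Rep {
    int refs;               // number of SharedText objects sharing this Rep
    std::size_t length;     // number of characters stored
    char *chars;            // character storage, no terminator reserved
};

class SharedText {
public:
    explicit SharedText(const char *s) : rp_(new Rep)
    {
        rp_->refs = 1;
        rp_->length = std::strlen(s);
        rp_->chars = new char[rp_->length];
        std::memcpy(rp_->chars, s, rp_->length);
    }

    // Copying shares the representation and bumps the count.
    SharedText(const SharedText &other) : rp_(other.rp_) { ++rp_->refs; }

    SharedText &operator=(const SharedText &other)
    {
        ++other.rp_->refs;   // increment first so self-assignment is safe
        release();
        rp_ = other.rp_;
        return *this;
    }

    ~SharedText() { release(); }

    int refs() const { return rp_->refs; }
    std::size_t length() const { return rp_->length; }
    bool shares_with(const SharedText &other) const { return rp_ == other.rp_; }

private:
    // Drop this object's claim; free the Rep when nobody else holds it.
    void release()
    {
        if (--rp_->refs == 0) {
            delete[] rp_->chars;
            delete rp_;
        }
    }

    Rep *rp_;
};
```

Copying a SharedText costs one pointer copy and one increment, and length() is answered from the stored count without scanning the characters.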

Of course, the stated public interface doesn't have to be implemented this way. It may not even be necessary to state how it is done. For the most part though, programmers are a suspicious breed; if an implementation is not clearly explained, they tend to go away and do it again for themselves. We don't want this to happen. It breaks the first law of object-oriented programming, which says, "never reinvent the wheel."

Usage

Now that we have a specification for the class, our outline "documentation" should come up with some examples of how the public interface is intended to be used. Taking things in the order of the specification, we can start with the constructors illustrated in Figure 2(a). The constructor from a regular C-style string has a defaulted second argument. This enables the use of the function call-style initialization with two arguments, as shown in Figure 2(b).

Note that one of the constructors is also called if a function, foo, that takes a String object as its argument is called. Consider, for example, the statement foo("argument"). Argument passing is initialization, so a constructor must be called to initialize the argument object. In this case, it is the one which converts from a regular C-style string. A constructor is also called if foo is called with an actual String argument d, as in foo(d). The argument is passed by value, so the copy constructor gets called to set up the auto variable which corresponds to the argument within the scope of foo. Because of this (although a String object is small, in fact just a pointer to an srep object), functions like foo are often written to take a const reference argument, foo(const String &s). Then the call foo(d) doesn't involve a constructor call, and the use of const prevents code in the body of foo from messing around with the value of d, which is the usual objection to reference arguments. But that's enough digression.
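The difference between the two ways of passing an argument is easy to observe. The following is a minimal sketch: Counted is a hypothetical class, not String, whose copy constructor does nothing except count its own calls.

```cpp
// Hypothetical class used only to count copy-constructor calls.
struct Counted {
    static int copies;

    Counted() {}
    Counted(const Counted &) { ++copies; }
};

int Counted::copies = 0;

// Pass by value: the parameter is initialized by the copy constructor.
inline void by_value(Counted) {}

// Pass by const reference: the parameter binds to the caller's object,
// and const stops the function body from modifying it.
inline void by_const_ref(const Counted &) {}
```

Calling by_value(d) with an existing object d adds one to Counted::copies. Calling by_const_ref(d) leaves the count alone.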

Figure 2: (a) Initialization of String objects; (b) using the function call style of initialization; (c) possible assignment operations for the String class.

  (a)

  String t;             // an empty string
  String a = "good girl";      // initialized from a regular C style string
  String b = a;         // the copy constructor
  String c = '.';

This is the idiomatic usage. It would have just the same effect to write:

  String a ("good girl");
  String b(a);
  String c('.');

  (b)

  String d("good girl", 4);
  // same effect as String d = "good";

  (c)

  String v = "abcd";
  v += 1;         // result "bcde" ?

  String w = "provided";
  w  -= "vide";         // result "prod" ?
  w -=  'r';            // result "pod" ?

  String x = "1234567890";
  x <<= 1;              // result "2345678901"

The constructors also get called, of course, if a String object is created dynamically. The destructor gets called automatically when any auto or static instances of String go out of scope, or when we explicitly delete dynamically allocated ones.

Initialization and Assignment

Classes like String need to make separate and explicit provisions for initialization and for assignment. Assignment operators affect the value of an existing object. It seems logical that the set of possible assignments should match the set of possible initializations.

Both C and C++ have many assignment operators. Apart from straight assignment, only one of them is supported for this String class: the += operation, which appends a string (with a small s, that is either a String, a string, or a character) to an existing String. Others are conceivable, but not necessarily as intuitive. Figure 2(c) shows additional possible assignment operators. The most useful of these would probably be -=.

Modifiers

The next group of member functions are "other modifiers." These act on the String object for which they are called. It is fairly obvious that these, and the assignment operations discussed previously, must have the effect of severing the object from any representation to which it is attached in common with other String objects.
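Severing an object from a shared representation just before modifying it is usually called copy-on-write. The sketch below shows the idea under stated assumptions: CowText is a hypothetical name, and std::shared_ptr and std::string from the present-day standard library stand in for the hand-written srep of Listing Two.

```cpp
#include <memory>
#include <string>

// Hypothetical copy-on-write wrapper; not the article's implementation.
class CowText {
public:
    explicit CowText(const char *s)
        : rep_(std::make_shared<std::string>(s)) {}

    // The compiler-generated copy operations share the representation.

    // A modifier: detach from any shared representation, then change it.
    CowText &append(char c)
    {
        detach();
        rep_->push_back(c);
        return *this;
    }

    const std::string &value() const { return *rep_; }
    bool shares_with(const CowText &other) const { return rep_ == other.rep_; }

private:
    // Give this object a private copy if other objects share the storage.
    void detach()
    {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::string>(*rep_);
    }

    std::shared_ptr<std::string> rep_;
};
```

After CowText b = a, the two objects share storage. The first b.append('s') gives b its own copy, so a keeps its old value.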

The modifiers provide for insertion of strings into existing Strings and removal of specified portions of Strings, including truncation.
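The intended behavior of these modifiers can be sketched with free functions over std::string. The names insert_at, remove_at, and truncate_to are hypothetical, and clamping out-of-range arguments is an assumption, since the article does not say how such arguments are treated.

```cpp
#include <cstddef>
#include <string>

// Insert piece before position pos; positions past the end append.
inline std::string &insert_at(std::string &s, int pos, const std::string &piece)
{
    std::size_t p = pos < 0 ? 0 : static_cast<std::size_t>(pos);
    if (p > s.size())
        p = s.size();
    s.insert(p, piece);
    return s;
}

// Remove count characters starting at pos; the range is clamped to the string.
inline std::string &remove_at(std::string &s, int pos, int count)
{
    if (pos < 0 || count <= 0 || static_cast<std::size_t>(pos) >= s.size())
        return s;
    s.erase(static_cast<std::size_t>(pos), static_cast<std::size_t>(count));
    return s;
}

// Keep only the first length characters.
inline std::string &truncate_to(std::string &s, int length)
{
    if (length >= 0 && static_cast<std::size_t>(length) < s.size())
        s.erase(static_cast<std::size_t>(length));
    return s;
}
```

Each function returns a reference to the string it modified, mirroring the String & return types in Listing One, so calls can be chained.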

A class such as String usually needs to make some provision for automatic conversion. At present, the ANSI draft provides the two conversions shown in Figure 3(a). Both of these are problematic. The conversion to char causes problems when the String is empty, as shown in Figure 3(b). The function calls in this example are all right because the compiler can apply the user-defined conversion from String to char to coerce the actual arguments to the show function to match the formal argument. However, String has no reserved delimiter character, so it is not possible to return a character from the conversion from an empty String which distinguishes it from a string of nulls.

Figure 3: (a) Automatic conversions currently provided for in the ANSI draft; (b) the conversion to char causes problems when the String is empty.

  (a)

  operator char();     // conversion to a char
  operator char *();   // conversion to a char *

  (b)

  void show(char c)
  {
      cout << c << ' ' << int (c) << endl;
  }
  char array[4] = { '\0', '\0', '\0', '\0', };

  String nulls(array,4);
  String empty;

  show(nulls);
  show(empty);

Also, a conversion to char is painfully close to a conversion to int. Any function which took an integer argument would be satisfied by the erroneous use of a String variable as its argument. The value passed would correspond to its first character. Such bugs could be difficult to find!

operator char *( ) is problematic because the obvious thing to do is to return a pointer to the sequence of characters associated with the String. Unfortunately, this is not a pointer to a regular C-style string. C++ Strings don't have a reserved terminator character. The issue could be fudged by always storing one more character in the String than necessary (a terminating 0), but this would still not get around the bugs caused by strings which genuinely had embedded null characters. Note that this has bearing on the provision of functions for Strings which parallel the ANSI C functions for strings [strlen(const char *), and strlen(const String&)].

It might be preferable to take a step back and have the String class provide only one conversion: one to void *. This would allow users to get a char pointer to the first character of the sequence by explicitly recognizing (with a cast) what they were doing. It is also a fairly standard C++ subterfuge to allow an if statement to test an object.
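A single conversion to a void pointer might look like the following sketch. Probe is a hypothetical class, and the use of const void * rather than plain void * is an assumption made here so that the conversion cannot be used to modify the characters.

```cpp
#include <string>

// Hypothetical class with one conversion only: to const void *.
class Probe {
public:
    Probe() {}
    explicit Probe(const char *s) : data_(s) {}

    // Null for an empty string, otherwise the address of the first character.
    operator const void *() const
    {
        return data_.empty() ? nullptr : data_.data();
    }

    // Succinct test for an empty string.
    int operator!() const { return data_.empty(); }

private:
    std::string data_;
};

// An if statement can test the object through the pointer conversion.
inline bool is_non_empty(const Probe &p)
{
    if (p)
        return true;
    return false;
}
```

Getting at the characters now takes an explicit cast, for example static_cast<const char *>(static_cast<const void *>(p)). Passing a Probe where an int or a char is expected no longer compiles, which removes the bug described above.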

Mutators

The next group of members are mutators: functions which act rather like copy constructors in that they make new, but in these cases, altered String objects from existing ones.

The mutators provided by the ANSI draft, including those provided by friend functions as opposed to members, are shown in Figure 4(a). The first two are straightforward: Given that String a = "Cat", and that b = a.upper(), b gets "CAT"; a is unchanged. Also, if c = b.lower(), then c gets "cat"; b is unchanged. The third mutator in Figure 4(a) extracts a substring and uses it to form the new String. The first argument indicates the substring offset, and the second the substring length.
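The defining property of a mutator, that the result is a new object and the source is left alone, can be sketched with free functions over std::string. The names upper, lower, and substring are used here for illustration only, and returning an empty string for an out-of-range request is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return an uppercase copy; the argument is not modified.
inline std::string upper(const std::string &s)
{
    std::string result(s);
    for (char &c : result)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return result;
}

// Return a lowercase copy; the argument is not modified.
inline std::string lower(const std::string &s)
{
    std::string result(s);
    for (char &c : result)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return result;
}

// Return a copy of length characters starting at offset start.
// Out-of-range requests yield an empty string (an assumption).
inline std::string substring(const std::string &s, int start, int length)
{
    if (start < 0 || length < 0 || static_cast<std::size_t>(start) > s.size())
        return std::string();
    return s.substr(static_cast<std::size_t>(start),
                    static_cast<std::size_t>(length));
}
```

Because every function takes its argument by const reference and returns by value, the compiler itself enforces the rule that the source is unchanged.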

Figure 4: (a) The mutators provided by the ANSI draft; (b) providing access to individual characters of a String.

  (a)

  String String::upper() const;
  String String::lower() const;
  String String::operator() (int start, int length) const;
  String operator+(const String &a, const String &b);
  String operator+(const String &a, const char *b);
  String operator+(const char *a, const String &b);

  (b)

  String a = "Cat";
  a(0) = 'R'; // a gets "Rat"
  // a.operator()(0) = 'R';

The selection of constructors for class String suggests that the addition operator, which forms a new String from two other objects, should also be overloaded to deal with single characters. Then, writing b = '(' + v + ')', where v is a String, would not cause implicit constructor calls to convert the characters to Strings before addition. This already does not happen with b = "(" + v + ")". Similar considerations apply to the Boolean operations as well.

Accessing Individual Chars

The next vexed design decision is how to give access to the individual characters of a String. The ANSI draft provides char &operator( )(int). This allows for expressions such as that shown in Figure 4(b). The draft also provides operator char *( ) for strings, so it is possible to write a[0] = 'R' as well.

This actually seems a more "natural" notation for a String, which is an array-like sequence of characters. If the conversion to char pointer were dropped, it might be desirable to provide operator[](int) to serve as an lvalue. operator( )(int) could be retained to provide a faster unchecked report of the value of the ith character. This is the approach adopted in the example implementation.
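The division of labor between the two operators can be sketched as follows. Indexed is a hypothetical class built on std::string. The shared dummy character follows the idea of the static member in Listing Two, but the details are assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical class showing checked and unchecked character access.
class Indexed {
public:
    explicit Indexed(const char *s) : data_(s) {}

    // Checked access, usable as an lvalue. An out-of-bounds index refers
    // to a shared dummy character, so a stray write cannot damage the string.
    char &operator[](int i)
    {
        if (i < 0 || static_cast<std::size_t>(i) >= data_.size())
            return dummy_;
        return data_[static_cast<std::size_t>(i)];
    }

    // Unchecked report of the i'th character; the caller guarantees the index.
    char operator()(int i) const
    {
        return data_[static_cast<std::size_t>(i)];
    }

    const std::string &value() const { return data_; }

private:
    std::string data_;
    static char dummy_;     // one instance for all Indexed objects
};

char Indexed::dummy_ = '\0';
```

With this arrangement a[0] = 'R' changes the string, a[99] = '!' is silently absorbed by the dummy, and a(1) reads a character without the cost of the bounds test.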

Anyone who started off thinking that the String class was a trivial design exercise should be having second thoughts! "Simple" low-level classes like String are always a design minefield.

Searching

The next bunch of functions -- shown in Figure 5(a) -- does search operations on Strings. The match functions return the index of the first nonmatching character, or -1 if the match is exact. The index functions return the index of the starting character of a matching substring or character, or -1 if there is no match. The index functions may optionally take an offset argument to determine where the search should start. The remaining member functions test for an empty String and report the length of a String.
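These rules are compact enough to sketch as free functions over std::string. The names match_prefix and index_of are hypothetical. The sketch also assumes that an "exact" match means the whole of the pattern agrees with the start of the string.

```cpp
#include <cstddef>
#include <string>

// Index of the first character at which s and pattern differ,
// or -1 if the whole of pattern matches the start of s.
inline int match_prefix(const std::string &s, const std::string &pattern)
{
    std::size_t i = 0;
    while (i < pattern.size() && i < s.size() && s[i] == pattern[i])
        ++i;
    return i == pattern.size() ? -1 : static_cast<int>(i);
}

// Index of the first occurrence of pattern at or after pos, or -1 if none.
inline int index_of(const std::string &s, const std::string &pattern, int pos = 0)
{
    if (pos < 0)
        return -1;
    std::size_t found = s.find(pattern, static_cast<std::size_t>(pos));
    return found == std::string::npos ? -1 : static_cast<int>(found);
}

// Index of the first occurrence of character c at or after pos, or -1 if none.
inline int index_of(const std::string &s, char c, int pos = 0)
{
    if (pos < 0)
        return -1;
    std::size_t found = s.find(c, static_cast<std::size_t>(pos));
    return found == std::string::npos ? -1 : static_cast<int>(found);
}
```

All indices are zero-based, and the optional pos argument lets a caller continue a search just past a previous hit.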

Figure 5: (a) Searching strings; (b) Boolean tests on Strings.

  (a)

  String a = "The quick brown fox jumped over the lazy dogs back";
  String b = "The queen";
  String c = "lazy";

  a.match(b);          // returns 6
  a.match("The quick");        // returns -1

  a.index(b);         // returns -1
  a.index(c);         // returns 36
  a.index("The", 1);      // returns -1
  a.index("The");         // returns 0
  a.index("the");         // returns 32
  a.index('b');           // returns 10
  a.index('b',11);        // returns 46

  (b)

  // C style
      char *s1 = "Joe", *s2 = "Fred";
      if (s1 > s2) ...;        // legal, but not usually what was intended!
      if (!strcmp(s1,s2)) ...;     // proper test for equality

  // C++ style

       String s1 = "Joe", s2 = "Fred";
       if (s1 > s2) ...;           // evaluates to true
       if (s1 == s2) ...;          // proper test for equality

Boolean Tests

All that remain are the Boolean friend functions and the iostreams I/O operations. The Boolean operators are quite intuitive, and in fact allow the sort of operations on Strings which beginners in C often attempt on regular C-style strings, as shown in Figure 5(b). As noted, the triplets of Boolean functions could arguably be expanded to include int operator == (const String&, char), and so on.

Conclusion

An implementation of the String class library is available along with an accompanying test suite. Space limitations prohibit a full presentation, but the library is available electronically (see "Availability" on page 3) or directly from me.

Finally, I encourage you to make comments on the String class library. I will collect the responses I receive and forward them to the ANSI committee, as promised.


_PROPOSING A C++ STRING CLASS_
by Steve Teale


[LISTING ONE]


class String {
public:
// String class constants
    enum string_enum { all = INT_MAX };

// Constructors / destructor
    String();                // make an empty String
    String(const String&);       // make a String by copying another
    String(const char *, int count = 0);
                                 // make a String from a regular C style string
    String(char);            // make a String from a single char

    ~String();             // Clean up after a String goes out of scope

// Assignments
    String &operator=(const String&);   // Assign from another String
    String &operator=(const char *);    // Assign from a C style string
    String &operator=(char);        // Assign from a single char
    String &operator+=(const String&);
                                  // Concatenate a String to an existing String
    String &operator+=(const char *);
                                    // Concatenate a C style string to a String
    String &operator+=(char);       // Concatenate a char

// Other modifiers
    String &insert(int pos, const String&);  // Insert into an existing string
    String &insert(int pos, const char *);
    String &insert(int pos, char);
    String &remove(int pos, int count); // Remove from an existing string
    String &truncate(int);      // Truncate a String

// Conversions
    operator char*() const;
                     // Translate a String to a pointer to its first character
    operator char() const;     // Translate a String to its initial character

// Mutators (make an existing String into a variant)
    String upper() const;      // Make a new String by conversion to uppercase
    String lower() const;
    String operator()(int start, int len) const;
                   // Make a new String starting from start, of len characters

// Access individual characters
    char &operator[](int);    // Access individual characters as if the String
                  // were an array
    char operator()(int);     // Get the character value of the i'th character.

// Searches
    int match(const String&) const;
                             // Return position of first non-matching character
    int match(const char *) const;
    int index(const String&, int pos = 0) const;
    int index(const char *, int = 0) const;
                             // Find the position of a substring or character
    int index(char, int = 0) const;

// Statistics
    int operator!() const;  // Test for an empty string
    int length() const;     // Get the length of the String

// Some of the properties of a String are provided by functions which
// are friends of the String class

friend int strlen(const String&);   // Parallel the ANSI C string functions
// etc

// Concatenations to make a new String
friend String operator+(const String&, const String&);
friend String operator+(const String&, const char *);
friend String operator+(const char *, const String&);

// Boolean tests
friend int operator==(const String&, const String&);
friend int operator==(const String&, const char *);
friend int operator==(const char *, const String&);
friend int operator!=(const String&, const String&);
friend int operator!=(const String&, const char *);
friend int operator!=(const char *, const String&);
friend int operator>(const String&, const String&);
friend int operator>(const String&, const char *);
friend int operator>(const char *, const String&);
friend int operator<(const String&, const String&);
friend int operator<(const String&, const char *);
friend int operator<(const char *, const String&);
friend int operator>=(const String&, const String&);
friend int operator>=(const String&, const char *);
friend int operator>=(const char *, const String&);
friend int operator<=(const String&, const String&);
friend int operator<=(const String&, const char *);
friend int operator<=(const char *, const String&);

// Input/ output operations
friend istream &operator>>(istream&, String&);
friend ostream &operator<<(ostream&, const String&);
};


[LISTING TWO]


#ifndef __STRING_HPP
#define __STRING_HPP
#include <limits.h>
#include <stddef.h>   // size_t, used by srep::operator new

class istream;   // forward declarations to avoid including all of iostream.hpp
class ostream;

class srep {        // actual string representation
friend class String;
private:
    srep(int, const char * = 0);
    void *operator new(size_t cs, size_t ss = 0);

    int refs;
    int length;
    char body[1];
};

class String {
public:
    // as above
private:
    char *body() const { return rp? rp->body: 0; }

    srep *rp;
    static char dummy;
};

inline int strlen(const String &s)
{
    return s.length();
}

// and other friend functions which can conveniently be defined inline

#endif
An alternative arrangement of the equality tests of Figure 1(c), using two members and a friend:

// two members
    int operator==(const String&);
    int operator==(const char *);

// and a friend
friend int operator==(const char *, const String&);


Copyright © 1991, Dr. Dobb's Journal

