Channels ▼
RSS

C/C++

C++ String Performance

Source Code Accompanies This Article. Download It Now.


It's easy to imagine the reaction of seasoned C++ programmers upon seeing yet another string class implementation: "So why do we need another string class? We had too many in the pre-standard C++ Library world, and now that we finally have std::string, why not just use it?"

I agree, but only if the standard implementation doesn't introduce significant performance problems. In this article, I show how the standard string implementation isn't always good enough, particularly for multithreaded applications that create lots of temporary strings. That said, it is worth noting that there's no such thing as a standard string implementation—they all differ. (For examples of the different implementations, see item 15 in Scott Meyers's Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library, Addison-Wesley, 2001.) Here, I analyze one of the more popular string implementations—the one that comes with g++ 2.96 on Red Hat Linux 7.x. To this end, I create a number of multithreaded tests showing the Standard C++ string implementation (which comes with g++ 2.96 on Red Hat Linux 7.3) can have real performance problems for multithreaded applications that need to create lots of unrelated temporary string objects. To address these performance problems, I present a string class that gives you the speed and efficiency of automatic C strings, plus the convenience of C++ string objects that are STL friendly.

std::string Class Design

All string implementations—including g++ 2.96—usually share the same design principles:

  • Memory for the string value is always allocated on the heap. (Okay, Scott Meyers describes one implementation that doesn't allocate memory for the very short strings.) Think about it. You have a string object on the stack, but there is always a heap allocation for the value of the string unless it is initialized through a copy constructor and it's reference counted (which generates other problems).
  • String values are reference counted; so if you assign string s1 to s2, their internal pointers point to the same location in memory. It's apparently done to make copy constructors and assignment operators very fast. They don't need to allocate memory and copy the value, just set the pointer and increment a reference counter. The same logic makes a destructor very fast. In most cases, it just decrements the reference counter.

  • Closely related to reference counting is copy-on-write (COW) optimization that allocates/copies when one of the string objects pointing to the same value changes.

The first item is usually a bad idea when you have lots of temporary strings as automatic variables in a multithreaded application. The heap is shared between threads, so some locking is inevitable; otherwise, the heap's internal data is corrupted. This approach defeats the idea of the fast non-blocking creation/access of the stack variables in multithreaded applications (every thread has its own nonshared stack). It looks deceptive to programmers, too. For instance, in Example 1, mystring is an automatic (stack) variable with no indication that memory for "Some C value" is allocated on the heap (unfortunately, that is exactly what happens).


int myfunc()
{
  std::string mystring ( "Some C value" );
  ...
}

Example 1: This defeats the notion of fast, nonblocking creation/access of stack variables in multithreaded applications.

Likewise, the second and third items may sound good until you realize they're a bad fit for multithreaded applications. Value and reference counters are shared between multiple string variables and potentially between multiple threads that absolutely require access serialization (locking) to the reference counter. Atomic increment/decrement/check can be used if these operations are supported by the CPU's command set; see "Optimizations that Aren't (In a Multithreaded World)," by Herb Sutter, C/C++ Users Journal, June 1999.

Listing One (teststring1.cpp) creates 100,000 temporary std::string objects on the stack (most of the memory is allocated on the heap). It has only one thread running, but was built as a multithreaded application, so it does all the required locking (refer to the Makefile, available at the top of this article). I ran this and subsequent tests on a 2-GHz Dell desktop with 1 GB of RAM under Red Hat Linux 7.3. Example 2(a) shows the results.

Listing One

#include 
#include "util.h"
unsigned int createStrings ( int n )
{
 static char *samples[]={"something1","something2","something3","something4"};
 printf ("Thread <%d>: Creating/destroying %d strings...\n",pthread_self(),n);
 unsigned int start = utilGetTickCount ();
 for ( int i=0; i: total time=%d millisecs, average=%f\n",
           pthread_self(), end-start, (float)(end-start)/n );
 return ( end - start );
}
int main ( int argc, char **argv )
{
  if ( argc > 2 )
  {
    printf ( "teststring1  \n" );
    return 0;
  }
  int n = 100000;
  if ( argc > 1 )
    n = atoi ( argv[1] );
  createStrings ( n );
  exit(0);
}



(a)
Thread <1024>: Creating/destroying 100000 strings...
Thread <1024>: total time=108 millisecs, average=0.001080

(b)
Thread <1026>: Creating/destroying 100000 strings...
Thread <2051>: Creating/destroying 100000 strings...
Thread <3076>: Creating/destroying 100000 strings...
Thread <4101>: Creating/destroying 100000 strings...
Thread <5126>: Creating/destroying 100000 strings...
Thread <6151>: Creating/destroying 100000 strings...
Thread <7176>: Creating/destroying 100000 strings...
Thread <8201>: Creating/destroying 100000 strings...
Thread <9226>: Creating/destroying 100000 strings...
Thread <10251>: Creating/destroying 100000 strings...
Thread <1026>: total time=7804 millisecs, average=0.078040
Thread <2051>: total time=8749 millisecs, average=0.087490
Thread <10251>: total time=10455 millisecs, average=0.104550
Thread <6151>: total time=10643 millisecs, average=0.106430
Thread <8201>: total time=10718 millisecs, average=0.107180
Thread <9226>: total time=10724 millisecs, average=0.107240
Thread <7176>: total time=10755 millisecs, average=0.107550
Thread <5126>: total time=10760 millisecs, average=0.107600
Thread <4101>: total time=10765 millisecs, average=0.107650
Thread <3076>: total time=10867 millisecs, average=0.108670


Example 2: (a) The results of running teststring1.cpp; (b) results of running teststring2.cpp.

Listing Two (teststring2.cpp) is the same test, but with 10 threads running—each creating/destroying 100,000 automatic string objects. (Again, every one of them allocates memory on the heap and employs some locking and copying.) Example 2(b) presents the less-than-encouraging results of the teststring2.cpp test. With just 10 threads, average automatic string object handling (creation/destruction) increased approximately 100 times (10000/108). Since these results seem unacceptable for high-load multithreaded applications, I decided to find out where time is being spent, then create a new string class that doesn't have those deficiencies. Possible performance culprits include locking introduced by the reference counting optimization, heap allocations (and it's locking) and copying of the value.

Listing Two

#include 
#include "util.h"
void *createStrings ( void *n_void )
{
 int n = (int) n_void;
 static char *samples[]={"something1","something2","something3","something4"};
 printf ("Thread <%d>: Creating/destroying %d strings...\n",pthread_self(),n);
 unsigned int start = utilGetTickCount ();
 for ( int i=0; i: total time=%d millisecs, average=%f\n", 
           pthread_self(), end-start, (float)(end-start)/n );
 return NULL;
}
int main ( int argc, char **argv )
{
  if ( argc > 3 )
  {
    printf ( 
       "teststring1 [] []\n" );
    return 0;
  }
  int n = 100000;
  int n_thr = 10;
  if ( argc > 1 )
    n = atoi ( argv[1] );
  if ( argc > 2 )
    n_thr = atoi ( argv[2] );
  pthread_attr_t attr;
  pthread_attr_init ( &attr );
  pthread_t *tid = new pthread_t[n_thr];
  int i;
  // create threads
  for  ( i=0; i\n", i );
      exit(-1);
    }
  }
  // wait for all threads to terminate
  for  ( i=0; i

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video