Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Channels ▼


Parallel Counting Sort

Algorithm Improvement through Performance Measurement: Part 1

Algorithm Improvement through Performance Measurement: Part 2

Algorithm Improvement through Performance Measurement: Part 3

Algorithm Improvement through Performance Measurement: Part 4

Algorithm Improvement through Performance Measurement: Part 5

Algorithm Improvement through Performance Measurement: Part 6

To date, the focus for this series has been on measurement-based performance optimization of various sorting algorithms, such as Counting Sort, several Radix Sorts, Insertion and Selections Sorts, on a single processor. These implementations, while high-performance through careful use of the hybrid algorithm and data type adaptable approach, use only a portion of the available computational resources. On a quad-core hyper-threaded processor such as the Intel i7 860, these single-threaded algorithms utilize somewhere between 12% and 25% of the processor, leaving 75% to 88% of the compute power idle. As the number of cores increases, following Moore's self-fulfilling prophecy, a higher percentage of the CPU will be unutilized by single-threaded algorithm implementations. In this article and ones to come, the previously developed high-performance serial sorting algorithms will be refactored into parallel algorithms to raise the performance even higher by utilizing multicore computational resources.


Several companies have worked tirelessly to create tools to enable parallel programming. Intel has a vested interest in enabling multicore programming, due to silicon shrinking continuing its path while frequency and power density growth is unable to scale at the same rate. Intel, for instance, has developed Threading Building Blocks and Integrated Performance Primitives (IPP) libraries, along with a powerful suite of compiler, multicore/multi-thread development and performance analysis tools. The Intel C++ compiler supports the latest version of the OpenMP standard, which is one of the parallel programming methods. Intel has also acquired Cilk Arts and the Cilk++ parallel compiler, and is developing Ct Technology.

nVidia has developed CUDA programming environment to enable programs to runs on hundreds of cores in graphics processors (GPUs) across many GPUs, enabling supercomputer level of performance. OpenCL is an industry effort at portable parallel programming across CPUs and graphics processors (GPUs). Sieve C++ Parallel programming system offers a portable environment for many CPUs and game systems. Microsoft Visual Studio 2010 has a suite of tools to enable visualization and debug of parallel applications, as well a Parallel Pattern Library (PPL). These are wonderful developments at simplifying the laborious and complex task of parallel programming. I'll explore some of these exciting tools in this series by applying them to the problem of transforming sorting algorithms from serial to parallel.

The first tool to be explored is Intel's Threading Building Blocks (TBB), a thread-pool management library along with sophisticated templates for C++ parallel programming development. With many parallel constructs and a work-stealing thread scheduler, TBB enables the programmer to focus on the parallel problem instead of the mechanics of thread management. TBB works on several operating systems and has support for a variety of parallel programming patterns. Let's dive right in and apply TBB to the fastest sorting algorithm developed so far -- the Counting Sort.

Counting Sort

Counting Sort is a very efficient, in-place, high-performance sorting algorithm that is more than 20X faster than STL sort(). It handles arrays of 8-bit and 16-bit signed and unsigned numbers very well [2]. For arrays of numbers it does not move the array elements, but instead counts the occurrence of each possible value, and then constructs the sorted array from these counts. Counting Sort is a linear-time O(n) algorithm, which performs 2n amount of work -- n for reading/counting, n for writing/constructing.

Listing 1 is an implementation of the Counting Sort algorithm for sorting an array of 8-bit unsigned numbers.

// Copyright(c), Victor J. Duvanenko, 2010

inline void CountingSort( unsigned char* a, unsigned long a_size )
	if ( a_size <  2 )	return;

	const unsigned long numberOfCounts = 256;

	// one count for each possible value of an 8-bit element (0-255)
	unsigned long count[ numberOfCounts ] = { 0	};		// count array is initialized to zero by the compiler

	// Scan the array and count the number of times each value appears
	for( unsigned long i = 0; i < a_size; i++ )
		count[ a[ i ] ]++;

	// Fill the array with the number of 0's that were counted, followed by the number of 1's, and then 2's and so on
	unsigned long n = 0;
	for( unsigned long i = 0; i < numberOfCounts; i++ )
		for( unsigned long j = 0; j < count[ i ]; j++ )
			a[ n++ ] = (unsigned char)i;

Listing 1

The algorithm creates a count array of 256 elements (32-bits each) -- one for each possible value for an 8-bit number -- and initializes the count array to zero. The first for loop counts the occurrence of each 8-bit value by scanning the array from beginning to end. The second for loop creates the sorted array by storing/writing the counted number of each element. For a detailed explanation of the Count Sort algorithm for arrays of unsigned and signed numbers see Part 6 of this series.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.