Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

A Generic Heapsort Algorithm in C


AUG89: A GENERIC HEAPSORT ALGORITHM IN C

A GENERIC HEAPSORT ALGORITHM IN C

Heapsort is an excellent sorting algorithm for many applications

Stephen Russell

Stephen is a member of the Basser Department of Computer Science at the University of Sydney. He can be reached at the Madsen Building, F09, University of Sydney, N.S.W., Australia 2006.


When faced with the problem of sorting data, many programmers choose a simple algorithm, such as insertion or selection sort, or maybe Shellsort. Quicksort is another common choice, at least for languages that support recursion: The qsort library function makes this choice particularly easy for C programmers. Surprisingly few programmers choose the Heapsort algorithm, perhaps because it is not an intuitive algorithm: The other algorithms are certainly easier to understand. Heapsort, however, has some advantages over its competitors.

With Heapsort, the time taken to sort n items is proportional to n log n, regardless of the initial order of the data. In comparison, although Quicksort has an average performance of n log n, some data cause it to execute in time proportional to n2. The other algorithms also have a worst-case performance of n2. When sorting large amounts of data, the difference between a guaranteed speed of n log n versus a worst-case of n2 is significant.

In addition, Heapsort is not a recursive algorithm, making it suitable for languages--such as Basic and Fortran -- that do not support recursion. Even for languages, such as Pascal and C, eliminating the recursion may lead to a faster algorithm, depending on the cost of procedure calls. The cost of saving and restoring registers for each call may be significant on some machines.

Heapsort, however, has its disadvantages. For small sets of data, the simpler algorithms may actually perform better, due to their lower overheads. Choosing the right algorithm for a particular application requires careful analysis and knowledge of the tradeoffs involved. Books on algorithm design, such as Sedgewick's Algorithms (Addison-Wesley, 1983) and the Gonnet's Handbook of Algorithms and Data Structures (Addison-Wesley, 1984), provide useful guides to help make the right choice.

Even with these caveats, Heapsort is a strong candidate for inclusion in a programmer's library of standard tools. The rest of this article describes a generic Heapsort function, modelled on the C qsort function. This function allows an arbitrary array of data to be sorted, when provided with a function to compare elements of the array.

How Heapsort Works

The Heapsort algorithm is based on an interesting data structure -- the heap. Heaps are typically used for applications in which the maximum or minimum value is needed from a changing collection of data. When an item is added to or removed from a heap, the heap can be rearranged to find the new maximum (or minimum) value in log n steps. This process is a very efficient way to implement priority queues, and it forms the basis of Heapsort.

A heap is a partially ordered array of data, organized as a special form of binary tree, with the following properties. First, for each node in the tree, the node's value is greater than the values of either of its child nodes, which implies that the root of the tree contains the maximum value of all the nodes. If the minimum value is needed instead, each node must have a lesser value than its children. Second, the tree is fully balanced, and each level of the tree is filled from left to right. This property allows the tree to be constructed using an array, rather than explicit pointers. If the first item of a heap H is H[1], then its two children are H[2] and H[3]. Similarly, the children of H[2] are H[4] and H[5]. In general, the children of H[i] are H[2i] and H[2i+1], and the parent of H[k] is H[k/2].

Converting an array of arbitrary data to a heap involves rearranging the values in the array until the ordering property of a heap is satisfied. This conversion is usually realized by working through the tree from the bottom up. For array H containing n items, start by considering the last item that could have children -- H[k], where k is n/2. H[k] is compared to its children H[2k] and H[2k+1]. If H[k] is less than one or both of its children, it is swapped with the greater child, say H[j]. Then compare the new value in H[j] with its two children, and perform another swap if needed. Eventually, you will reach a node that has no children, or a spot where no swap is needed. At this point, the subtree starting at H[k] is organized as a heap. This process is performed for all values of k from n/2 to 1. After the last iteration, the array is a heap, and H[1] contains the maximum value. Figure 1 shows the process of constructing a heap from a collection of numbers. The tree structure of the heap is shown in Figure 2.

The Heapsort Algorithm

Heapsort works in two phases. In the first phase, the data is rearranged in the array to form a heap; this moves the maximum data value to the first position of the array. In the second phase, the maximum value is swapped with the last element in the heap, and the size of the heap is reduced by one. This phase puts the maximum data value in its final position in the array. However, the element now occupying the first position may not be the next maximum: The next maximum is found by rearranging the remaining elements of the heap. The process of finding the maximum value in the heap, and moving it to its final position, continues until the heap is reduced to a single element. Figure 3 shows Heapsort in operation for the heap constructed in Figure 1.

Listing One shows an implementation of the Heapsort algorithm for sorting an array of integer values. Most of the work is performed by the function call fixheap(h,i,n), which ensures that the subheap starting at position h[i] is correctly ordered. As I described previously, the value of i goes from n/2 to 1 while constructing the heap. In the second phase, the value of i is always 1, with the value of n decreasing as each maximum value is extracted from the heap.

The fact that arrays in C are indexed from 0, rather than 1, is adjusted for in the first statement in hsort( ) by decrementing the pointer to the base of the array. This adjustment allows the rest of the algorithm to consider the array as starting at h[1].

A Generic Heapsort

The hsort( ) function in Listing One can be easily changed to sort, for example, arrays of float or char values. With a little extra effort, hsort( ) can also sort data that are accessed indirectly through an array of pointers, or data with multiple sort keys. Having to modify the algorithm each time a particular sort is needed, though, is inconvenient and error-prone. A generic Heapsort is needed that can sort arbitrary data when provided with an appropriate function to compare values. Such a function is shown in Listing Two.

Modelled on the C qsort( ) library function, the version of hsort( ) shown in Listing Two sorts an array of n items, with each element "size" bytes in length. The cmp argument is a pointer to a comparison function. Each time fixheap( ) needs to compare elements, it calls this function, passing the addresses of the two items as arguments. The function then returns a value less than, equal to, or greater than zero, depending on whether the first item is less than, equal to, or greater than the second.

The version of hsort( ) in Listing Two is derived directly from the simple version in Listing One. Unfortunately, each calculation of the address of an element involves a multiplication by the size of the elements. On many computers, multiplication instructions are slow. It is possible, however, to rewrite the algorithm to eliminate these multiplications completely. This leads to a significant speed improvement. A further increase in speed can be gained by expanding the calls to fixheap( ) and swap( ) inline.

The final version of Heapsort is given in Listing Three. It is derived from the algorithm in Listing Two by performing induction variable elimination, as might be done by an optimizing compiler. The critical optimization is calculating the distance in bytes between the addresses of the elements h[0], h[i], and h[2i]. For a given value of i, the distance is i * size, where size is the size of each element. Table 1 shows the relationship between the heap elements and expressions for their addresses. The variable gap is the calculated distance, and base0 is the address of the element h[0].

Table 1: The relationship between the heap elements and expressions

  Element    Address
  -----------------------------------
  h[0]       base0
  h[1]       base0 + size
  h[i]       base0 + gap
  h[2i]      base0 + gap + gap
  h[2i+1]    base0 + gap + gap + size

The next step is to note that as i decreases during the heap construction phase, the value of gap is reduced by size for each iteration. The initial value for gap is n * size / 2, and it is the only step where multiplication is required. The address of h[n] is also calculated and stored in the variable hi. During the second phase of the sort, the value of hi is decreased by size for each iteration, which corresponds to decrementing n. Although these optimizations result in a more complex algorithm, they are worthwhile given its intended use as a standard library function.

Using hsort( )

Listing Four shows an example that uses hsort( ) to sort an array of strings, which are read from input. Each element of the array is of type char *. The cmp( ) function is passed two char ** pointers and uses strcmp( ) to compare the strings. After the strings are sorted, they are written to standard output.

Conclusion

The algorithms presented in this article cover a range of complexity and versatility. For some applications, the simple algorithm given in Listing One may perform best, and it provides a skeleton that can be extended for special purposes. The generic Heapsort given in Listing Three provides a useflibrary function, which can be used at any time data need sorting. Its generality, however, does incur a small cost in performance compared to a special-purpose sort. This cost will be acceptable for most applications. Choosing the right approach is, of course, part of the craft of programming.

Availability

All source code for articles in this issue is available on a single disk. To order, send $14.95 (Calif. residents add sales tax) to Dr. Dobb's Journal, 501 Galveston Dr., Redwood City, CA 94063; or call 800-356-2002 (from inside Calif.) or 800-533-4372 (from outside Calif.). Please specify the issue number and format (MS-DOS, Macintosh, Kaypro).

A GENERIC HEAPSORT ALGORITHM IN C by Stephen Russell

[LISTING ONE]

<a name="0181_000c">

/*
 *  The Heapsort to sort an array of n integers.
 */

static
fixheap(h, i, n)
int *h;
unsigned i, n;
{
   unsigned k;
   int    tmp;

   while ((k = 2 * i) <= n)      /* h[k] = left child of h[i] */
   {
      /*  Find maximum of left and right children */

      if (k != n && h[k+1] > h[k])
         ++k;         /* right child is greater */

      /* Compare greater of children to parent */

      if (h[i] >= h[k])
         return;

      /* Parent is less than child, so swap */

      tmp = h[k]; h[k] = h[i]; h[i] = tmp;

      i = k;            /* move down tree */
   }
}


hsort(h, n)
int *h;
unsigned n;
{
   unsigned i;
   int    tmp;

   --h;         /* adjust for zero-origin arrays in C */

   for (i = n/2; i > 1; --i)
      fixheap(h, i, n);   /* build heap, except for h[1] */

   while (n > 1)
   {
      fixheap(h, 1, n);   /* move max to h[1] */
      tmp = h[1];      /* move max to final position */
      h[1] = h[n];
      h[n] = tmp;
      --n;         /* reduce size of heap */
   }
}



<a name="0181_000d"><a name="0181_000d">
<a name="0181_000e">
[LISTING TWO]
<a name="0181_000e">


/*
 *   Generic Heapsort, derived from listing 1.
 */


#define   H(k)   (h + k * size)

static
swap(p1, p2, n)         /* swap n bytes */
char *p1, *p2;
unsigned n;
{
   char   tmp;

   while (n-- != 0)
   {
      tmp = *p1; *p1++ = *p2; *p2++ = tmp;
   }
}

static
fixheap(h, size, cmp, i, n)
char *h;
unsigned size, i, n;
int (*cmp)();
{
   unsigned k;

   while ((k = 2 * i) <= n)
   {
      if (k != n && (*cmp)(H(k+1), H(k)) > 0)
         ++k;

      if ((*cmp)(H(i), H(k)) >= 0)
         return;

      swap(H(i), H(k), size);
      i = k;
   }
}

hsort(h, n, size, cmp)
char *h;
unsigned n, size;
int (*cmp)();
{
   unsigned i;

   h -= size;

   for (i = n/2; i > 1; --i)
      fixheap(h, size, cmp, i, n);

   while (n > 1)
   {
      fixheap(h, size, cmp, 1, n);
      swap(H(1), H(n), size);
      --n;
   }
}


<a name="0181_000f"><a name="0181_000f">
<a name="0181_0010">
[LISTING THREE]
<a name="0181_0010">

/*
 *  Generic Heapsort.
 *
 *  Synopsis:
 *   hsort(char *base, unsigned n, unsigned size, int (*fn)())
 *  Description:
 *   Hsort sorts the array of `n' items which starts at address `base'.
 *   The size of each item is as given.  Items are compared by the function
 *   `fn', which is passed pointers to two items as arguments. The function
 *   should return < 0 if item1 < item2, == 0 if item1 == item2, and > 0
 *   if item1 > item2.
 *  Version:
 *   1988 April 28
 *  Author:
 *   Stephen Russell, Department of Computer Science,
 *   University of Sydney, 2006
 *   Australia.
 */

#ifdef INLINE

#define   swap(p1, p2, n)   {\
   register char *_p1, *_p2;\
   register unsigned _n;\
   register char _tmp;\
\
   for (_p1 = p1, _p2 = p2, _n = n; _n-- > 0; )\
   {\
      _tmp = *_p1; *_p1++ = *_p2; *_p2++ = _tmp;\
   }\
}\

#else

/*
 *   Support routine for swapping elements of the array.
 */

static
swap(p1, p2, n)
register char    *p1, *p2;
register unsigned n;
{
   register char   ctmp;

   /*
    *  On machines with no alignment restrictions for int's,
    *  the following loop may improve performance if moving lots
    *  of data. It has been commented out for portability.

    register int   itmp;

    for ( ; n > sizeof(int); n -= sizeof(int))
    {
      itmp = *(int *)p1;
      *(int *)p1 = *(int *)p2;
      p1 += sizeof(int);
      *(int *)p2 = itmp;
      p2 += sizeof(int);
   }

   */

   while (n-- != 0)
   {
      ctmp = *p1; *p1++ = *p2; *p2++ = ctmp;
   }
}

#endif

/*
 *   To avoid function calls in the inner loops, the code responsible for
 *   constructing a heap from (part of) the array has been expanded inline.
 *   It is possible to convert this common code to a function. The three
 *   parameters base0, cmp and size are invariant - only the size of the
 *   gap and the high bound of the array change. In phase 1, gap decreases
 *   while hi is fixed. In phase 2, gap == size, and hi decreases. The
 *   variables p, q, and g are only used in this common code.
 */

hsort(base, n, size, cmp)
char *base;
unsigned n;
unsigned size;
int (*cmp)();
{
   register char    *p, *q, *base0, *hi;
   register unsigned gap, g;

   if (n < 2)
      return;

   base0 = base - size;      /* set up address of h[0] */

   /*
    *  The gap is the distance, in bytes, between h[0] and h[i],
    *  for some i. It is also the distance between h[i] and h[2*i];
    *  that is, the distance between a node and its left child.
    *  The initial node of interest is h[n/2] (the rightmost
    *  interior node), so gap is set accordingly. The following is
    *  the only multiplication needed.
    */

   gap = (n >> 1) * size;       /* initial gap is n/2*size */
   hi = base0 + gap + gap;    /* calculate address of h[n] */
   if (n & 1)
      hi += size;      /* watch out for odd arrays */

   /*
    *  Phase 1: Construct heap from random data.
    *
    *  For i = n/2 downto 2, ensure h[i] is greater than its
    *  children h[2*i] and h[2*i+1]. By decreasing 'gap' at each
    *  iteration, we move down the heap towards h[2]. The final step
    *  of making h[1] the maximum value is done in the next phase.
    */

   for ( ; gap != size; gap -= size)
   {
      /*  fixheap(base0, size, cmp, gap, hi) */

      for (p = base0 + (g = gap); (q = p + g) <= hi; p = q)
      {
         g += g;      /* double gap for next level */

         /*
          *  Find greater of left and right children.
          */

         if (q != hi && (*cmp)(q + size, q) > 0)
         {
            q += size;   /* choose right child */
            g += size;   /* follow right subtree */
         }

         /*
          *  Compare with parent.
          */

         if ((*cmp)(p, q) >= 0)
            break;      /* order is correct */

         swap(p, q, size);   /* swap parent and child */
      }
   }

   /*
    *  Phase 2: Each iteration makes the first item in the
    *  array the maximum, then swaps it with the last item, which
    *  is its correct position. The size of the heap is decreased
    *  each iteration. The gap is always "size", as we are interested
    *  in the heap starting at h[1].
    */

   for ( ; hi != base; hi -= size)
   {
      /* fixheap(base0, size, cmp, gap (== size), hi) */

      p = base;      /* == base0 + size */
      for (g = size; (q = p + g) <= hi; p = q)
      {
         g += g;
         if (q != hi && (*cmp)(q + size, q) > 0)
         {
            q += size;
            g += size;
         }

         if ((*cmp)(p, q) >= 0)
            break;

         swap(p, q, size);
      }

      swap(base, hi, size);      /* move largest item to end */
   }
}





<a name="0181_0011"><a name="0181_0011">
<a name="0181_0012">
[LISTING FOUR]
<a name="0181_0012">


/*
 *   Use hsort() to sort an array of strings read from input.
 */

#include   <stdio.h>


#define   MAXN   500
#define   MAXSTR   1000


cmp(p1, p2)
char **p1, **p2;
{
   return strcmp(*p1, *p2);
}

static   char   *string[MAXN];
static   char   buf[MAXSTR];

extern   char   *gets();
extern   char   *malloc();


main()
{
   char   *p;
   int   i, n;

   for (n = 0; gets(buf); ++n)
   {
      if (n == MAXN)
      {
         fprintf(stderr, "Too many strings\n");
         exit(1);
      }

      p = malloc(strlen(buf) + 1);
      if (p == (char *)NULL)
      {
         fprintf(stderr, "Out of memory\n");
         exit(2);
      }

      strcpy(string[n] = p, buf);
   }

   hsort(string, n, sizeof string[0], cmp);

   for (i = 0; i < n; ++i)
      puts(string[i]);

   exit(0);
}











Copyright © 1989, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.