Dr. Dobb's | Restricted Pointers are Coming

Restricted Pointers are Coming

C9X introduces the new type qualifier restrict. Why it's there and where it pays off takes a bit of explaining.

July 01, 1999
URL:http://www.drdobbs.com/restricted-pointers-are-coming/184403676

July 1999/Restricted Pointers are Coming/Figure 1

Figure 1: What possible compiler optimizations might look like if done by hand

#define N 1000
     
/* Kernel with hoisting of invariant done by hand. */
void 
with_hoisting( float * restrict a, float * restrict b, 
    int n, int j ) {

    float t0 = (b[j+N/4] + b[j-N/4]) * 0.5f; 
    int i; 
    for( i=0; i<n; ++i ) 
        a[i] = t0;
}
     
/* 
 * Kernel with software pipelining done by hand.
 * The optimal pipelining depends upon the target machine. 
 * The example here is only one such way.  
 * It's mighty peculiar to do it in this example 
 * since b[..] can be hoisted, 
 * but nonetheless at least one compiler did something 
 * similar to this. 
 */
void 
with_software_pipelining( float * restrict a, float * restrict b, 
    int n, int j ) {

    int i; 
    float t0, t1, t2, t3;
    if( 3 <= n ) {
        /* prologue for pipelined loop */
        t0 = b[j+(N/4)];        /* Part of iteration i=0 */ 
        t1 = b[j-(N/4)];        /* "    "  "         i=0 */
     
        t2 = t0 + t1;           /* "    "  "         i=0 */ 
        t0 = b[j+(N/4)];        /* "    "  "         i=1 */ 
        t1 = b[j-(N/4)];        /* "    "  "         i=1 */
     
        t3 = 0.5f * t2;         /* Part of iteration i=0 */ 
        t2 = t0 + t1;           /* "    "  "         i=1 */ 
        t0 = b[j+(N/4)];        /* "    "  "         i=2 */ 
        t1 = b[j-(N/4)];        /* "    "  "         i=2 */
     
        /* The pipelined loop */
        for( i=3; i<n; ++i ) {
            /* Next five statements could be evaluated 
               in single step. */ 
            a[i-3] = t3;
            t3 = 0.5f * t2;
            t2 = t0 + t1;
            t0 = b[j+(N/4)];
            t1 = b[j-(N/4)];
        }
     
        /* epilogue for pipelined loop */
        a[n-3] = t3;            /* Part of iteration i=n-3 */ 
        t3 = 0.5f * t2;         /* "    "  "         i=n-2 */ 
        t2 = t0 + t1;           /* "    "  "         i=n-1 */
     
        a[n-2] = t3;            /* "    "  "         i=n-2 */ 
        t3 = 0.5f * t2;         /* "    "  "         i=n-1 */
     
        a[n-1] = t3;            /* "    "  "         i=n-1 */
    } else {
        // Not enough iterations to pipeline the loop
        for( i=0; i<n; ++i ) 
            a[i] = (b[j+N/4] + b[j-N/4]) * 0.5f;
    }   
}

July 1999/Restricted Pointers are Coming/Listing 1

Listing 1: A test for compiler support of restrict

/* 
 * This program is legal C and C++, except for the use of 
 * restrict. The abuses of restrict are deliberately illegal.
 * Some compilers, such as the DEC Alpha cc compiler, recognize
 *  __restrict instead of restrict.  Others such as KAI C++ 
 * require the option --restrict. 
 */
#include <stdio.h>
     
#define N 1000
     
#ifdef USE_UNDERSCORE
#define restrict __restrict 
#endif /* USE_UNDERSCORE */
     
/* BODY is the computation shared by all tests here. */
#define BODY int i; for( i=0; i<n; ++i ) \
    a[i] = (b[j+N/4] + b[j-N/4]) * 0.5f;
     
float * wash( float * ptr );    /* forward declaration */
     
/* Base line version without restrict */
void simple( float * a, float * b, int n, int j ) {
    BODY
}
     
/* 
 * Version with restrict on both formal parameters. 
 * If the compiler complains about the declaration, 
 * it probably does not understand restrict. 
 */
void 
with_restrict_formals( float * restrict a, float * restrict b, 
    int n, int j) {

    BODY
}
     
/* Version with restrict only on output pointer */
void 
with_restrict_out_formal( float * restrict a, float * b, 
    int n, int j ) {

    BODY
}
     
/* Version with restrict only on input pointer */
void 
with_restrict_in_formal( float * a, float * restrict b, 
    int n, int j ) {

    BODY
}
     
/* Version with restrict on pointers local to a block */
void 
with_restrict_local( float * ap, float * bp, int n, int j ) {

    float * restrict a = wash(ap);
    float * restrict b = wash(bp);
    BODY
}
     
/* Version with restrict on pointers, but work is done with 
   local copies of the pointers */
void 
with_local_copy_of_restrict_formals( float * restrict ap, 
    float * restrict bp, int n, int j ) {

    float * a = ap+n-N;
    float * b = bp+n-N;
    BODY
}
     
typedef 
void (*a_ptr_to_function)( float * a, float * b, int n, int j );
     
#define ENTRY(x) {#x,x}
     
struct {
    char * name;
    a_ptr_to_function ptr_to_function;
} table[] = {
    ENTRY(simple),
    ENTRY(with_restrict_formals),
    ENTRY(with_restrict_out_formal),
    ENTRY(with_restrict_in_formal),
    ENTRY(with_restrict_local),
    ENTRY(with_local_copy_of_restrict_formals)
};
     
#define M (sizeof(table)/sizeof(table)[0])
     
/* 
 * Function report_differences compares results a and b. 
 *
 * A report of a few differences (2-4) probably indicates that 
 * the compiler exploited restrict for software pipelining, 
 * but not for invariant hoisting. 
 *
 * A report of about 3*N/4 differences probably indicates that 
 * the compiler exploited restrict for hoisting of invariants. 
 */
void 
report_differences( float * a, float * b, const char * name ) {
    int i;
    int current = -1;
    printf("%s:",name);
    for( i=0; i<=N; ++i ) 
        if( i==N || a[i]==b[i] ) {
            if( current>=0 ) {
                if( current!=i-1 ) printf("...%d", i-1 ); 
                current = -1;
            }
        } else {
            if( current<0 ) {
                printf(" %d", i );
                current = i;
            }   
        }
    printf("\n");
}
     
float array[M][N];
float* array_ptr[M];
     
/* 
 * The purpose of function "wash" is to return a pointer the 
 * same as "ptr", but in a way the disguises that the returned 
 * pointer really is the same. 
 */
float * wash( float * ptr ) {
    int j;
    for( j=0; ptr!=array[j]; ++j ) 
        continue;
    return array_ptr[j];
}
     
int main() {
    int i, j;
     
    /* Initialize data sets */
    for( j=0; j<M; ++j ) {
        array_ptr[j] = array[j];
        for( i=0; i<N; ++i ) {
            array[j][i] = i;
        }
    }
     
    /* Run each test */
    for( j=0; j<M; ++j ) {
        (*table[j].ptr_to_function)(array[j], array[j], N, N/2);
    }
     
    /* Report the errors - more is better! */ 
    for( j=1; j<M; ++j ) {
        report_differences( array[0], array[j], table[j].name );
    }
     
    return 0;
}

July 1999/Restricted Pointers are Coming

Restricted Pointers are Coming

Arch D. Robison

C9X introduces the new type qualifier restrict. Why it's there and where it pays off takes a bit of explaining.

You often hear the statement, "FORTRAN is faster than C." Alas, this is even true sometimes. I'll explain why, and how the new C Standard (C9X) introduces the keyword restrict to help narrow the performance gap. The keyword is already implemented in some C and C++ compilers. I'll summarize the motivation for restrict, how to use it, and where to set your expectations for compilers. I'll also show a way to check how well your compiler exploits the possibilities of restrict.

FORTRAN Envy

An lvalue in C/C++ is an expression that designates an object. For instance, ((*p)[i]->foo) is an lvalue. It has more lvalues as subexpressions: p, *p, and (*p)[i]. Two lvalues that refer to the same object or function are said to alias one another. C and C++ permit forms of aliasing that can hamper compiler optimizations.

FORTRAN has a rule that within a subroutine no two identifiers may be aliases unless they appear in an EQUIVALENCE declaration. This restriction gives a compiler substantial liberty to restructure a program to fit a target machine's capabilities. The language definition lets the compiler do so despite the fact that many FORTRAN programmers unknowingly violate the rule.

In the past, the big gains from restructuring were mostly limited to specialized vector hardware, but microprocessors have now advanced to the point where they benefit too. For example, modern compilers perform an optimization called software pipelining, which restructures loops in a way to run multiple iterations simultaneously. Software pipelining is frequently prohibited when the compiler is in doubt about aliasing. Though C's low-level features often allow optimization by hand, software pipelining by hand tends to be impractical because it depends upon obscure details of each particular target machine, and doing it wrong can hurt performance.

Ironically, the change in microprocessors has occurred just when seeking ultimate performance has become a secondary issue. People say that performance is important, but let's face it: delivering slow software on schedule will always take precedence over delivering fast software late. Consequently, the aliasing problems of C (and C++) have taken a back seat to language features that battle complexity. But the issues are still dear to a few, including myself.

Vendors tried to solve the problem with pragmas. As usual, pragmas increased misery by not being portable. Worse yet, the pragmas failed to address the full richness of C's type system, such as structure fields that are pointers.

A qualifier, noalias, was proposed for C89. It was rejected as unsound, and came too late in the standards process to be fixed. The fatal flaw was that noalias was a property of objects, not pointers to them. The Numerical C Extensions Group (NCEG) invented the new qualifier restrict with quite different semantics, and proposed it to both the C++ and C9X committees. The C++ committee rejected it, largely from the perception that it was a fringe issue peculiar to numerical programming on unusual hardware [1]. My belief is that this impression arises mistakenly from the association with FORTRAN (compounded by this article!) For example, the Standard Template Library (STL) has loops over sequences that could benefit from restrict, and there are important optimizations peculiar to objects that require good alias information.

There was also the well founded fear that C++ and C would end up disagreeing on what restrict means, because details of the C9X version were changing. The committee for C9X did adopt restrict. It is already implemented in some compilers, though as I'll soon explain, "implement" is inherently fuzzy. Despite the precise definition of the semantics of restrict, exploiting those semantics fully can be quite difficult.

Restrict Qualifies Pointers

The keyword restrict is a qualifier, like const or volatile. Here's an example declaration:

void *
memcpy( void * restrict p, const void * restrict q, size_t n );

Note that it qualifies the type of the pointer, not the type of the pointer's target. Non-pointer types may not be qualified with restrict. And as with other type qualifiers, it is meaningless on rvalues. The C9X draft [3] gives the language lawyer's meaning of restrict. For practical purposes, declaring a pointer as restrict says that in its scope its target will not be accessed through any pointer that was not copied from the restricted pointer. Furthermore, the rules dictate that two restricted pointers cannot be used to access the same object while they are both within scope. Thus for the memcpy example, the restrict qualifiers say that the source and destination objects cannot overlap.

Here is an example contrived to show various uses of restrict, albeit uglier than typical usage:

void
foo( T * restrict r0, T * p1, T * p2, int j ) {
    for( int k=0; k<1000; ++k ) {
        T * restrict r3 = p1;
        *r0++ = *p2++;
        p1[k] = ++r3[k+j];
    }
}

The compiler can presume that all pairs of lvalues in the routine, except the pair *p2 and p1[k], do not overlap. Notice that if j==0, then p1[k] and r3[k+j] are aliased, and the programmer is guilty of an error that may cause unexpected results.

Restrict requires careful attention to scope. In the example, the scope of r3 is for a single iteration, hence the programmer is asserting that p1[k] and r3[k+j] are disjoint for the same iteration only. Consider moving the declaration of r3 outside the loop:

    T * restrict r3;
    for( int k=0; k<1000; ++k ) {
        r3 = p1;
        *r0++ = *p2++;
        p1[k] = r3[k+j];
    }
}

This makes the much stronger assertion that no value of p1[k] in any iteration overlaps any value of r3[k+j] for any iteration, whereas previously the assertion was only that p1[k] and r3[k+j] were disjoint for the same iteration. Moving the assignment r3=p1 outside the loop makes no difference; it's the scope of the declaration that matters.

Since restrict is not yet implemented by all compilers (and is non-standard in C++), the issue of portability arises. Fortunately, the definition of restrict allows it to be omitted as a token, so using:

#define restrict

suffices to make code amenable to strict C89 and C++ compilers.

The proposed restrict extension for C++ permits restricting reference types and restricting the this pointer. Kuck and Associates (KAI) implemented this proposal in the Edison Design Group (EDG) front end and gave the changes back to EDG. So some compilers based on the EDG front end may at least implement the C++ syntax extensions, even if they do not exploit the extra alias information. The User Guide for KAI C++ [5] describes these syntax extensions in more detail.

A Simple Test for Restrict

Most compiler features can be tested by writing a test case that uses the feature. However, restrict is special, because its definition permits a compiler to ignore it. So when used correctly, the only way to test whether the compiler really exploits it would be to tediously examine assembly code, or run timing tests, which are subject to extraneous effects such as cache quirks.

There is a better test. Use restrict incorrectly, and check whether the compiler gets the wrong answer. Indeed, the degree of wrongness can indicate which optimizations exploit restrict. Listing 1, restrict.c, shows exactly such a test. The test is a C/C++ program with restrict added. It consists of a series of functions that each compute:

a[i] = (b[j+N/4]+b[j-N/4])*0.5f;

where i runs from 0 to N-1. The secret of the test is that identifiers a and b point to exactly the same storage. Except for the baseline function simple, each function abuses restrict in a way that permits a clever compiler to presume that pointers a and b point to different storage. Each test function abuses restrict in a different way, to see if the compiler can exploits it that way.

The test makes some effort to thwart optimizations that we do not want to measure. Notably, the functions are called through tables sufficiently complex that current compilers should not be able to figure out the secret.

The output from the program is a series of lines that name the test, and which values of a[i] were "wrong." If all values of a[i] are right, the compiler ignored restrict. If a few values of a[i] are wrong, then the compiler probably used restrict for software pipelining. If many values of a[i] are wrong, the compiler probably hoisted (b[j+N/4]+b[j-N/4])*0.5f outside of the loop. Figure 1 shows what each of these transforms might look like if done by hand. Your compiler might do other transforms not shown. Remarkably, one compiler that I tested used it for advanced software pipelining, but not for elementary hoisting.

If you are reading this article more than a year after it was published, view the results with suspicion, since some vendors tune their optimizers to recognize published tests. In that case, you may want to foil cheaters by changing the body of the test to something different.

Recommendations and Predictions

There's no point soaking your programs in a tub of restricts if compilers do not exploit it — unless you think it helps document the program. Much the same is true about const. Contrary to popular opinion, it is of little use to optimizers since it can be cast away [2].

Therefore guidelines on using restrict necessarily depend on predictions of future trends in compiler optimization. Compilers are driven primarily by the market, not by technical limitations. In other words, restrict will be better implemented where programmers are willing to pay for the benefit. Compilers for embedded and DSP processors will make the strongest use of restrict, as that community is willing to pay for squeezing out cycles.

Uses of restrict range over a spectrum from easy to hard for a compiler to exploit. Below is a rough ordering in ascending difficulty. Keeping your uses to the simpler cases increases the likelihood of improving performance.

The simplest case is to use restrict on all formal pointer parameters that really are unaliased. For example, see the test with_restrict_formals. I would expect any compiler that makes any effort to exploit restrict to do well in these cases. From a compiler's viewpoint, a restricted pointer formal parameter can be optimized almost like the address of a local variable, so compilers that track addresses of local variables should do as well for this case as for local variables.
Pointers can be used for input (loading from memory) or output (storing into memory). Aliasing of two input pointers is rarely an issue with optimization. It's output/input and output/output pairs that are of interest. Hence, in principle it suffices to restrict only output pointers. (See the test with_restrict_out_formal.) However, that leaves the compiler with the problem of proving that the input pointer is not based on the output pointer. You can also restrict only the input pointers. (See the test with_restrict_in_formal.) But doing so does not solve the problem of output/output pairs, if there are any.
To avoid cluttering prototypes with restrict, declare some local restricted pointers inside a function, initialize them, and use them to do the work. (See the test with_restrict_local, but with wash removed.) You'll do best declaring the local restricted pointers in the outermost block. If declared in an inner block, alas, there are some subtle issues that require the compiler to work much harder than in the first case, because when local variables go out of scope, they become undefined, but when restricted pointers go out of scope, their targets may live on.
If your compiler cannot exploit restrict on local pointers in inner blocks, it probably won't do much for parameters in inline functions. Inlining turns the latter case into the former. The effect surprises FORTRAN programmers: optimizers sometimes do worse on FORTRAN code that has been inlined by hand, when the aliasing information provided by the formal parameters is lost. So be warned that making functions with restricted parameters inline (another C9X feature) may actually hurt performance.
Pointer chains should have restrict "all the way down" to benefit. For instance, writing:

float * restrict * p;   /* probably useless */

says that p points to a pointer lvalue that points to a restricted pointer to a float lvalue. Since there may be other pointers to the pointer lvalue, the declaration does not help much. Write this instead:

float * restrict * restrict p;

That way, the compiler knows that the object **p can be reached only through *p, and that *p can be reached only through p.

restrict can be used on any pointer lvalue, including fields of structures. However, very few compilers make much effort to track fields, and this is unlikely to change much for run-of-the-mill compilers. Remember too that operator-> constitutes part of a pointer chain.
Restricted pointers, just like addresses of local variables, can be assigned to entities so complicated that the compiler considers the pointer "lost." For example, most compilers consider the following sorts of pointers to be lost: addresses of global variables, pointers passed to or from another function, and (recursively) pointers assigned to objects whose address is lost. Even compilers that are more aggressive eventually lose pointers in some contexts. In general, a compiler will have to pessimistically assume that two lost pointers may point to the same object unless both are qualified with restrict. The work around to lost pointers is simply to make restricted copies of the unrestricted pointers in question.

Conclusion

The keyword restrict helps close a major performance problem in C/C++. Market forces will dictate how well optimizers exploit it. My experience is that it works best on simple inner loops that operate directly on restricted pointers passed as formal parameters. This will probably remain true for run-of-the-mill compilers. In specialized markets for high-performance compilers, restrict will be exploited more, to the point where C++ is competitive with FORTRAN.

Acknowledgement

Bill Homer of SGI provided extensive commentary and corrections to an earlier draft.

References

[1] Bjarne Stroustrup. The Design and Evolution of C++ (Addison-Wesley, 1994), pp. 157-158.

[2] Andrew Koenig. "How much does const promise?," C++ Report, May 1997, 9(5), pp.7-8.

[3] Section 6.7.3 of C9X draft. URL is http://osiris.dkuug.dk/jtc1/sc22/wg14.

[4] C9X Rationale. Same URL as C9X draft.

[5] User Guide for KAI C++. URL is http://www.kai.com/C_plus_plus/online_doc/UserGuide/chapter_4.html#restrict_keyword.

Arch D. Robison designed the C++ optimizer inside KAI C++. He previously worked at Shell's Bellaire Research Center on parallel processing for seismic imaging. He mingled with the FORTRAN crowd there. His goal is to push the state of the art for C++ performance to the point where FORTRAN users are envious. You can reach him at [email protected].