Numerical Extensions to C



Bob is an independent consultant who programs, writes, teaches C courses, and develops new operating-system and language software. He can be reached at [email protected].


There's a joke in the supercomputing community that scientists don't know what programming language they'll be using in the 21st century, but they do know it'll be spelled "Fortran." If the extensions being designed by the NCEG succeed, 21st century scientists may instead spell it "C."

The Numerical C Extensions Group (NCEG) began as a working group of members of the ANSI C committee interested in numerical programming issues, and it has been meeting regularly since mid-1989. In March 1991, the group was officially recognized as a subcommittee working under ANSI with the designation of X3J11.1. This group will not produce a full standard, but instead will write a technical report. In the world of standards a technical report does not carry as much weight because government agencies will not demand conformance to it. However, the NCEG report will act as practical guidance to anyone wanting to extend C in the directions covered by the report. When the C standard is next revised around 1995, the directions of the NCEG report will be important input for the new standard.

The group has organized its activities into eight topics: aliasing, array syntax, IEEE floating-point support, complex arithmetic, variably dimensioned arrays, exceptions and errno, aggregate initializers, and extended-integer range.

Progress on each of these topics has proceeded at its own pace, with widely varying degrees of activity and interest. Some of these topics, such as IEEE floating point, have proceeded to the point that drafts of their section of the report are circulating among other committees for comment. Others, like the recently added extended-integer ranges, still have fundamental issues to resolve.

Aliasing

An alias exists whenever you have more than one way to reach a piece of memory. When a pointer contains the address of a static or automatic variable, that pointer is an alias of the variable. When two pointers point at the same memory location, they are aliases for the same object.

Pointers in C have always presented formidable problems for optimizing compilers, because C code has been written for years with numerous aliases and no way to tell the compiler where the aliases are present. C code routinely bumps a pointer through an array, for example, instead of using subscripting.
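
To make the problem concrete, here is a minimal sketch (mine, not from the article) of the two styles. The function names and parameters are purely illustrative; the point is that in either form the compiler cannot tell whether dst and src overlap, so it must assume they might be aliases.

  /* Illustrative only: two common ways to walk arrays in C. */
  void copy_subscript(float *dst, const float *src, int n) {
       int i;
       for (i = 0; i < n; i++)
            dst[i] = src[i];      /* subscripting */
  }

  void copy_bump(float *dst, const float *src, int n) {
       while (n-- > 0)
            *dst++ = *src++;      /* bumping pointers through the arrays */
  }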

When a compiler tries to optimize code that contains references using several pointers, the compiler must do one of two things: It must either trace the history of those pointers to determine whether they are aliases or not, or assume the worst and act as if the pointers might be aliases. Most advanced optimizing compilers do go to the trouble of tracing histories, because the benefits can be profound.

On a PC, knowing that two pointers are not aliases can help the compiler keep a few temporaries in registers rather than have to recompute values. In a tight inner loop that might mean generating half as many memory references. On a supercomputer, however, knowing two pointers are not aliases can determine whether or not the compiler can use vector instructions. In the same tight inner loop, using vector instructions can mean more than a factor-of-ten improvement in speed!

When tracing the history of a pointer, a C compiler runs up against a brick wall at the top of a function. A compiler doesn't know whether the arguments passed to a function are aliases. Unless the compiler can find all the calls to the function to continue tracing back (and I don't know of a C compiler that even tries), the compiler must give up and assume that aliases might be present.

Some of you may remember that the original ANSI C committee put a keyword into the language called "noalias" to solve this optimization problem but withdrew it. Since that attempt, people have gone back and tried to eliminate the major problems with noalias. The proposal currently being considered uses a different keyword, "restrict." The basic idea is that any pointer can be declared as being a restricted pointer, which means that when the pointer is created, it is not an alias of anything else. This is really a promise from the programmer.

If you declare a pointer parameter of a function to be restricted, you are promising that no one will call the function with an alias in that parameter. When the compiler traces back to the top of the function, it finds your promise and can conclude that no aliases are present. The parameter a in Example 1(a) is restricted. Only pointers used to modify memory really matter as far as aliases are concerned, so b and c do not need to be marked as restricted pointers.

Example 1: (a) A restricted variable; (b) calls involving undefined behavior.

  (a)

  void f3(int n, float *restrict a, float *b, float *c) {
     int i;
     for (i = 0; i < n; i++)
       a[i] = b[i] + c[i];
  }

  (b)

  float   x[100];
  float   *c;

  void f5(int n, float *restrict a, float *restrict b) {
     int i;
     for (i = 0; i < n; i++)
       a[i] = b[i] + c[i];
  }
  void g5(void) {
     float   d[100], e[100];
     c = x;

     f5(100,   d,    e); /*Behavior defined.  */
     f5( 50,   d, d+50); /*Behavior defined.  */
     f5( 99, d+1,    d); /*Behavior undefined.*/
     c = d;
     f5(100,   d,    e); /*Behavior undefined.*/
     f5(100,   e,    d); /*Behavior defined.*/
  }

In Example 1(b), two of the five calls involve undefined behavior because some sort of overlap exists that involves modified values. The first call works because it simply adds arrays x and e together and stores the result in d. The second call illustrates that contiguous portions of arrays can be used, as long as they don't overlap. The third call adds d to itself, shifted by one element; an optimizing compiler might perform f5 using vector hardware that would produce different results, depending on the size of the vectors and the order in which pieces are calculated. The fourth call fails because d is aliased through both the restricted parameter a and the global pointer c. This particular example would probably get the right answer anyway, but that depends on exactly what f5 does internally; a slightly modified f5 (for example, with a[i] = b[i] + c[i+1]) could get the wrong answer. The last call works fine because the only alias is between two pointers that are not used to modify data. A draft of this proposal is now circulating among other standards bodies for comment.

Array Syntax

The subgroup examining array syntax has undergone a significant transformation since NCEG was formed. The original topic focused on syntax to make generating code for vector processors easier. Supercomputers like the Cray X-MP are designed to perform arithmetic on whole blocks of an array at a time by streaming vectors of operands through their floating-point units. It is a natural fit, then, for a programmer to be able to write expressions involving whole arrays at a time.

In the last couple of years, however, emerging interest in massively parallel computers has shifted the focus of the group. In a nutshell, Amdahl's Law states that as you add more processors attached to a shared memory, you reach a point of diminishing returns where additional processors block each other from access to the memory. Massively parallel computers try to solve that problem by giving each separate processor its own memory. Advances in microprocessor technology have made that approach economically attractive. Unfortunately, most existing programming languages like C were not designed for a massively parallel computer. [Editor's note: For more discussion on Amdahl's Law, see "Personal Supercomputing: Seamless Portability" by Ian Hirschsohn, DDJ, July 1992.]

What does it mean to have a pointer when there may be 10,000 processors, each with 10,000 separate memories? Massively parallel computers solve complex problems by distributing arrays across the many memories of the machine. Each processor solves the problem on its local piece of the array, without competing with other processors for access to memory. The conventional C approach of bumping a pointer through an array simply does not work efficiently in that environment.

At the January 1992 NCEG meeting, the array-syntax group decided to focus on solving this problem. Several existing dialects of C have tackled it, including C* from Thinking Machines and MPL from MasPar. The array-syntax group will be hammering out a compromise that merges the capabilities of those dialects into a data-parallel C.

Interest in this group is high, and a special meeting was held at Thinking Machines in April to move the work forward rapidly. The group has decided to use the C* Reference Manual as its base document for future work. This decision was difficult to reach because the group has received 14 significant proposals, whereas the other NCEG groups have received only one or two each.

The C* language introduces a new aggregate type called "shape" that has dimensions like an array, except that elements are not stored contiguously in memory and are usually distributed across multiple memories. The C operators are overloaded to operate element-wise on objects of the same shape. Operations which are not element-wise or involve operands of different shapes require the use of a new left-indexing subscript operator. Example 2(a) illustrates the use of a shape.

Example 2: (a) Sample use of a shape; (b) this assignment can be performed 100 times; (c) this code is the equivalent of that in (b).

  (a)

  shape [10][10] Sa;

  double:Sa a, b, c;

  f() {
       double n;
       int i;

       a = b + c;
       n = [4][i]b + 2.0;
       where (b < 5) {
            a++;
            c = b;
       }
  }

  (b)

  void f (void) {
       iterator I = 100;
       float a[100], b[100];

       a[I] = b[I] + 1;
  }

  (c)

  void f (void) {
       int I;
       float a[100], b[100];

       for (I = 0; I < 100; I++)
           a[I] = b[I] + 1;
  }

The arrays b and c can be added together as if they were normal C variables, but because they have been declared with a shape, corresponding elements of the shape are added instead. The left indexing illustrates how a single element of a shape can be selected. Left indexing is rarely needed, however, because other operations exist that select subsets of a shape. For example, the where statement above selects only those positions of the shape for which the test is true, and then executes the block of code (incrementing a and copying b to c) only for those selected positions.

There is still some controversy concerning whether parallel operations and data distribution can be made independent. An alternative proposal has been put forward involving a concept called iterators to express parallel operations. An "iterator" is a new kind of object that signals the need for iteration when it is used in an expression. Example 2(b) causes the assignment to be performed 100 times. An easy way to think about iterators is that expressions are executed as if a for loop were wrapped around the expression, using the iterator variable as the loop variable. So Example 2(b) is equivalent to Example 2(c).

Iterators do not address the problems of data distribution in massively parallel computers, but they may be integrated into a larger proposal that does address these issues.

IEEE Floating-point Support

Since more and more machines are standardizing on IEEE floating-point hardware, the floating-point subgroup has been working to specify how C should be implemented on machines supporting IEEE floating point. The most important elements of the IEEE floating-point standard (ANSI/IEEE 754-1985) not found in Standard C are the notions of infinity and NaN (not-a-number), along with the notion of the floating-point environment and a more complete computational library.

An infinity is a value so large in magnitude that it cannot be represented in the range of a floating-point value, so it is more than just the mathematical concept of infinity. Any nonzero number divided by 0 produces infinity, but so does a very large number divided by a very small number, as long as the result overflows the floating-point range.

A NaN is a mechanism that IEEE defined to carry information about erroneous conditions that arise during computations. For example, trying to compute the logarithm of a negative number produces a NaN. A programmer need not check hardware error flags or C's errno after every step of a computation. Once a NaN has appeared in a computation, any further expressions using that NaN also produce NaN. Rather than having errors introduce numbers that might be produced by normal computations, a NaN is uniquely identifiable as an error condition.
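
A minimal sketch (mine, not from the NCEG draft) of the idea, relying only on the IEEE rule that a NaN compares unequal to everything, including itself. On an IEEE implementation, the logarithm of a negative number produces a NaN that survives to the final test.

  #include <stdio.h>
  #include <math.h>

  /* Illustrative only: detect an error once, at the end of the
     computation, instead of checking errno after every step.
     Assumes IEEE semantics, where log(-1.0) produces a NaN and
     a NaN compares unequal even to itself. */
  int main(void) {
       double x = log(-1.0);          /* logarithm of a negative number */
       double y = x * 2.0 + 7.0;      /* the NaN propagates */

       if (y != y)
            printf("computation produced a NaN\n");
       else
            printf("result = %g\n", y);
       return 0;
  }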

A floating-point environment is a set of dynamic runtime flags that control rounding of intermediate results and reporting of exceptional conditions.
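
The article gives no code for the environment, so here is a rough sketch. The header and names below (fesetround, fetestexcept, and friends) are the spellings that eventually appeared in C99's <fenv.h>; the NCEG draft places comparable facilities in <fp.h> and its exact names may differ, so treat this purely as an illustration of the concept.

  #include <fenv.h>     /* later C99 spelling; the NCEG draft uses <fp.h> */
  #include <stdio.h>

  /* Illustrative only: change the rounding mode and test an exception
     flag.  (C99 also requires the FENV_ACCESS pragma for strictly
     defined behavior.) */
  int main(void) {
       double one = 1.0, three = 3.0, huge = 1e308;

       fesetround(FE_UPWARD);              /* round results upward */
       printf("1/3 rounded up = %.20f\n", one / three);
       fesetround(FE_TONEAREST);           /* restore the default mode */

       feclearexcept(FE_ALL_EXCEPT);
       huge = huge * 10.0;                 /* overflows to infinity */
       if (fetestexcept(FE_OVERFLOW))
            printf("overflow flag raised; result = %g\n", huge);
       return 0;
  }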

The data types used to hold intermediate results are another area addressed by this group. The current C rules demand that at each operator, the computation occur with a precision at least as wide as the wider of the operands, but otherwise leave it up to the compiler. One of the alternative evaluation methods being discussed, called "widest need," would widen all values to the widest type appearing anywhere in the expression, not just at each operator. In Example 3, the value of x + y is computed in at least float precision, according to the C standard. Using widest need, this addition is computed using long double precision because the assignment (and thus the multiply and the additions) all need long double.

Example 3: The value of x+y is computed in at least float precision according to the C standard.

  float        x, y, z;
  long double  u, v;

  u = (x + y) * (z + v);

An additional area not demanded by the needs of IEEE arithmetic, but deemed useful in general, is overloaded functions. The IEEE group defines a new header (fp.h) that contains a set of overloaded functions, such as sqrt, whose implementation is selected by the types of their arguments. User-supplied overloading is not provided, but C++'s overloading could accomplish most of the capabilities needed. Widest-need evaluation presents a unique problem because it is supposed to affect the choice of overloaded functions: C++ overloading uses only the types of the function arguments to pick which function to call, while widest-need evaluation uses the needed return type as well.

The IEEE subgroup has produced a document that is now circulating among other standards groups for comment in draft form. As much as possible, the extensions have been defined so that any floating-point hardware could take some advantage of them.

Complex Arithmetic

The complex arithmetic group has produced a specification for complex data types of three precisions: float complex, double complex, and long double complex, representing values in Cartesian, not polar, coordinates.

These types are accompanied by a new floating-constant suffix, i, which gives the constant the corresponding complex type. Example 4 assigns a value to x, with the real part being 3.0 and the imaginary part being 5.0. This notation very closely mimics the notation used by mathematicians.

Example 4: Assigning a value to x with the real part being 3.0 and the imaginary part being 5.0.

  double complex x;

  x = 3 + 5i;

The proposal also extends the normal arithmetic operators for complex types and adds conversion rules for combining complex, floating-point, and integral types in expressions.

A new header (complex.h) has been proposed that contains prototypes for complex functions such as complex sines, cosines, exponentials, logarithms, powers, and square roots. In addition, functions exist to extract the real and imaginary parts of a complex number and to compute its conjugate.
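
As a sketch of what using the header might look like (mine, not from the proposal): the function names below (csqrt, creal, cimag, conj) are the spellings that later appeared in C99's <complex.h>, and the draft's names could differ.

  #include <stdio.h>
  #include <complex.h>   /* later C99 spelling of the proposed header */

  /* Illustrative only: a complex square root, plus extraction of the
     real part, imaginary part, and conjugate. */
  int main(void) {
       double complex z = 3.0 + 4.0i;   /* proposed i suffix; C99 spells this 4.0*I */
       double complex r = csqrt(z);

       printf("sqrt(3+4i) = %f + %fi\n", creal(r), cimag(r));
       printf("conj(3+4i) = %f + %fi\n", creal(conj(z)), cimag(conj(z)));
       return 0;
  }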

An issue which has not been completely resolved is whether there should be a type called "imaginary." This would strictly be the imaginary part of a complex number. Mathematically, an imaginary is just a complex number with a zero real part. For C, this type would solve some exotic issues relating to NaNs.

Variably Dimensioned Arrays

The ability to specify varying array bounds for multidimensional arrays would greatly simplify the code used to access such array parameters. For example, a matrix-multiply routine that works on two-dimensional square arrays today must receive those arrays as simple pointers and explicitly perform the index arithmetic. Clearly, the indexing in Example 5(a) would be much more obvious if it could be written as in Example 5(b).

Example 5: (a) Indexing; (b) clearer indexing; (c) arrays of variable dimension.

  (a)

  void f(int m, float *a) {
     ... a[i * m + j] ...
  }

  (b)

  void f(int m, float a[m][m]) {
    ... a[i][j] ...
  }

  (c)

  void f(int n) {
     float a[n];
     /* ... */
  }

There are currently two competing proposals. In the "fat-pointer" proposal, one can declare pointers to variably dimensioned arrays. These objects are pointers that include not only an address but also a descriptor for each of the dimensions of the array being pointed at.

The alternative proposal allows variable dimensions on array parameters to functions as a way to simplify writing library routines for multidimensional arrays.

Both proposals now allow automatic arrays of variable dimension for more flexible local storage; see Example 5(c).

Some implementations of C provide a nonstandard function alloca that allows one to allocate auto storage that is automatically recovered on function return (usually done by directly manipulating the hardware stack pointer). This function is not possible on some hardware, however, which is why it is not part of the Standard. Variable-length automatic arrays have the same problems. The proposal that allows automatic variable-length arrays allows an implementation to use malloc to allocate space for the object. As a result, setjmp and longjmp may not work properly around variable-length arrays.
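
As a small illustration (mine, not from the report), here are the two idioms side by side: the nonstandard alloca call described above and the proposed variable-length automatic array of Example 5(c).

  #include <alloca.h>   /* nonstandard header found on many systems */

  void old_way(int n) {
       /* Nonstandard today: the storage is reclaimed automatically
          when the function returns. */
       float *tmp = alloca(n * sizeof *tmp);
       /* ... use tmp[0] through tmp[n-1] ... */
  }

  void proposed_way(int n) {
       /* Proposed: an automatic array whose size is a runtime value.
          An implementation may obtain the space with malloc, which is
          why a longjmp past this function may not release it. */
       float tmp[n];
       /* ... use tmp[0] through tmp[n-1] ... */
  }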

Exceptions and errno

The specification of Standard C requires that errno be set when certain errors occur in math functions. This mechanism is not the approach that IEEE has advocated for signaling error conditions, and on a number of hardware platforms the math library is significantly slowed down by the need to set errno correctly.

Not coincidentally, the math functions in the IEEE floating-point support header <fp.h> specifically do not have to set errno.

At this point a specific proposal has not been formulated. The general feeling is that the support for errno may not be present in an extended math library, which will perhaps use the IEEE floating-point environment instead.

Aggregate Initializers

Aggregate-initializer extensions have been proposed to solve two slightly different problems. First, programmers want to be able to initialize selected elements of an array or structure. Second, they want to be able to specify an initializer as a constant in an expression.

A proposal has been accepted that allows either an array subscript expression or a structure member name to appear in an aggregate initializer. This selector may appear before any individual value in the initializer. That value is then assigned to the named element or member, and subsequent values are assigned to consecutive elements or members after that.

Initializing selected elements of an array or members of a structure simplifies several different situations. In Example 6(a), the div_t type is specified as part of the standard as having two members, quot and rem, for the quotient and remainder of a division, respectively. But which member comes first? With the above initializer, it doesn't matter.

Example 6: (a) The div_t type is specified as part of the standard as having two members; (b) initializing an array whose elements correspond to the members of an enumeration; (c) an array that needs nonzero entries near each end; (d) combining subscripting and member selection; (e) using this extension to initialize a union; (f) constructing aggregate constants in expressions.

  (a)

  div_t answer = { .quot = 1, .rem = 0};

  (b)

  enum etag { Mem1, Mem2, /* ... */ };
  const char *nm[] = { [Mem2] = "Mem2", [Mem1] = "Mem1",
                                      /* ... */};

  (c)

  int a[MAX] = {
           1, 3, 5, 7, 9, [MAX-5] = 8, 6, 4, 2, 0
  };

  (d)

  struct s { int a[3], b; };
  struct s w[] = { [0].a = { 1 }, [1].a[0] = 2 };
  struct s z[] = { { 1 }, 2 };

  (e)

  union utag { /* ... */ } u = { .any_member = 42 };

  (f)

  struct point { int x, y; };
  void drawCircle(struct point center, int radius);

       struct point p;

       p.x = 11; p.y = 12;

       drawCircle(p, 15);

  drawCircle((struct point) { 11, 12 }, 15);

For an array whose elements must correspond to the members of an enumeration, the code in Example 6(b) can be used. You can be confident that the strings in the table will correspond to the correct array elements, even if you later add more enumerators or change their values.

Suppose you have an array that needs nonzero entries only near each end, as in Example 6(c). Subscripting and member selection can also be combined; see Example 6(d).

Both w and z are initialized to the same values. Both initializers are incompletely bracketed, but in the first case it is clear which elements are being set to which values. This extension can even be used to initialize a union; see Example 6(e).

There has also been some sentiment for supporting initialization for ranges of array elements. Often, you wish to set all the elements of an array to something other than 0. Several suggestions have been made for possible syntax, but nothing has yet been accepted. There are other extensions that may also need to express ranges of values, so it is likely that once some notation can be agreed upon for expressing ranges, they will be adopted wherever appropriate.

Another accepted proposal provides the ability to construct aggregate constants in expressions; see Example 6(f). In the second call, an aggregate constant is used to avoid creating an extra variable. Nonconstant runtime values can appear in the aggregate constants.

Extended Integer Range

Support for extended integer range is the newest addition to the work items of NCEG. Numerous hardware vendors either offer machines with 64-bit integers or are planning to introduce such machines. C programmers have always thought of char, short, and long integers as being 8, 16, and 32 bits wide; Standard C requires only that each of these types be at least that wide. These new machines with 64-bit integers present some problems. If such a machine makes long 64 bits wide, any software that assumes that long is exactly 32 bits wide will fail. Similarly, if int is made 64 bits wide, software that assumes int is either 16 or 32 bits wide will fail. Probably most troublesome of all, on machines that support 64-bit integers, programmers will still want 8-, 16-, and 32-bit integers as well. C can make char, short, int, and long each into distinctly sized integers, but what happens when, some day, 128-bit integers appear?

As it happens, when surveys of actual C code have been made, it turns out that the biggest problem is with configurations in which the size of int differs from the size of pointers. After years of telling programmers that ints and pointers are not interchangeable, significant amounts of code still make that assumption. And although we may condemn such code, we must accept that it exists and that customers will not be happy if they are forced to convert.
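
A small illustration (mine, not from the article) of the kind of code those surveys turn up: it happens to work wherever int and pointers are the same size, and silently truncates addresses wherever pointers are wider.

  #include <stdio.h>
  #include <stdlib.h>

  /* Illustrative only: code that assumes an int can hold a pointer. */
  int main(void) {
       char *p = malloc(16);
       int   h = (int)p;        /* loses the high bits if pointers are 64 bits wide */
       char *q = (char *)h;

       printf("round trip %s\n", p == q ? "survived" : "lost bits");
       free(p);
       return 0;
  }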

Because of these difficulties, some implementations have decided to create a new type, "long long," that is used for 64-bit integers. The members of NCEG feel strongly that "long long" is a very poor design choice. The extended-integer range group has decided that despite disapproving of "long long" as a new integral type, they will produce some guidelines as part of the technical report. These guidelines will point out practical portability issues involved in defining a "long long" type.

Given the strains of incorporating an additional integer size into the C type system, there is a strong desire among group members to find a more general solution. The solutions suggested have centered around being able to declare the minimum size of an integer. Integers could be specified as being some minimum number of bits, or else as a range of values. In Example 7, for instance, the declarations would make x a variable that is a signed integer at least 32 bits wide. The type of y is some integer large enough to hold the values from 1 through 100. This capability is similar to the ability to define integer ranges in Pascal.

Example 7: The declarations make x a signed integer at least 32 bits wide. The type of y is some integer large enough to hold the values from 1 through 100.

  int:32 x;
  int {1 .. 100} y;

While the work is still very preliminary, the type system of C is not likely to be as strict as that for Pascal. Integer ranges in C would simply be synonyms for one of the supported integral types. Significant work is still needed on conversion rules, along with the exact meaning of these declarations.

One question still unanswered is whether a programmer will only be able to specify types that are at least as wide as desired. Some wish to specify that they want an integer exactly N bits wide. This is analogous to the existing bit-fields of C, except perhaps allowing arrays of these new integer types.
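
The bit-field analogy can be made concrete with existing C (a sketch, not the proposed syntax): today an exactly-N-bit integer can live only inside a structure as a bit-field, and you cannot form arrays of bit-fields or take their addresses.

  #include <stdio.h>

  /* Existing C: a bit-field is the only way to request an integer
     of an exact width, and it must live inside a structure. */
  struct sample {
       unsigned value : 12;      /* exactly 12 bits wide */
  };

  int main(void) {
       struct sample s;
       s.value = 4095;           /* largest 12-bit unsigned value */
       s.value++;                /* wraps around to 0 */
       printf("%u\n", (unsigned)s.value);
       return 0;
  }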

Conclusion

There has been a subtle shift in the standards-making process since the beginning of the ANSI C committee in 1983. At that time, a standards committee was assembled after a language had been in use for some time and a de facto standard had been determined by the marketplace. The standards committee was charged with ironing out the differences between implementations and writing a clear specification for all the dark corners.

Nowadays, standards committees are being used to create standards from scratch. Companies are using these neutral forums as a way to arrive at sensible solutions without having to risk years of possibly expensive conflict between competing products. The Beta/VHS battle is an obvious example from the consumer-electronics industry of the costs of letting the market pick winners.

The NCEG group is part of the new spirit in standards. Competing vendors can come together and share their expertise before they invest large sums of money in building C compilers and before customers invest huge sums of money in competing designs. The customers are the ultimate winners: they can buy a C compiler that supports an NCEG extension and feel confident that they can port their code to another vendor's compiler.

Will C, with the addition of the NCEG extensions, replace Fortran? The NCEG committee does not see this as the goal of the group, nor is it likely to be accomplished. Programmers continue to use Fortran for reasons that have nothing to do with whether it is the best language in the world. Decades worth of code have built up in Fortran that no one wants to convert to C. Programmers with decades of experience aren't willing to change languages easily.

What will emerge from the NCEG extensions is a better C for those who want to use it for numerical programming. The data-parallel extensions promise the possibility that C will become one of the first truly portable languages available for massively parallel supercomputers. If that happens, C will be leading the way into the 21st century and the brave new world.




Copyright © 1992, Dr. Dobb's Journal

