The New C: Integers, Part 2

The new C Standard has a novel idea: just accept that machine-word sizes will grow. Randy explains C's proactive strategy for accommodating the inevitable.


January 01, 2001
URL:http://www.drdobbs.com/the-new-c-integers-part-2/184401339

January 2001/The New C


In December’s column, I began discussing the new integer features of C99, in particular, the long long data type. In this month’s column, I discuss how the C99 Standard generalizes the support for integers and allows your compiler to support additional integer data types. While it might not be immediately obvious, these C99 features were motivated in part by the introduction of 64-bit machines.

As the C99 Standard was being developed, 64-bit machines started to appear, and with them came the contentious issue of the mapping from C keywords to hardware integer data types. An informal group of vendors of 64-bit hardware and software met a few times to reach consensus on a common mapping, and failed to do so for a period of two or three years. Many of the proposed mappings called for a new integer type to be added to C. For example, one proposal mapped int and long to 32-bit integers and specified a new C type for 64-bit integers. Another proposal mapped int and long to 64-bit integers and specified a new C type for 32-bit integers.

Eventually, two 64-bit mappings won. Vendors most concerned about compatibility with their 32-bit offerings mapped short to 16 bits, int and long to 32 bits, and added a new 64-bit integer type (long long). Vendors most concerned about elegant access to 64-bit hardware mapped short to 16 bits, int to 32 bits, and long to 64 bits. (This is also the mapping used in Java.)

The C standards committee realized that there was a lesson to be learned from the 64-bit vendors: C probably would gain more integer types in the future. Someday, there will be interest in 128-bit integers. Occasionally, there is interest in adding unusual integer types, such as specialized counters for digital signal processors or integers with different "endian-ness" (byte ordering). The committee decided that it would be best if the Standard addressed the issue of adding new integer types to C in order to give direction to both implementations and programmers. (Some programmers complained about the addition of long long to the language since, rather than use typedefs or macros, they had hard-coded the information about the largest integer type in their programs.)

Generalized Integers

The model for extended integer types (as the Standard calls implementation-defined integers) should seem pretty natural to most C programmers. Like the standard integer types (as the Standard calls the integers with which you are familiar, such as int, signed char, and unsigned long), extended integers come in pairs of types. For every extended signed integer type, there is a corresponding extended unsigned integer type. The standard integer types and the extended integer types (along with enum types and char) are collectively known as the integer types. Thus, all of the statements in the Standard about the integer types also apply to any extended integer types supported by an implementation.

The Standard does not say what the names of any extended integer types are. An implementation might name the types using a combination of current keywords (e.g., long long long int) or invent new keywords spelled using the name patterns that the Standard reserves for use by implementations (e.g., __int24 or _BigEndian32).

Since C89, integers in C have been required to be binary numbers, and signed integers are required to be represented in one’s complement, two’s complement, or sign and magnitude notation. Integers (except unsigned char) may have unused bits in their representation. (On some machines, integers have the same storage representation as floating-point numbers with the exponent bits ignored.) Integers might have representations for both positive and negative zero. Extended integers obey these same rules. Thus, extended integers cannot be binary coded decimal. Since integers in C have fixed sizes, extended integers cannot be LISP-like bignums whose storage dynamically grows and shrinks to hold numbers of unlimited size. Extended integers are like standard integers. If a programmer were given a typedef defined to be an integer, it would not make much difference whether the typedef named an extended integer or an unspecified standard integer.
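The representation rules above can be probed from a program. The following is only a minimal sketch: the names sign_repr, int_sign_representation, and unsigned_padding_bits are hypothetical and do not come from the Standard. It relies on the fact that the two low-order bits of -1 differ in each of the three permitted sign representations, and it counts value bits by shifting UINT_MAX until it reaches zero.

```c
#include <assert.h>
#include <limits.h>

/* The two low-order bits of -1 differ in each permitted representation:
 *   sign and magnitude: ...01 -> 1
 *   one's complement:   ...10 -> 2
 *   two's complement:   ...11 -> 3
 */
enum sign_repr {
    SIGN_AND_MAGNITUDE = 1,
    ONES_COMPLEMENT    = 2,
    TWOS_COMPLEMENT    = 3
};

/* Reports which representation this implementation uses for int. */
static enum sign_repr int_sign_representation(void)
{
    int low_bits = -1 & 3;

    /* 0 is impossible under any of the three permitted representations. */
    assert(low_bits >= 1 && low_bits <= 3);
    return (enum sign_repr)low_bits;
}

/* Counts the bits of unsigned int that do not contribute to its value. */
static int unsigned_padding_bits(void)
{
    int value_bits = 0;
    unsigned int v;

    for (v = UINT_MAX; v != 0; v >>= 1)
        value_bits++;
    return (int)(sizeof(unsigned int) * CHAR_BIT) - value_bits;
}
```

On most current hardware one would expect int_sign_representation() to report TWOS_COMPLEMENT and unsigned_padding_bits() to return 0, but the Standard guarantees neither.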

Expression Evaluation

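The rules for how extended and standard integers work in expressions are determined by a new concept called the integer conversion rank. The Standard requires every implementation to rank all of its integer types according to the following rules:

- No two signed integer types have the same rank, even if they have the same representation.
- The rank of a signed integer type is greater than the rank of any signed integer type with less precision.
- The rank of long long is greater than the rank of long, which is greater than the rank of int, which is greater than the rank of short, which is greater than the rank of signed char.
- The rank of any unsigned integer type equals the rank of the corresponding signed integer type.
- The rank of any standard integer type is greater than the rank of any extended integer type with the same width.
- The rank of char equals the rank of signed char and unsigned char.
- The rank of _Bool is less than the rank of all other standard integer types.
- The rank of any enumerated type equals the rank of its compatible integer type.
- The relative rank of two extended signed integer types with the same precision is implementation-defined, but still subject to the other rules.
- Rank is transitive: if T1 has greater rank than T2, and T2 has greater rank than T3, then T1 has greater rank than T3.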

As you probably suspect, the integer conversion rank is used to define the rules for expression evaluation. Those rules are named the integer promotions and the usual arithmetic conversions.

The integer promotions rule states that an integer with rank less than that of int may be used in an expression wherever an int or unsigned int may be used (as may a bit-field). The value used in the expression is converted to int if int can hold all of the values of the original type, and to unsigned int otherwise. For example, if short and int have the same representation, an unsigned short promotes to unsigned int when used in an expression.
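The integer promotions can be observed directly with a compiler that supports the C11 _Generic operator (an assumption; _Generic is not part of C99). The sketch below uses unary plus, whose only effect on a small integer is to promote it. TYPE_NAME and the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Names the type of an expression. _Generic does not itself apply the
 * integer promotions, so the small types can be told apart from int. */
#define TYPE_NAME(x) _Generic((x),      \
    signed char:    "signed char",      \
    unsigned char:  "unsigned char",    \
    short:          "short",            \
    unsigned short: "unsigned short",   \
    int:            "int",              \
    unsigned int:   "unsigned int",     \
    default:        "other")

/* A signed char always promotes to int. */
static const char *promoted_signed_char(void)
{
    signed char c = 1;
    return TYPE_NAME(+c);
}

/* Without an operator applied, the object keeps its declared type. */
static const char *unpromoted_unsigned_short(void)
{
    unsigned short us = 1;
    return TYPE_NAME(us);
}

/* An unsigned short promotes to int or unsigned int. */
static const char *promoted_unsigned_short(void)
{
    unsigned short us = 1;
    return TYPE_NAME(+us);
}

/* What the promotion rule predicts for unsigned short on this platform. */
static const char *expected_unsigned_short_promotion(void)
{
    return (USHRT_MAX <= INT_MAX) ? "int" : "unsigned int";
}

/* Checks the observed promotion against the prediction. */
static void check_unsigned_short_promotion(void)
{
    assert(strcmp(promoted_unsigned_short(),
                  expected_unsigned_short_promotion()) == 0);
}
```

Where short is narrower than int, promoted_unsigned_short() yields "int". Where the two have the same representation, it yields "unsigned int", which is the case described in the text.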

The usual arithmetic conversions are the rules that determine the result type of most of the two-operand operators in C when they are operating on integer or floating-point operands. The usual arithmetic conversion rules state that first you perform the integer promotions on any integer operands. Then you determine a common type, convert the operands to the common type, and the result of the operator has that common type.

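The first few rules for determining the common type for cases involving integers are pretty simple:

- If both operands have the same type, that type is the common type.
- If both operands have signed integer types, or both have unsigned integer types, the common type is the type of the operand with the greater rank.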

Unfortunately, the Standard permits an implementation to make all of the integer types the same size, and that complicates the rules for when a signed integer type meets an unsigned integer type. Normally, if you multiplied a long by an unsigned int, you would expect the common type (and thus the result type) to be long. However, if int and long have the same representation, an unsigned int behaves like an unsigned long in terms of expression evaluation. Thus, when int and long are "the same," multiplying a long by an unsigned int is like multiplying a long by an unsigned long: the common type is unsigned long. The rules for when one operand has signed integer type and the other operand has unsigned integer type are:

- If the unsigned type has rank greater than or equal to the rank of the signed type, the common type is the unsigned type.
- Otherwise, if the signed type can represent all of the values of the unsigned type, the common type is the signed type.
- Otherwise, the common type is the unsigned type that corresponds to the signed type.

While the integer conversion rank and expression evaluation rules may seem abstract, they are nothing more than the rules programmers assume about integers every day, particularly when dealing with typedefs. Consider a simple statement like x = y + z; where x, y, and z all have the same typedef type and the only thing you know about that type is that it is a signed integer. You assume that the result of the addition has the same type as the variables, or that it is int if the type of the variables is smaller than int. You assume that the result does not overflow unless the true result would not fit in x. These are the sorts of properties that result from the rules. They guarantee consistent and natural expression evaluation.
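The usual arithmetic conversions discussed in this section can be checked in the same spirit. This is a sketch that assumes a C11 compiler for _Generic. TYPE_NAME, counter_t, and the function names are hypothetical. The long-times-unsigned-int case is the one whose answer depends on whether long is wider than int.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Names the type of an expression (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),              \
    int:                "int",                  \
    unsigned int:       "unsigned int",         \
    long:               "long",                 \
    unsigned long:      "unsigned long",        \
    long long:          "long long",            \
    unsigned long long: "unsigned long long",   \
    default:            "other")

/* A hypothetical typedef about which we know only that it is signed. */
typedef short counter_t;

/* Both operands signed: the operand of lower rank is converted. */
static const char *int_plus_long(void)
{
    int i = 1;
    long l = 2;
    return TYPE_NAME(i + l);
}

/* Equal rank, mixed signedness: the unsigned type wins. */
static const char *int_plus_unsigned(void)
{
    int i = 1;
    unsigned int u = 2;
    return TYPE_NAME(i + u);
}

/* The case from the text: long meets unsigned int. */
static const char *long_times_unsigned(void)
{
    long l = 2;
    unsigned int u = 3;
    return TYPE_NAME(l * u);
}

/* long is the common type only if it can hold every unsigned int value. */
static const char *expected_long_times_unsigned(void)
{
    return ((unsigned long long)LONG_MAX >= UINT_MAX) ? "long"
                                                      : "unsigned long";
}

/* y + z for a typedef smaller than int is evaluated as int. */
static const char *typedef_sum(void)
{
    counter_t y = 1, z = 2;
    return TYPE_NAME(y + z);
}

/* -1 is converted to unsigned int, so this comparison is false. */
static int minus_one_less_than_one_unsigned(void)
{
    return -1 < 1U;
}

/* Checks the platform-dependent case against the prediction. */
static void check_long_times_unsigned(void)
{
    assert(strcmp(long_times_unsigned(),
                  expected_long_times_unsigned()) == 0);
}
```

On an implementation with 32-bit int and 64-bit long, long_times_unsigned() names long. With 32-bit int and 32-bit long it names unsigned long, matching the example in the text.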

Constants

C99 does not introduce any new special syntax for integer constants that have an extended integer type. However, implementations might introduce such syntax as an extension. For example, if an implementation has a 128-bit integer type named long long long, it might allow you to suffix a decimal constant with LLL to indicate its type.

In C, integer constants have a type based on their value. For example, a decimal integer constant without a suffix has the first type from this list that can represent its value: int, long, or long long. The Standard permits a constant to have an extended integer type if its value cannot be represented by long long (or unsigned long long, if appropriate) and the extended type can represent its value.
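The value-based typing of decimal constants can be illustrated with a short sketch. It assumes a C11 compiler for _Generic and covers only the standard types, since no extended type is assumed to exist. TYPE_NAME and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Names the type of an expression (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),  \
    int:       "int",               \
    long:      "long",              \
    long long: "long long",         \
    default:   "other")

/* 32767 fits in int on every conforming implementation. */
static const char *small_decimal(void)
{
    return TYPE_NAME(32767);
}

/* 2147483648 (2 to the 31st) does not fit in a 32-bit int. */
static const char *large_decimal(void)
{
    return TYPE_NAME(2147483648);
}

/* Walks the list int, long, long long as the C99 rule describes. */
static const char *expected_large_decimal(void)
{
    if (INT_MAX >= 2147483648LL)
        return "int";
    if (LONG_MAX >= 2147483648LL)
        return "long";
    return "long long";
}

/* A suffix overrides the starting point of the list. */
static const char *suffixed_decimal(void)
{
    return TYPE_NAME(1LL);
}

/* Checks the observed type against the prediction. */
static void check_large_decimal(void)
{
    assert(strcmp(large_decimal(), expected_large_decimal()) == 0);
}
```

With 32-bit int, large_decimal() names long where long is 64 bits and long long where long is 32 bits. An unsuffixed decimal constant never takes an unsigned type under the C99 rule.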

The extended integer type used to represent a constant must be a signed type if the constant is normally signed (e.g., a decimal constant without a U suffix). The extended integer type must be unsigned if the U suffix appears in the constant. For octal or hexadecimal constants without a U suffix, the extended type may be signed or unsigned. (Unsuffixed octal and hexadecimal constants normally have the first type from this list that can represent their value: int, unsigned int, long, unsigned long, long long, or unsigned long long.)
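The difference between decimal and hexadecimal constants shows up when the same value is written both ways. The following sketch again assumes a C11 compiler for _Generic and considers only standard types. TYPE_NAME and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Names the type of an expression (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),              \
    int:                "int",                  \
    unsigned int:       "unsigned int",         \
    long:               "long",                 \
    unsigned long:      "unsigned long",        \
    long long:          "long long",            \
    unsigned long long: "unsigned long long",   \
    default:            "other")

/* The same value, written in hexadecimal and in decimal. */
static const char *hex_constant(void)
{
    return TYPE_NAME(0xFFFFFFFF);
}

static const char *decimal_constant(void)
{
    return TYPE_NAME(4294967295);
}

/* Hexadecimal list: int, unsigned int, long, unsigned long, ... */
static const char *expected_hex(void)
{
    if (INT_MAX >= 0xFFFFFFFFLL)
        return "int";
    if (UINT_MAX >= 0xFFFFFFFFLL)
        return "unsigned int";
    if (LONG_MAX >= 0xFFFFFFFFLL)
        return "long";
    return "unsigned long";  /* unsigned long has at least 32 bits */
}

/* Decimal list: int, long, long long (never an unsigned type). */
static const char *expected_decimal(void)
{
    if (INT_MAX >= 4294967295LL)
        return "int";
    if (LONG_MAX >= 4294967295LL)
        return "long";
    return "long long";
}

/* The U suffix forces an unsigned type. */
static const char *u_suffixed(void)
{
    return TYPE_NAME(1U);
}

/* Checks both spellings against the predictions. */
static void check_constants(void)
{
    assert(strcmp(hex_constant(), expected_hex()) == 0);
    assert(strcmp(decimal_constant(), expected_decimal()) == 0);
}
```

With 32-bit int, hex_constant() names unsigned int while decimal_constant() names a signed 64-bit type, even though both constants have the same value.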

Next Month

Next month’s column wraps up integer support in C99 by discussing the new headers <stdint.h> and <inttypes.h>. These headers allow you to use both standard integer types and extended integer types in a more portable fashion.

Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at [email protected].
