The State of C

C is a general-purpose programming language that was originally designed and implemented around 1972 by Dennis Ritchie at Bell Labs. Its early growth was closely associated with the Unix system where it was developed, since both the system and most of the programs that run on it are written in C. In recent years, C has become popular in a much wider variety of environments, and it is no longer tied to any one operating system or machine.

The C language was originally designed for "systems programming," that is, for writing programs like compilers, operating systems, and text editors. But it has proven quite satisfactory for other applications as well, including database systems, telephone-switching systems, numerical analysis, engineering programs, and a great deal of text-processing software. Today, C is one of the most widely used languages in the world, and C compilers exist for almost every computer.

Where Did It Come From?

C has its roots in the language BCPL, designed by Martin Richards around 1967. BCPL is a "typeless" language: It operates only on a single data type, the machine word. As such, BCPL was an excellent match to the hardware of word-oriented machines. In 1970, Ken Thompson designed a stripped-down version of BCPL for use with the first Unix system on the PDP-7; this language was called B. It too is typeless.

With the advent of the PDP-11, on which the next version of Unix was written, it became clear that a typeless language did not match this hardware nearly as well. The PDP-11 provided several different sizes of fundamental objects — 1-byte characters, 2-byte integers, and 4-byte floating-point numbers — and B provided no way to even talk about these different-size objects, let alone operators to manipulate them.

The C language was originally an attempt to deal with a variety of types of data by adding the notion of data type to the B language. In C, as in most languages, each object has a type as well as a value; the type determines the meaning of the operations that can be applied to the value, and how much storage is occupied. For example, declarations like int i, j;, double d;, and float x; determine the operations and space requirements of the variables. In the statement d = x + i * j;, the compiler uses the type information to determine that integer multiplication is adequate for i * j, but the result must be converted to floating point before it is added to x and then converted to double precision for assignment to d.

Although C was originally implemented for a PDP-11, it was used on other machines as early as 1975. Steve Johnson implemented a "portable compiler," designed to be relatively easy to modify, to generate code for different machines. Since then, C has been implemented on most computers, from the smallest microcomputers to machines as large as the CRAY-2. The C language is sufficiently well standardized, even without a formal standard, that with some care you can write C programs that will run without change on any machine supporting the language and a minimal run-time environment.

C began on a small machine and was derived from a sequence of small languages; its designer preferred simplicity and elegance to features. Furthermore, C has, from the beginning, been meant for system-programming applications, where efficiency matters. Accordingly, it's not surprising that C is a good match for the capabilities of real machines. For example, it provides as its basic data types only those objects that are directly supported by typical hardware: characters, integers (perhaps of several sizes), and floating-point numbers (again in several sizes).

You can create more complicated objects like arrays, structures, and so forth, but C provides few operators for manipulating them as a unit; you must write the functions that compare strings, assign one array to another, and so on.

Somewhat more unusual, C doesn't provide input and output operations as part of the language. This is not to say that C programs can't do I/O, of course, but simply that I/O is done by functions defined by the user or in a library, and not by built-in statements of the language. This is in contrast to, for example, FORTRAN's READ and WRITE, and the INPUT and PRINT of BASIC, which are parts of those languages.

To complete the list of things that C might provide but doesn't: It has no storage management, like Pascal's new function, and no facilities for concurrent processing, such as Ada's rendezvous mechanism. You can easily write these capabilities in C, but they are provided by function libraries, not as part of the language. Function calls are notationally clumsier than direct operators; for example, compare BASIC's string comparison

IF A$ = B$ THEN ...

to the way you might write it in C:

if (equal(a,b))...

Function calls also involve more overhead than in-line code.

In any case, the degree to which features are omitted from C is one of its distinguishing characteristics.

Linguistic Elements

Control flow: Control flow in C is quite conventional, although richer than in FORTRAN or BASIC. C contains two decision-making statements: if...else and switch. In the statement

if (expr) stat1 else stat2

expr is evaluated; if it's true (nonzero), stat1 is executed; otherwise, stat2 is executed. The entire else part of the statement is optional. In

switch (expr) {
   case const1: stat1
   case const2: stat2
   default: stat
}

expr is evaluated and its value compared against the various consts. If it finds a match, the corresponding stat is executed. If it doesn't, the stat for the default part is executed. The default is optional. The switch statement is like Pascal's case statement, except that Pascal has no default.

C also contains three loops: while, for, and do. In the statement

while (expr) stat

expr is evaluated; if it's true, stat is executed, and expr is evaluated again. When expr becomes false, the loop terminates. The statement:

for (stat1; expr; stat3) stat2

is equivalent to the while loop:

stat1;
while (expr) {
   stat2;
   stat3;
}

The do statement is like Pascal's repeat...until except for the sense of the termination test. In the statement:

do stat while (expr);

stat is executed, and expr is tested. If it's true, the loop repeats.

The statement break causes an immediate exit from an enclosing loop or switch; the statement continue causes the next iteration of a loop to begin. C also provides a goto statement, but it's infrequently used.

In all these examples, a stat can be a single statement like x = 3 or a group of statements enclosed in braces, which are like begin...end in other languages. Statements end in semicolons.

Data types: The basic data types in C are char (a single byte); int, short, and long (integers of various lengths); and float and double (floating-point numbers of two different lengths). The char data and the various integers can be signed or unsigned.

You can combine these objects into an infinite (in principle) set of "derived" data types using arrays, structures, unions, and pointers. Arrays are familiar:

char mesg[100];

defines an array mesg of 100 bytes, accessed as mesg[0] through mesg[99]. C doesn't provide a string data type; it uses arrays of char instead, with the end of the data marked by a 0 byte. This is what the compiler generates for a string constant like "hello world\n". Within a string, certain "escape sequences" like \n are used to represent special characters like newline. This string contains 12 characters and a terminating 0 byte.

A structure is a collection of related variables that need not have the same type (like a record in Pascal). For example,

struct object {
   int x, y;    /* position */
   float v;     /* velocity */
   char id[10]; /* identification */
};
struct object obj;

declares a structure called object and defines a variable obj of type struct object. Individual members of the structure are referred to as obj.v, and so on. Notice that the object structure includes an array id, whose components are id[0] through id[9]. You can have arrays of structures, as well.

C provides pointers, or machine addresses, as an integral part of the language, in a much less restricted form than in Pascal and Ada. The declarations

char *pc;
struct object *pobj;

declare pc to be a pointer to char, and pobj to be a pointer to an object structure. The value that a pointer points to is accessed by *pc or *pobj, as suggested by the form of the declaration; the "dereferencing" operation * is equivalent to the caret (^) in Pascal. Individual members of the structure are accessed by, for example, pobj->v.

If p is a pointer to an object of type T and currently points to an element of an array of Ts, then p+1 is a pointer to the next element in that array. Similarly, if p and q point to elements of the same array, and p is less than q, then q-p is the number of elements from p to q. In short, arithmetic operations on pointers are scaled by the size of the object to which they point; the actual size is usually irrelevant as you program. When it is relevant, a sizeof operator exists to compute it, so the program doesn't specify the explicit size for any particular machine. C's complete integration of pointers and address arithmetic is one of the strengths of the language.

Operators and expressions: C has a rich set of operators compared to most conventional languages. Besides the usual arithmetic operators +, -, *, /, and % (remainder), several other groups deserve special mention.

First, C provides operators for manipulating bits within a word (see Table 1).

&       bitwise AND
|       bitwise OR
^       bitwise exclusive-OR
~       one's complement
<<      left shift
>>      right shift
Table 1: The C operators for manipulating bits within a word; these are necessary for many system-programming applications.

For example, the function in Listing 1 counts the 1-bits in its argument by repeatedly testing the rightmost bit, then shifting the argument one position to the right until it becomes 0. The declaration unsigned means that n will be treated as a logical quantity, not an arithmetic one.

bitcount(n)        /* count 1 bits in n */
   unsigned int n;
{
   int b;

   for (b = 0; n != 0; n >>= 1)
      if (n & 1)
         b++;
   return b;
}

Listing 1: The bitcount function counts the 1-bits in its argument by repeatedly testing the rightmost bit, then shifting the argument one position to the right until it becomes 0.

The function bitcount illustrates a second group of operators. Any operator such as >> that takes two operands has a corresponding "assignment operator," such as >>= , so that the statement

v = v >> expr

can be written more concisely as

v >>= expr

This notation is easier to read, particularly when v is a complicated expression instead of a single-letter variable.

A third group of operators deals with logical conditions. The operators && and || are evaluated left-to-right, and evaluation stops as soon as the value of the expression is known. In a construction like

if (i < N && x[i] > 0)...

if i is greater than or equal to N (which is presumably the size of the array x), then the test involving x[i] will not be made. This behavior of logical operators is called "short-circuit evaluation."

Functions: The overall structure of a C program is a set of declarations of variables and functions. These definitions are often kept in separate files if the program is large; you can compile them separately and link them together with a linking loader.

Within a function, variables are normally "automatic"; that is, they appear when the function is entered and may disappear when it is left, as in bitcount. However, if you declare a variable as static, it retains its value from one call to the next. Variables declared outside of any function are global; they can be referred to anywhere in the program.

Functions are recursive; the standard (and hackneyed) example is the factorial function (see Listing 2).

fact(n)       /* returns n! (n >= 0) */
   int n;
{
   if (n <= 0)
      return 1;
   return n * fact(n-1);
}

Listing 2: The classic example of a recursive function, the factorial function, written in C.

The arguments to a function are passed by value, which means that the function receives a copy of the argument, not the original object. (Notice that the function bitcount modifies its argument; this is safe because it's actually a copy.) You can always obtain the effect of call by reference when necessary by passing a pointer to the object. Function arguments and return values can be any of the basic types, as well as pointers, structures, or unions. To pass an array, you pass a pointer to its first element.
