The New C Standard Explored

C11 specifies many security features that require minimal changes to existing code. They greatly reduce unexpected behavior and prevent many kinds of common attacks.

By Tom Plum
May 08, 2012
URL : http://www.drdobbs.com/cpp/the-new-c-standard-explored/232901670

C and C++ are members of the same family of languages. The evolutionary boldness of C++ removes some of the marketplace pressure on C; people who are continually pushing for innovation are naturally drawn to the C++ development process. Each language had a coherent original design (by Dennis Ritchie and Bjarne Stroustrup, respectively), followed by successive refinement in a very competitive marketplace of ideas. Both languages share an extreme concern for performance, with the slogan "don't leave space for a more-efficient systems-programming language underneath our language (C or C++)." However, it's unfair to complain that the original designs assigned too little importance to cybersecurity; both languages pre-date the beginnings of concern for security. But in recent years, as the marketplace has started to emphasize cybersecurity, C and C++ have been responding in several ways.

In early 2002, Bill Gates' famous "battleship-turning" memo made cybersecurity a top goal for Microsoft. About a year later, Microsoft proposed a new "bounds-checking" library to WG14, which eventually became Technical Report 24731-1. It now is part of C11 as the (optional) Annex K. (An almost-final draft of C11 is available here [PDF].)

The C11 Annex K Functions

I'll start my tour of Annex K with the fopen_s function. The main innovation is that files opened for writing are opened with exclusive (also known as non-shared) access. Furthermore, if the file is being created and the mode string doesn't begin with u (as with code being updated from the older fopen), then, to the extent that the underlying system supports it, the file gets a file permission that prevents other users on the system from accessing the file.

In this article, I'll sequentially enumerate the security benefits of these _s functions. The new semantics illustrate the pattern of "least privilege." This "exclusive" mode was previously available in the Posix open() function, but the ISO standard for C doesn't standardize system-dependent, low-level I/O. See Robert Seacord's book Secure Coding in C and C++ for detailed discussion of these various security benefits of the Annex K library.

fopen() safety

If a file is opened with x as the last character in the mode argument, and a file with the requested name already exists, the fopen_s function fails (as opposed to truncating the existing file, which is presumably already being used by someone). If the application program had been required first to check whether the file existed and then to create the new file, this would illustrate the "time-of-check versus time-of-use" (TOCTOU) vulnerability. Annex K aims to minimize TOCTOU windows.
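
To make the exclusive-create behavior concrete, here is a minimal sketch. It assumes a C11 library whose plain fopen accepts the x flag, and it deliberately uses fopen rather than fopen_s so that it also builds where Annex K is missing. The helper name create_exclusive is hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: create `path` only if it does not already exist.
 * Returns 0 on success and a nonzero value on failure. */
static int create_exclusive(const char *path)
{
    errno = 0;
    FILE *f = fopen(path, "wx");   /* "x": fail instead of truncating */
    if (f == NULL)
        return errno != 0 ? errno : -1;
    fputs("created exclusively\n", f);
    return fclose(f) == 0 ? 0 : -1;
}
```

Because the existence check and the creation happen inside one library call, the program has no separate "check, then create" step for an attacker to race against.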

The mode argument is passed to fopen_s as a const char* pointer, as is the filename argument. Requiring these pointers to be non-null is one of the "runtime-constraints" of the fopen_s function, to use the C11 terminology.

If any of the runtime-constraints is violated, the library function (here, fopen_s) invokes the runtime-constraint handler. (In Visual Studio, this handler is known as the invalid parameter handler: same concept, different name.)

This approach is a new pattern of response to security issues: Invoke the runtime-constraint handler if any runtime-constraint is violated. Previously, such a violation would have resulted in undefined behavior if not caught.

If the runtime-constraints are not violated, then fopen_s returns the resulting FILE* pointer through an argument, rather than producing it as the returned value of the function. If fopen_s fails for any of several reasons, it returns a nonzero value according to the conventions encoded in <errno.h>. The various Annex K headers provide the typedef name errno_t for this returned int value. This reduces the inconsistency of return-value idioms to the greatest extent possible, by uniformly returning errno_t for erroneous conditions that did not violate a runtime-constraint.

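This initial discussion about fopen_s has introduced the first four patterns of the Annex K library: apply least privilege, minimize TOCTOU windows, invoke the runtime-constraint handler when a runtime-constraint is violated, and uniformly return errno_t for other errors.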

In the original C standard, and still in C++ today, most library functions specify something like "if copying takes place between objects that overlap, the behavior is undefined." In C99 and C11, there is a syntactic way to specify this restriction, the restrict keyword. As a result of all these various design decisions, the calling sequence for secure fopen_s calls looks like this:

errno_t fopen_s(FILE * restrict * restrict streamptr,
                const char * restrict filename,
                const char * restrict mode);
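
A call might look like the following sketch. It assumes that the implementation advertises Annex K through the __STDC_LIB_EXT1__ macro; where it does not, the hypothetical wrapper open_checked imitates the same calling convention on top of fopen, which is my stand-in and not part of the standard.

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* request Annex K declarations, if any */
#include <errno.h>
#include <stdio.h>

/* Hypothetical wrapper: stores the opened stream in *out and returns zero
 * on success or a nonzero error code on failure. */
static int open_checked(FILE **out, const char *path, const char *mode)
{
#if defined(__STDC_LIB_EXT1__)
    /* Annex K path: the FILE* comes back through the first argument. */
    return (int)fopen_s(out, path, mode);
#else
    /* Fallback for libraries without Annex K (an assumption of this sketch). */
    if (out == NULL)
        return -1;
    *out = NULL;
    if (path == NULL || mode == NULL)
        return -1;
    errno = 0;
    *out = fopen(path, mode);
    if (*out == NULL)
        return errno != 0 ? errno : -1;
    return 0;
#endif
}
```

The caller tests the returned code first and touches the stream only when that code is zero, which is the uniform idiom the errno_t return type is meant to encourage.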

Designing the runtime-constraint handler provides the implementation and the project team a range of choices. The simplest handler simply invokes abort(). A somewhat more complex architecture gives the user a choice between aborting or debugging, potentially preserving the full state of the stack frames and global variables. Other handlers could be used: In an application that never terminates, the handler could reinitialize, flush the current transaction, start a new transaction, and so forth. In a specialized testing situation, the handler could log the failures.
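
As an illustration of the logging option, here is a sketch of a handler with the shape that C11 gives constraint_handler_t. The counter, the handler name, and the install function are hypothetical, and the installation step is compiled only where the library defines __STDC_LIB_EXT1__.

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
#include <stdlib.h>

static int violation_count = 0;   /* hypothetical counter for this sketch */

/* Same shape as constraint_handler_t; `int` stands in for errno_t so the
 * function also compiles where Annex K is missing. */
static void logging_handler(const char *restrict msg, void *restrict ptr,
                            int error)
{
    (void)ptr;                    /* implementation-defined detail, unused */
    violation_count++;
    fprintf(stderr, "runtime-constraint violation (%d): %s\n",
            error, msg != NULL ? msg : "(no message)");
}

/* Returns 1 if the handler was installed, 0 if Annex K is unavailable. */
static int install_logging_handler(void)
{
#if defined(__STDC_LIB_EXT1__)
    set_constraint_handler_s(logging_handler);
    return 1;
#else
    return 0;
#endif
}
```

A handler like this returns to the library function, which then reports failure through its errno_t result; a handler that calls abort() never returns at all.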

The freopen_s function illustrates the same patterns as fopen_s, including the x and u mode flags.

Fixing tmpnam()

Continuing with the file-oriented functions, consider tmpnam_s:

errno_t tmpnam_s(char *s, rsize_t maxsize);

This function illustrates another security pattern in C11: "In the calling sequence of the function, every pointer through which the function might modify an array is immediately followed by the number of elements which the function is permitted to modify."

In the case of tmpnam_s, the second argument specifies a maximum for the number of characters that can be modified by tmpnam_s. The type of the second argument is rsize_t, designating a "restricted size_t" value. The intent is to prevent the common error of inadvertently passing a negative value, which, after conversion to an unsigned type, becomes a huge number and, in this case, defeats the purpose of bounds-checking the string written into s. This common error is intended to be caught within tmpnam_s by comparing maxsize against RSIZE_MAX and invoking the runtime-constraint handler if it's larger. (I've said "intended" several times, because Annex K makes it optional whether RSIZE_MAX is any smaller than SIZE_MAX.) This manner of designating bounding values with the type rsize_t is another security best practice promulgated in the Annex K library.
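
The arithmetic behind that check can be sketched in a few lines. DEMO_RSIZE_MAX is a hypothetical stand-in for RSIZE_MAX, set to SIZE_MAX >> 1, the value that Annex K recommends for machines with large address spaces; a real implementation may choose differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RSIZE_MAX (an assumption of this sketch). */
#define DEMO_RSIZE_MAX (SIZE_MAX >> 1)

/* Returns 1 if n is a plausible object size, 0 if it is suspiciously large. */
static int size_is_plausible(size_t n)
{
    return n <= DEMO_RSIZE_MAX;
}
```

A length of -1 held in an int converts to SIZE_MAX when passed as a size_t, so size_is_plausible rejects it, while an ordinary buffer size such as 64 passes.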

Next, consider the tmpfile_s function:

errno_t tmpfile_s(FILE * restrict * restrict streamptr);

It could be invoked like this:

FILE *myTempFile = 0;
errno_t err = tmpfile_s(&myTempFile);

There is a window of TOCTOU vulnerability between obtaining a filename from tmpnam_s and subsequently creating that file with fopen_s. Using tmpfile_s eliminates that particular vulnerability.

The %n Formatting Vulnerability

Another security pattern is "eliminate the %n format." For the details, refer to Seacord and the original Rationale for the library that became Annex K. The basic problem with %n is that the printf family of functions is intuitively thought of as a set of "output" functions, but the %n format can be used to modify memory, and therefore provides an attack surface.
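
For readers who have never used %n, this small sketch shows the store that it performs. It uses the standard snprintf with a literal format string; the function name is hypothetical, and some libraries disable %n altogether, in which case the result will differ.

```c
#include <stdio.h>

/* Returns the value that snprintf stored through the %n argument. */
static int count_via_percent_n(void)
{
    char buf[32];
    int written = -1;
    /* %n produces no output; it stores the number of characters written so
     * far into the int that the matching argument points to. */
    snprintf(buf, sizeof buf, "hello%n, world", &written);
    return written;
}
```

Here the store is harmless because the program controls both the format string and the pointer. The danger arises when an attacker can supply the format string.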

These, then, are the _s versions of the formatted output functions: fprintf_s, printf_s, snprintf_s, sprintf_s, vfprintf_s, vprintf_s, vsnprintf_s, vsprintf_s, fwprintf_s, snwprintf_s, swprintf_s, vfwprintf_s, vsnwprintf_s, vswprintf_s, vwprintf_s, wprintf_s.

The corresponding input functions implement the patterns I discussed earlier, notably the handling of null arguments, buffer-size overflow, RSIZE_MAX, and overlapping stores. They are the _s versions of the formatted input functions; that is: fscanf_s, scanf_s, sscanf_s, vfscanf_s, vscanf_s, vsscanf_s, fwscanf_s, swscanf_s, vfwscanf_s, vswscanf_s, vwscanf_s, wscanf_s.

Miscellaneous Security Improvements

The Annex contains several other security measures, which I'll summarize quickly.

Bounding Date Values
The following change applies to the time-and-date functions that produce a "year" value: the year must lie in the interval [0, 9999]. Added to the other patterns, this produces the time-and-date functions asctime_s, ctime_s, gmtime_s, and localtime_s.
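
The bound itself is easy to express in portable code. This sketch shows the check only, not the Annex K functions, and the helper name is hypothetical.

```c
#include <time.h>

/* Returns 1 if the calendar year in *t lies in [0, 9999], else 0. */
static int year_in_range(const struct tm *t)
{
    long year = 1900L + t->tm_year;   /* tm_year counts years since 1900 */
    return year >= 0 && year <= 9999;
}
```

Code that must still call the older asctime could apply a test like this first, since a five-digit year does not fit the fixed-width text that asctime produces.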

memset()
The standard guarantees that memset_s will over-write the argument array, even if the optimizer thinks that those stores are "useless," such as when over-writing a password before leaving a function.
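
A password-wiping helper might be sketched as follows. Where the library defines __STDC_LIB_EXT1__, it calls memset_s; otherwise it falls back to stores through a volatile-qualified pointer, a common idiom that I am assuming here and that carries no guarantee from the standard. The name wipe is hypothetical.

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite n bytes at p with zeros. */
static void wipe(void *p, size_t n)
{
#if defined(__STDC_LIB_EXT1__)
    (void)memset_s(p, n, 0, n);       /* stores may not be optimized away */
#else
    /* Fallback idiom, an assumption of this sketch. */
    volatile unsigned char *bytes = p;
    while (n--)
        *bytes++ = 0;
#endif
}
```

The typical call site is the last statement before a function returns, for example wipe(password, sizeof password), which is exactly where an optimizer would otherwise be entitled to drop a plain memset.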

Reentrancy variables
The Annex also provides an extra argument to carry state or context information, avoiding the static buffers that would prevent re-entrancy or use in a multi-threaded environment: bsearch_s, qsort_s, strtok_s, wcstok_s.
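
To show the explicit-state pattern without depending on Annex K, here is a sketch of a tokenizer that keeps its position in a caller-supplied pointer. The function next_token is hypothetical; it is not strtok_s, whose calling sequence has additional arguments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the next token found at *state, or NULL when
 * none remain. All position information lives in *state, not in a static. */
static char *next_token(char **state, const char *delims)
{
    char *s = *state;
    if (s == NULL)
        return NULL;
    s += strspn(s, delims);            /* skip leading delimiters */
    if (*s == '\0') {
        *state = NULL;
        return NULL;
    }
    char *end = s + strcspn(s, delims);
    if (*end != '\0') {
        *end = '\0';                   /* terminate this token in place */
        *state = end + 1;
    } else {
        *state = NULL;                 /* that was the last token */
    }
    return s;
}
```

Because each caller owns its state pointer, two strings can be tokenized in alternation, or on two threads, without interfering with each other.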

Chopping and Zero-Fill
Another specified pattern is to chop (or zero-fill) the resulting string if a runtime-constraint error happens: gets_s, getenv_s, wctomb_s, mbstowcs_s, wcstombs_s, memcpy_s, memmove_s, strcpy_s, strncpy_s, strcat_s, strncat_s, strerror_s, strnlen_s, wcscpy_s, wcsncpy_s, wmemcpy_s, wmemmove_s, wcscat_s, wcsncat_s, wcsnlen_s, wcrtomb_s, mbsrtowcs_s, wcsrtombs_s.
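
The chopping pattern can be sketched in portable code. The helper copy_or_empty is hypothetical and only imitates the visible result; unlike strcpy_s, it does not invoke a runtime-constraint handler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dest (capacity destsz). On failure the
 * destination is left as an empty string rather than a partial copy.
 * Returns 0 on success, nonzero on failure. */
static int copy_or_empty(char *dest, size_t destsz, const char *src)
{
    if (dest == NULL || destsz == 0)
        return -1;
    if (src == NULL || strlen(src) >= destsz) {
        dest[0] = '\0';                /* chop: never leave stale contents */
        return -1;
    }
    memcpy(dest, src, strlen(src) + 1);
    return 0;
}
```

A caller that forgets to test the return value then sees an empty string instead of an unterminated or truncated one, which tends to fail more visibly and more safely.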

Bounds on buffer allocation
The document provides the bounds that will be needed to allocate buffers: The strerrorlen_s function tells how many characters will be needed to store the locale-specific error message for one specific errno value.
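
Here is a sketch of how strerrorlen_s might be paired with strerror_s to size a buffer exactly. The helper name is hypothetical, and where the library does not define __STDC_LIB_EXT1__ the sketch falls back to copying the result of strerror, which is my substitution.

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the message for errnum,
 * or NULL on failure. The caller frees the result. */
static char *error_message_dup(int errnum)
{
#if defined(__STDC_LIB_EXT1__)
    size_t len = strerrorlen_s(errnum);      /* length without the null */
    char *buf = malloc(len + 1);
    if (buf != NULL && strerror_s(buf, len + 1, errnum) != 0) {
        free(buf);
        buf = NULL;
    }
    return buf;
#else
    /* Fallback (an assumption of this sketch): copy the message at once,
     * because strerror may reuse its buffer on a later call. */
    const char *msg = strerror(errnum);
    size_t len = strlen(msg);
    char *buf = malloc(len + 1);
    if (buf != NULL)
        memcpy(buf, msg, len + 1);
    return buf;
#endif
}
```

Asking for the length first avoids both a guessed fixed-size buffer and a truncated message.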

Finally, I should point out a pattern illustrated in all the functions in Annex K: They permit a localized remediation of existing code, without global design changes. Each of the various _s functions can replace its previous version by changing only one or two lines of the existing code.

The Annex K functions are available in Visual Studio and in a few other implementations. Still, there's no reason why they shouldn't already be available on all platforms. Perhaps there is some degree of "not invented here" resistance; I hope this article and others will help create greater marketplace demand. Talk to your compiler/library providers.

Annex L and Undefined Behavior

There has been a tendency to approach the requirements for safety-critical, zero defects, and security with the same developmental methods, producing high-integrity applications at a correspondingly high cost. However, cybercriminal exploits tend to focus on the most popular apps, which are often produced under less-than-ideal schedule and budget constraints. The languages chosen for popular apps are frequently C and C++.

Within WG14 there have been several initiatives to improve software security without sacrificing the efficiency advantages of C or the developmental methodologies that organizations are already familiar with.

Within the world of safety-critical development methods, it is common to target the elimination of all undefined behaviors (UBs) in C, on the grounds that compilers are free to do anything whatever when an app produces UB. However, compiler developers are very influential within WG14, and they know that in almost all cases of UB, the hardware actually produces a benign result, and that often when some corner case is identified as UB, the standard is marking it as "non-portable," and not as "dangerous."

The Analyzability Annex of C11 (Annex L) identifies a small number of UBs as "critical UB," classifying all the others as "bounded UB," and imposes some implementation constraints on the resulting behavior. The net result is that when an implementation provides this analyzable behavior, and the app is subjected to static analysis, the actual app when executed does implement the source code of the program as analyzed.

With the benefit of hindsight, I have found one improvement that I will suggest for the Analyzability Annex: an implementation should be permitted by Annex L to generate code that violates the constraints in the Annex, provided that it produces a warning message when it does so. After all, "analyzability" relies on a project methodology that focuses attention upon the warnings generated by the static analyzer and the compiler, so guaranteeing the production of a warning is all that should be required by Annex L.

The New C Secure Coding Rules Project

ISO and IEC currently define a Technical Specification (TS) to have less than the official status of an International Standard (IS). WG14 has used this less-formal approach in several topic areas (including the Bounds-Checking Library TR 24731-1 mentioned above) for specifications that may benefit from experience in the marketplace before being standardized.

Further work is under way within WG14 on a Technical Specification (TS 17961) for "C Secure Coding Rules" (CSCR). Most of the TSs (and ISs) produced by the programming language committees target the compiler-and-library marketplaces, but the CSCR TS primarily targets the static-analyzer marketplace.

Since last month's article, C Finally Gets a New Standard, I've now learned that C++11 is available from the ANSI store: go to http://webstore.ansi.org, search for "14882," and select "INCITS/ISO/IEC 14882-2012." It costs US $30 for the PDF. The whole process takes only a few minutes. The C11 standard should be available soon; fill out the form at www.plumhall.com/stds-notify-form.html to be notified when ANSI C11 can be ordered.


Dr. Thomas Plum is Vice President of Technology and Engineering at Plum Hall, Inc., and is a member of the C and C++ committees that developed C11 and C++11. He gratefully acknowledges helpful suggestions from Robert Seacord of CERT and David Keaton, chairman of the US committee for C language.

Copyright © 2012 UBM Techweb