Herbert is Director of Security Technology at Security Innovation and James is a professor of computer science at the Florida Institute of Technology. They are also the coauthors of How To Break Software Security (Addison-Wesley, 2003). They can be contacted at hthompson@sisecure.com and jw@se.fit.edu, respectively.
In March 2004, a bug was reported in Epic Games's Unreal game engine, the engine that drives such popular games as Unreal Tournament and Splinter Cell. It turns out that users could crash the server by inserting the %n format specifier into some of the incoming packets. A couple of months earlier, a similar problem was found in the Windows FTP Server produced by FTP Server Software: with a particular string containing some strategically placed %n and %s specifiers, you could execute arbitrary instructions on a remote host. Roll back the clock to 2000 and you find an issue reported in Wu-Ftpd that let casual users read sensitive information from a running application by entering %x into an input string. These three applications have one thing in common: they all had format-string vulnerabilities.
Format-string vulnerabilities happen when programmers fail to specify how user data will be formatted. Any C programmer who has typed a few semicolons is familiar with the types of functions that let this kind of thing happen. The culprits are usually members of the format-string family in C and C++, which includes the printf, sprintf, snprintf, and fprintf functions.
When most of us learned C, the first thing we did was to build a "Hello World" program that called printf to display a fixed string.
We then graduated to more ambitious programs, passing a name in as a command-line argument and then printing it:
#include <stdio.h>
int main(int argc, char *argv[]) { printf("%s", argv[1]); return 0; }
In this example, the string in quotes is a format string, and the format specifier %s tells the function to read the next argument (in this case argv[1], the first command-line argument) and print it as a string.
The danger with format functions is that input is often printed without a fixed format string. For instance, in the code above, you could omit the "%s" format string and pass argv[1] directly, which changes the printf call to:
#include <stdio.h>
int main(int argc, char *argv[]) { printf(argv[1]); return 0; }
Now printf blindly processes data supplied by users, so the application is open to attack through the parameters they enter. Consider the input string a_string. Compile the two programs, Printf_1.exe (with the %s specifier) and Printf_2.exe (without it), and run each with that string: both print a_string.
Both applications produce the same result. If you enter the string a_string%s, however, the outputs differ. Printf_1 prints a_string%s exactly as entered, while Printf_2 prints a_string followed by whatever data it finds at the address it pulls off the stack.
The difference is that in Printf_1, you explicitly told the application to treat a_string%s as a string and, thus, it was printed as entered. In the second case, the application used the input a_string%s as the format string, so it was interpreted as the text a_string followed by the format specifier %s. When printf is called, the arguments to be formatted (for %s, a pointer to a string) are placed on the stack. When Printf_2 was executed, no such pointer had been passed, so the %s specifier took whatever value happened to be at the top of the stack, treated it as an address, and printed (as text) the bytes stored there. Additionally, it is fairly easy to crash this application and cause a denial of service by using multiple %s specifiers, which eventually treat a stack value as an address that points into protected or invalid memory.
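The root of the problem is that printf has no way of knowing how many arguments a caller actually passed; it simply fetches one argument for each conversion specifier it finds in the format string (plus one for each * width or precision). The helper below is a hypothetical, simplified sketch, not part of any library, that counts how many arguments a format string will try to fetch. It does not treat positional specifiers such as %1$s specially.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the variadic arguments a printf-style format
 * string will try to fetch. Simplified sketch: handles flags, '*' widths and
 * precisions, digits, length modifiers, and "%%". */
size_t count_format_args(const char *fmt)
{
    size_t count = 0;
    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;                 /* ordinary character, printed as-is */
        if (*fmt == '%') {            /* "%%" prints a literal '%', no argument */
            fmt++;
            continue;
        }
        while (*fmt != '\0' && strchr("-+ #0", *fmt) != NULL)
            fmt++;                    /* flags */
        if (*fmt == '*') {            /* '*' width fetches an int argument */
            count++;
            fmt++;
        }
        while (isdigit((unsigned char)*fmt))
            fmt++;                    /* numeric width */
        if (*fmt == '.') {            /* precision, possibly '*' */
            fmt++;
            if (*fmt == '*') {
                count++;
                fmt++;
            }
            while (isdigit((unsigned char)*fmt))
                fmt++;
        }
        while (*fmt != '\0' && strchr("hljztL", *fmt) != NULL)
            fmt++;                    /* length modifiers such as l, ll, z */
        if (*fmt != '\0') {           /* the conversion character itself */
            count++;
            fmt++;
        }
    }
    return count;
}
```

For Printf_2 run with a_string%s, the count is 1 while the call passes no arguments at all; an input of %x%x%x%x asks printf for four values that were never supplied.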
Besides %s, other format specifiers let attackers launch much more insidious attacks. A favorite of attackers is %x, which prints its argument as a hexadecimal number; when no argument was passed, it prints the value at the top of the stack instead. Using multiple %x specifiers, you can walk through large portions of the stack. This is a relatively simple attack to carry out, and the result can be the exposure of sensitive data in memory, including passwords, encryption keys, and other secrets. Figure 1 illustrates how the attack works: users are prompted for some input, and the application later prints that input in another command. Users can read data from the stack by entering multiple %x specifiers.
Aside from %x and %s, %n is one of the most interesting specifiers because it actually writes to memory. Many format-string attacks use the %x and %n specifiers in combination. If you use %n without passing a variable, the application attempts to write a value (the number of bytes formatted so far by the format function) to the memory address stored at the top of the stack. It is this ability that may ultimately let attackers execute arbitrary code by taking control of the application's execution path.
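For comparison, here is %n used as intended: the caller passes a matching int pointer, and the format function stores the number of characters formatted so far through it. In the attack described above, no pointer is passed, so the write goes through whatever value sits in that argument slot. The function name is hypothetical, and the sketch assumes a C library that honors %n, such as glibc; Microsoft's C runtime disables %n by default.

```c
#include <stddef.h>
#include <stdio.h>

/* Legitimate use of %n: &prefix_len is the argument that %n consumes, so the
 * count of characters formatted before %n ("label: ") is stored there. */
int chars_before_value(char *out, size_t size, const char *label, int value)
{
    int prefix_len = -1;
    snprintf(out, size, "%s: %n%d", label, &prefix_len, value);
    return prefix_len;
}
```

Called with "count" and 42, the buffer holds "count: 42" and the function returns 7, the length of "count: ".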
There are many other format specifiers you can use. Table 1 presents a list of some of the more commonly used ones.
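As a small illustration of a few commonly used specifiers, each paired with the argument it expects, consider this sketch (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdio.h>

/* Formats one argument per specifier: %d signed int, %u unsigned int,
 * %x unsigned int in hex, %c character, %s string; "%%" takes no argument.
 * Writes "-42|42|ff|A|text|%" into out and returns its length. */
int show_common_specifiers(char *out, size_t size)
{
    return snprintf(out, size, "%d|%u|%x|%c|%s|%%",
                    -42, 42u, 255u, 'A', "text");
}
```

Every specifier except %% consumes exactly one argument of the matching type, which is why a format string supplied by the user can make a function read (or, with %n, write) arguments that do not exist.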
The Format Functions
The printf function is a member of a wider class of functions that use format strings for output. Functions like sprintf and fprintf are also vulnerable to format-string attacks. Table 2 lists some other common C functions that use format strings and are vulnerable to this type of attack.
In addition to functions that directly format data, a few others, such as syslog, also process user data through a format string and have been exploited through format specifiers.
Of the functions in Table 2, sprintf is particularly interesting from a security standpoint because it "prints" formatted data to a buffer without knowing how large that buffer is. Aside from the possibility of a format-string vulnerability, this function can lead to buffer-overflow vulnerabilities and should usually be replaced with its length-checking cousin, snprintf.
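The difference between sprintf and snprintf is easiest to see with a buffer that is too small. This sketch, built around a hypothetical format_user_label function and assuming a C99-conforming snprintf, keeps the format string fixed and uses snprintf's return value to detect truncation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* snprintf never writes more than size bytes (terminator included) and
 * returns the length the full output would have had, so a result >= size
 * means the text was cut off. Returns true only if everything fit. */
bool format_user_label(char *out, size_t size, const char *user_text)
{
    int needed = snprintf(out, size, "label: %s", user_text);
    return needed >= 0 && (size_t)needed < size;
}
```

With an 8-byte buffer and the input abcdefghij, snprintf stores "label: " plus the terminator and reports the truncation; sprintf would have written the full 17-character string and its terminator past the end of the buffer.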
While people have been publicly exploiting buffer overruns since the late 1980s, format-string attacks have only been well understood since 2000. That year, the Common Vulnerabilities and Exposures database (CVE; http://cve.mitre.org/) listed over 20 major applications and platforms that had been exploited through these attacks.
Let's look at a specific instance of this vulnerability in a commercial application. The Windows FTP Server available from FTP Server Software (http://srv.nease.net/) is open to format-string attacks through the username parameter (see http://www.securityfocus.com/archive/1/349255/). For example, if you enter %s as the "User," the server crashes: it interprets the value at the top of the stack as a memory address and attempts to read from this bogus address (Figure 2).
The good thing about format-string vulnerabilities is that they are relatively easy to find in a source-code audit. Any variable that contains data that is either directly or indirectly influenced by the user should be processed using a format string that dictates how that data will be interpreted. A careful analysis of code can usually find such vulnerabilities. It is important, though, to be familiar with functions that use formatted output. Table 2 is a good starting point, but there are also OS-specific functions such as syslog() that must be scrutinized.
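When auditing, remember that your own logging wrappers can be format functions too. The sketch below shows a hypothetical wrapper that forwards to vsnprintf. With GCC or Clang, the format attribute lets -Wformat check its callers the way it checks printf, and -Wformat-security additionally warns when a non-literal format string is passed with no arguments. The macro name is illustrative; other compilers simply see an empty macro.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Tell GCC/Clang that parameter 3 is a printf-style format and that the
 * variadic arguments start at parameter 4; expands to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Hypothetical logging helper. Because it forwards fmt to vsnprintf, it is
 * itself a format function: fmt must be a trusted literal, never user data. */
int log_to_buffer(char *out, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int log_to_buffer(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out, size, fmt, ap);
    va_end(ap);
    return n;
}
```

A call such as log_to_buffer(buf, sizeof buf, "user=%s", user_data) is safe; log_to_buffer(buf, sizeof buf, user_data) reintroduces the vulnerability, and the attribute gives the compiler a chance to flag it.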
There are also some automated source-scanning tools for C that can make the process of searching through your source code easier. RATS, the Rough Auditing Tool for Security, is a free source-code scanner produced by Secure Software (http://www.securesw.com/) that is capable of scanning C, C++, Perl, PHP, and Python source code. The ITS4 security scanner by Cigital (http://www.cigital.com/) is also free and can be used to scan C and C++ for related issues. Flawfinder (http://www.dwheeler.com/flawfinder/) is a GPL-licensed vulnerability finder that scans C and C++ code for a variety of security problems, including format strings. If your focus is just format strings, pscan (http://www.striker.ottawa.on.ca/~aland/pscan/) is an open-source tool that looks exclusively for format-string vulnerabilities in C code.
From a black-box-testing perspective, these vulnerabilities can be unearthed by including specifiers such as %x, %s, and %n in input fields. When you enter a string of %x specifiers, the likely symptom of failure is "garbage" data (typically hex digits) returned to users in messages that quote the input string. A more drastic approach is to place several %s specifiers into the input field. If a format-string vulnerability exists, this causes the application to treat successive values at the top of the stack as addresses and read from them. Since some of the data on the stack may be the contents of other variables (like a string), converting this data to a memory address and then reading from that address is likely to result in an "Access Violation" error, which will crash the application.
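Part of this probing can be automated. The sketch below is a hypothetical heuristic, not a standard tool: it assumes the field under test is echoed back in some response message, and it treats a response that no longer contains the probe verbatim as a sign that the specifiers were interpreted. A crash, or an application that never echoes the field, still has to be judged separately.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe inputs for one input field. */
static const char *const format_probes[] = {
    "%x%x%x%x%x%x%x%x",   /* a vulnerable echo tends to show stack words as hex */
    "%s%s%s%s%s%s%s%s",   /* a vulnerable program tends to crash on this one */
};

/* Returns the i-th probe, or NULL once the list is exhausted. */
const char *format_probe(size_t i)
{
    return i < sizeof format_probes / sizeof format_probes[0] ? format_probes[i] : NULL;
}

/* True if the probe does not appear verbatim in the response, meaning the
 * specifiers were probably interpreted rather than printed as text. */
bool probe_was_interpreted(const char *probe, const char *response)
{
    return strstr(response, probe) == NULL;
}
```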
Once they have been located, fixing format-string vulnerabilities is relatively easy: Use a fixed format string! For example, vulnerable calls are likely to look something like this:
fprintf(stdout, user_data);
snprintf(dest_buffer, size, user_data);
If user data is meant to be displayed, processed, or saved as a plain string, the calls above can be fixed by supplying "%s" as the format string:
fprintf(stdout, "%s", user_data);
snprintf(dest_buffer, size, "%s", user_data);
With these modifications, we have told the format functions to specifically treat user input as a string. By explicitly specifying the format of user data, we can protect against application manipulation through format characters.
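The same fix can be wrapped in a helper so the pattern is easy to reuse. In this sketch (save_user_data is a hypothetical name), user_data only ever appears as an argument, never as the format.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes user_data to out and copies it into dest_buffer. The format is the
 * fixed literal "%s" in both calls, so any %x, %s, or %n inside user_data is
 * copied as ordinary text. Returns snprintf's result, or -1 on write error. */
int save_user_data(FILE *out, char *dest_buffer, size_t size, const char *user_data)
{
    if (fprintf(out, "%s", user_data) < 0)
        return -1;
    return snprintf(dest_buffer, size, "%s", user_data);
}
```

When no formatting is needed at all, fputs(user_data, out) is an equally safe choice, because it does no format processing.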
Beyond Format Strings
In the previous installment of this series (see "String-Based Attacks Demystified," DDJ, June 2004), we took a look at some of the ways attackers manipulate input strings to take control of software. Format-string vulnerabilities represent an important category of string vulnerabilities. The tie that binds all these problems together is an implicit trust of user data and, thus, a failure to validate such data. The solution: validate user input, of course! Whenever your application reads user data, think about the following: How will this data be used? What are the escape characters, strings, commands, and reserved words that may be interpreted as more than just text? We may not be able to build an impenetrable fortress, but we can at least lock the castle gate.
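One way to act on these questions is an allowlist check at the point where input enters the program. The sketch below applies a hypothetical policy for a username field (letters, digits, '_', '.', and '-', with a bounded length). It complements, rather than replaces, the fixed format strings described earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist for a username field. Any character outside the
 * set, including '%', causes the whole value to be rejected. */
bool is_valid_username(const char *name, size_t max_len)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_.-";
    size_t len = strlen(name);
    if (len == 0 || len > max_len)
        return false;
    return strspn(name, allowed) == len;   /* every character is allowed */
}
```

Rejecting input that falls outside an explicit allowlist is usually easier to get right than trying to enumerate every dangerous character.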