Michael Sutton is the Security Evangelist for SPI Dynamics. Adam Greene is an engineer for a large financial news company based in New York City. Pedram Amini currently leads the security research and product security assessment team at TippingPoint. This article was excerpted from their book Fuzzing: Brute Force Vulnerability Discovery. ISBN: 0321446119. Copyright (c) 2007 Addison-Wesley Professional. All rights reserved.
You teach a child to read, and he or her will be able to pass a literacy test.
George W. Bush, Townsend, TN, February 21, 2001
Fuzzing has evolved into one of today's most effective approaches to test software security. To "fuzz," you attach a program's inputs to a source of random data, then systematically identify the failures that arise.
An obvious requirement for a fuzzing tool is the capability to reproduce the results from both individual tests and test sequences. This is crucial for communicating test results to other persons or groups. As a fuzz tester, you should be able to provide your fuzzing tool with a list of malicious test case numbers knowing that the observed target's behavior will be exactly the same between test runs. Consider the following fictitious situation:
You are fuzzing a Web server's capability to handle malformed POST data and discover a potentially exploitable memory corruption condition when the 50th test case you sent that crashes the service. You restart the Web daemon and retransmit your last malicious payload, but nothing happens.Was the issue a fluke? Of course not: Computers are deterministic and have no notion of randomness. The issue must rely on some combination of inputs. Perhaps an earlier packet put the Web server in a state that later allowed the 50th test to trigger the memory corruption.We can't tell without further analysis and we can't narrow the possibilities down without the capability of replaying the entire test set in a methodical fashion.
Documentation of the various testing results is also a useful, if not mandatory, requirement during the information sharing phase. Given the rising trend of internationally outsourced development1 it is frequently not possible for the security tester to walk down the hall and sit with the affected product developer. Outsourcing has become so popular even computer science students have been known to take advantage of it.2 Various barriers of communication including time zone, language, and communication medium make it ever more important to bundle as much information as possible in a clear and concise form. The burden of organized documentation should not be an entirely manual effort. A good fuzzing tool will produce and store easily parsed and referenced log information.
Think about how the individual fuzzers we discuss handle reproducibility, logging, and automated documentation. Think about how you could improve on the implementation.
On a large scale, if we are building a file format fuzzing tool, we don't want to have to rewrite the entire tool everytime we want to test a new file format. We can create some reusable features that will save us time in the future if we decide to test a different format. Sticking with our example, let's say we were motivated to construct a JPEG file format fuzzing tool to test for bugs in Microsoft Paint. Thinking ahead and knowing that we will want to reuse portions of our labor, we may decide to separate the tool set into three components as in Figure 1.
A JPEG file generator is responsible for generating an endless series of mutated JPEG files. A launching front end is responsible for looping over the generated images, each time spawning Microsoft Paint with the appropriate arguments to load the next image. Finally, an error detection engine is responsible for monitoring each instance of Microsoft Paint for exceptional conditions. The separation into three components allows us to adapt our test set to other file formats with changes only to the generator. On a smaller scale, numerous building blocks should be portable between our fuzz testing projects. Consider, for example, an e-mail address. This basic string format is seen everywhere, including Simple Mail Transfer Protocol (SMTP) transactions, login screens, and the Voice over IP (VoIP) Session Initiation Protocol (SIP):
Excerpt of an SIP INVITE Transaction 49 4e 56 49 54 45 20 73 69 70 3a 72 6f 6f 74 40 INVITE sip:root@ 6f 70 65 6e 72 63 65 2e 6f 72 67 20 53 49 50 2f openrce.org SIP/ 32 2e 30 0d 0a 56 69 61 3a 20 53 49 50 2f 32 2e 2.0..Via: SIP/2. 30 2f 55 44 50 20 70 61 6d 69 6e 69 4c 2e 75 6e 0/UDP voip.openr
In each case, it is an interesting field to fuzz because we are certain the field will be parsed and potentially separated into various components (e.g., user and domain). If we're going to spend the time to enumerate the possible malicious representations of an e-mail address, wouldn't it be nice if we can reuse it across all of our fuzzers? Think about how you might abstract or modularize the individual fuzzers that we discuss to increase reusability.