Requirements for Effective Fuzzing

Testing security software


June 29, 2007
URL: http://www.drdobbs.com/security/requirements-for-effective-fuzzing/200001745


Michael Sutton is the Security Evangelist for SPI Dynamics. Adam Greene is an engineer for a large financial news company based in New York City. Pedram Amini currently leads the security research and product security assessment team at TippingPoint. This article was excerpted from their book Fuzzing: Brute Force Vulnerability Discovery. ISBN: 0321446119. Copyright (c) 2007 Addison-Wesley Professional. All rights reserved.


You teach a child to read, and he or her will be able to pass a literacy test.

George W. Bush, Townsend, TN, February 21, 2001

Fuzzing has evolved into one of today's most effective approaches to testing software security. To "fuzz," you attach a program's inputs to a source of random data, then systematically identify the failures that arise.
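At its simplest, the whole loop fits in a few lines. The following minimal Python sketch, with a hypothetical local binary named ./parser standing in for the target, pipes random bytes into a program and flags any run that dies on a signal:

    import os
    import subprocess

    # A bare-bones fuzz loop: pipe random bytes into a target and flag any
    # run that dies on a signal (a likely crash on POSIX). The "./parser"
    # binary is hypothetical, used purely for illustration.
    for i in range(1000):
        data = os.urandom(512)
        proc = subprocess.run(["./parser"], input=data, capture_output=True)
        if proc.returncode < 0:  # negative return code = killed by a signal
            print(f"test {i}: killed by signal {-proc.returncode}")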

An obvious requirement for a fuzzing tool is the capability to reproduce the results of both individual tests and test sequences. This is crucial for communicating test results to other people and groups. As a fuzz tester, you should be able to provide your fuzzing tool with a list of malicious test case numbers, knowing that the target's observed behavior will be exactly the same between test runs. Consider the following fictitious situation:

You are fuzzing a Web server's capability to handle malformed POST data and discover a potentially exploitable memory corruption condition when the 50th test case you send crashes the service. You restart the Web daemon and retransmit your last malicious payload, but nothing happens. Was the issue a fluke? Of course not: Computers are deterministic and have no notion of randomness. The issue must rely on some combination of inputs. Perhaps an earlier packet put the Web server in a state that later allowed the 50th test to trigger the memory corruption. We can't tell without further analysis, and we can't narrow the possibilities down without the capability of replaying the entire test set in a methodical fashion.
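One way to guarantee this kind of reproducibility is to derive every test case deterministically from its case number, so that replaying "cases 1 through 50" means resending the exact same bytes. A minimal sketch of the idea, where post_to_server is a hypothetical transmit callback:

    import random

    def test_case(n: int) -> bytes:
        """Derive test case n deterministically so any run can be replayed."""
        rng = random.Random(n)              # seed the PRNG with the case number
        length = rng.randrange(1, 4096)
        return bytes(rng.randrange(256) for _ in range(length))

    def replay(case_numbers, send):
        """Re-send an exact sequence of earlier cases, byte for byte."""
        for n in case_numbers:
            send(test_case(n))

    # Replay the entire lead-up to the crash, then bisect from there:
    # replay(range(1, 51), send=post_to_server)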

Documentation of the various testing results is also a useful, if not mandatory, requirement during the information sharing phase. Given the rising trend of internationally outsourced development,1 it is frequently not possible for the security tester to walk down the hall and sit with the affected product developer. Outsourcing has become so popular that even computer science students have been known to take advantage of it.2 Various barriers to communication, including time zone, language, and communication medium, make it ever more important to bundle as much information as possible in a clear and concise form. The burden of organized documentation should not be an entirely manual effort. A good fuzzing tool will produce and store easily parsed and referenced log information.
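What "easily parsed and referenced" might look like in practice: one self-contained record per test case, in a line-oriented format such as JSON Lines. A sketch (the field names here are our own, not a standard):

    import json
    import time

    def log_case(logfile, case_num: int, payload: bytes, verdict: str):
        """Append one easily parsed record per test case (JSON Lines)."""
        record = {
            "time": time.time(),
            "case": case_num,
            "payload_hex": payload.hex(),  # the exact bytes, for replay elsewhere
            "verdict": verdict,            # e.g., "ok", "crash", "timeout"
        }
        logfile.write(json.dumps(record) + "\n")

    with open("fuzz_log.jsonl", "a") as f:
        log_case(f, 50, b"POST /login HTTP/1.1\r\n", "crash")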

Think about how the individual fuzzers we discuss handle reproducibility, logging, and automated documentation. Think about how you could improve on the implementation.

Reusability

On a large scale, if we are building a file format fuzzing tool, we don't want to have to rewrite the entire tool every time we want to test a new file format. We can create some reusable features that will save us time in the future if we decide to test a different format. Sticking with our example, let's say we were motivated to construct a JPEG file format fuzzing tool to test for bugs in Microsoft Paint. Thinking ahead and knowing that we will want to reuse portions of our labor, we may decide to separate the tool set into three components as in Figure 1.

Figure 1: Fictitious file format fuzzer breakdown and overview.
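The following sketch models that breakdown in Python; the seed file name and the use of mspaint.exe are assumptions for illustration only. The prose below walks through each of the three components:

    import subprocess
    import tempfile

    def jpeg_generator():
        """Component 1: yield a series of mutated JPEG files. Swapping in
        another file format touches only this function."""
        base = open("sample.jpg", "rb").read()        # a known-good seed image
        for offset in range(len(base)):
            yield base[:offset] + b"\xff" + base[offset + 1:]

    def launch(path):
        """Component 2: spawn the target on one input ("mspaint.exe" is
        assumed to be on the PATH here purely for illustration)."""
        return subprocess.Popen(["mspaint.exe", path])

    def crashed(proc, timeout=5):
        """Component 3: crude error detection; later sections cover better ways."""
        try:
            return proc.wait(timeout) != 0
        except subprocess.TimeoutExpired:
            proc.kill()                    # still running after timeout: not a crash
            return False

    for i, image in enumerate(jpeg_generator()):
        with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
            f.write(image)
        if crashed(launch(f.name)):
            print(f"mutation {i} crashed the target")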

A JPEG file generator is responsible for generating an endless series of mutated JPEG files. A launching front end is responsible for looping over the generated images, each time spawning Microsoft Paint with the appropriate arguments to load the next image. Finally, an error detection engine is responsible for monitoring each instance of Microsoft Paint for exceptional conditions. The separation into three components allows us to adapt our test set to other file formats with changes only to the generator.

On a smaller scale, numerous building blocks should be portable between our fuzz testing projects. Consider, for example, an e-mail address. This basic string format is seen everywhere, including Simple Mail Transfer Protocol (SMTP) transactions, login screens, and the Voice over IP (VoIP) Session Initiation Protocol (SIP):

Excerpt of a SIP INVITE Transaction

49 4e 56 49 54 45 20 73 69 70 3a 72 6f 6f 74 40 INVITE sip:root@
6f 70 65 6e 72 63 65 2e 6f 72 67 20 53 49 50 2f openrce.org SIP/
32 2e 30 0d 0a 56 69 61 3a 20 53 49 50 2f 32 2e 2.0..Via: SIP/2.
30 2f 55 44 50 20 76 6f 69 70 2e 6f 70 65 6e 72 0/UDP voip.openr

In each case, it is an interesting field to fuzz because we are certain the field will be parsed and potentially separated into various components (e.g., user and domain). If we're going to spend the time to enumerate the possible malicious representations of an e-mail address, wouldn't it be nice if we could reuse that work across all of our fuzzers? Think about how you might abstract or modularize the individual fuzzers that we discuss to increase reusability.
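As a concrete sketch, such a shared catalog of malicious e-mail addresses could be as simple as a generator that every fuzzer imports (the specific mutations below are illustrative, not exhaustive):

    def email_mutations(user="root", domain="openrce.org"):
        """A reusable catalog of malformed e-mail addresses, shareable across
        SMTP, login-form, and SIP fuzzers alike."""
        long_str = "A" * 5000
        yield f"{long_str}@{domain}"             # oversized user part
        yield f"{user}@{long_str}"               # oversized domain part
        yield f"{user}@@{domain}"                # repeated separator
        yield f"{user}@"                         # missing domain
        yield f"@{domain}"                       # missing user
        yield f"{user}@{domain}%n%n%n"           # format string tokens
        yield f"{user}@{long_str}:{long_str}"    # long strings around @ and :

    # Any of our fuzzers can now share the same catalog, e.g.:
    # for addr in email_mutations():
    #     send(f"MAIL FROM:<{addr}>\r\n")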

Process State and Process Depth

For a solid grasp of the concepts of process state and process depth, let's pick an example most people are all too familiar with: ATM banking. Consider the simple state diagram shown in Figure 2.

Figure 2: Contrived ATM state diagram example.

In a typical ATM transaction, you walk up to the machine (ever so stealthily ensuring you weren't followed), insert your card, enter a PIN, follow a series of on-screen menus to select the amount of money you wish to withdraw, collect your money, and conclude your transaction. This same concept of state and state transitions applies to software; we'll give a specific example in a minute. Each step of the ATM transaction process can be referred to as a state. We define process state as the specific state a target process is in at any given time. Actions, such as inserting a card or selecting a withdrawal amount, transition you from one state to another. How far along you are in the process is referred to as process depth. So, for example, specifying a withdrawal amount happens at a greater depth than entering a PIN.

As a more security-relevant example, consider a secure shell (SSH) server. Before a client connects, the server is in the initial state. During the authentication process, the server is in the authentication state. Once the server has successfully authenticated a user, it is in the authenticated state.

Process depth is a specific measure of the number of "forward" steps required to reach a specific state. Following our SSH server example, consider the state diagram depicted in Figure 3.

Figure 3: SSH server state diagram
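A toy model of that diagram in code, where the depth of a state is simply the number of forward steps needed to reach it:

    # Each entry maps a state to the action that advances past it and the
    # state that action leads to.
    TRANSITIONS = {
        "initial":        ("connect",      "authentication"),
        "authentication": ("authenticate", "authenticated"),
    }

    def path_to(state):
        """Return the ordered actions a fuzzer must drive to reach `state`."""
        steps, current = [], "initial"
        while current != state:
            action, current = TRANSITIONS[current]
            steps.append(action)
        return steps

    print(path_to("authenticated"))  # ['connect', 'authenticate'] -> depth 2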

The authenticated state is "deeper" in the process than the authentication state because the authentication state is a required substep of reaching the authenticated state. The notions of process state and process depth are important concepts that can significantly complicate fuzzer design. The following example demonstrates such a complication. To fuzz the e-mail address argument of the MAIL FROM verb in an SMTP server, we have to connect to the server and issue either a HELO or an EHLO command. As shown in Figure 4, the underlying SMTP implementation may handle the processing of the MAIL FROM command with the same function regardless of which initiation command was used.

Figure 4: SMTP example state diagram 1

In Figure 4, function one is the only function defined to handle MAIL FROM data. Alternatively, as shown in Figure 5, the SMTP implementation might have two separate routines for handling MAIL FROM data depending on the chosen initiation command.

Figure 5: SMTP example state diagram 2

This is actually a real-world example. On September 7, 2006, a security advisory3 detailing a remotely exploitable stack overflow in the SMTP server bundled with Ipswitch Collaboration Suite was published. The overflow occurs when long strings are supplied between the characters @ and : during the parsing of e-mail addresses. The vulnerable parsing routine is only reachable, however, when the connecting client begins the conversation with EHLO. When building fuzzers, be mindful of potential logic splits like this. To get complete coverage, our fuzzer will have to run through all of its e-mail address mutations twice, once through EHLO and once through HELO. What happens if there is another logic split further down the process depth path? The number of necessary iterations for complete coverage starts to increase exponentially.
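A sketch of what "twice through" looks like in practice, reusing the e-mail mutation catalog from the earlier sketch (the host address is, of course, a placeholder):

    import socket

    def fuzz_mail_from(host, port, greeting, address):
        """Drive one e-mail mutation through a chosen initiation command so
        that both HELO- and EHLO-reachable parsing routines get exercised."""
        s = socket.create_connection((host, port), timeout=10)
        s.recv(1024)                                     # server banner
        s.sendall(f"{greeting} fuzzer.example\r\n".encode())
        s.recv(1024)
        s.sendall(f"MAIL FROM:<{address}>\r\n".encode())
        reply = s.recv(1024)
        s.close()
        return reply

    # Complete coverage of the logic split: every mutation, down both paths.
    for greeting in ("HELO", "EHLO"):
        for addr in email_mutations():   # the reusable catalog sketched earlier
            fuzz_mail_from("10.0.0.5", 25, greeting, addr)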

Tracking, Code Coverage, and Metrics

Code coverage refers to the amount of code in a target process that a fuzzer induces the target to reach and execute. At the time of writing, we are unaware of any publicly or commercially available fuzzing technology capable of tracking and logging code coverage. This is an important concept for analysts across the board. Quality assurance (QA) teams can utilize code coverage as a metric to determine confidence in the level of testing that has taken place. If you are the QA lead for a Web server product, for example, you would probably feel more comfortable shipping your product with zero failures across 90 percent code coverage than you would with zero failures across only 25 percent code coverage. Vulnerability researchers can benefit from code coverage analysis by identifying the modifications necessary to expand their coverage into more obscure states of the target, where other eyes may not have already been.
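As a rough illustration of the metric itself, here is a crude stand-in for binary-level coverage tracking: measuring line coverage of a pure-Python parser with the standard sys.settrace hook.

    import sys

    def coverage_of(func, *args):
        """Record every (file, line) executed while the target processes
        one input; a crude analogue of binary basic-block coverage."""
        hit = set()
        def tracer(frame, event, arg):
            if event == "line":
                hit.add((frame.f_code.co_filename, frame.f_lineno))
            return tracer
        sys.settrace(tracer)
        try:
            func(*args)
        finally:
            sys.settrace(None)
        return hit

    # With the set of all reachable lines in hand, the QA metric is a ratio:
    # print(len(coverage_of(parse, sample)) / total_lines)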

Think about creative approaches to determining code coverage and the benefits that such an analysis might provide as we discuss various fuzzers in upcoming chapters. When fuzzing, people always ask, "How do I start?" Remember that it's equally important to ask, "When do I stop?"

Error Detection

Generating and transmitting potentially malicious traffic encompasses only half of the fuzzing battle. The other half of the battle is accurately determining when an error has occurred. At the time of writing, the majority of available fuzzers are "blind" in that they have no concept of how the target reacts to transmitted tests. Some commercial solutions interweave "ping" or keepalive checks between malicious attempts as a control to determine whether the target is still functional. The term ping here is loosely used to refer to any form of transaction that should generate a known good response. Other solutions build on log output analysis. This could involve monitoring ASCII text logs maintained by individual applications or querying entries in system logs such as the Windows Event Viewer, as shown in Figure 6.

Figure 6: Example Error log from the Microsoft Windows Event Viewer
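A minimal sketch of the "ping" control described above, here using a trivial HTTP request as the known good transaction (the target host is a placeholder):

    import socket

    def target_alive(host, port=80):
        """A 'ping' in the loose sense used above: any transaction that
        should produce a known good response."""
        try:
            s = socket.create_connection((host, port), timeout=5)
            s.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            banner = s.recv(12)
            s.close()
            return banner.startswith(b"HTTP/")
        except OSError:
            return False

    # Interleaved with fuzzing: a failed control check implicates the test
    # case (or cases) sent since the last successful check.
    # send(case)
    # if not target_alive("10.0.0.5"):
    #     print(f"target down after case {n}")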

The benefit of these approaches to error detection is that they are, for the most part, easily ported between platforms and architectures. However, these approaches are severely limited with regard to the kinds of errors they are capable of detecting. Neither approach, for example, can detect a critical error that occurs in a Microsoft Windows application but is gracefully handled by a Structured Exception Handling4 (SEH) routine.

The next generation in error detection is the use of lightweight debug clients to detect when an exceptional condition has occurred in a target. The one negative aspect of utilizing these types of tools is that you have to develop one for each target platform on which you are testing. For example, if you want to test three SMTP servers on Mac OS X, Microsoft Windows, and Gentoo Linux, you will likely have to develop two or possibly three different monitoring clients. Furthermore, depending on your target, it might not be possible or practical to construct a debugging client. If you are testing a hardware VoIP phone, for example, you might have to fall back to control testing or log monitoring, as hardware solutions are less conducive to debugging and might require special tools.
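At its most lightweight, such a monitoring client need not be much more than a process supervisor that names the fatal signal. A POSIX-flavored sketch follows (the target command line is hypothetical); a genuine debug client would additionally capture register state and the faulting address:

    import signal
    import subprocess

    def run_monitored(argv, data: bytes):
        """Run the target over one input and report any fatal signal."""
        proc = subprocess.run(argv, input=data, capture_output=True)
        if proc.returncode < 0:                   # killed by a signal
            sig = signal.Signals(-proc.returncode)
            return f"exceptional condition: {sig.name}"   # e.g., SIGSEGV
        return None

    # verdict = run_monitored(["./smtpd", "--stdin"], payload)  # hypothetical target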

Looking even further ahead, the panacea of error detection lies in dynamic binary instrumentation/translation5 (DBI) platforms such as Valgrind and DynamoRIO. On such platforms, it becomes possible to detect errors as they develop rather than after they trigger. At a 50,000-foot view, DBI-based debugging engines are able to very specifically analyze and instrument a target software application at a low level. Such control allows for the creation of memory leak checks, buffer overflow and underrun checks, and so on. Referring to the memory corruption example we used when discussing reproducibility, a lightweight debug client is capable of informing us when the memory corruption triggers. If you recall our example, we posed a scenario whereby a number of packets were sent to the target service, which crashed on receiving the 50th test case. On a platform such as Valgrind, we might be able to detect the initial memory corruption that occurred in some earlier test, prior to its triggering the exception. This approach can save hours and perhaps days of fuzz tuning and bug tracking.
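A sketch of how a fuzzer might lean on Valgrind for exactly this, using memcheck's --error-exitcode option to turn "did memory corruption develop?" into a simple return-code check:

    import subprocess

    def valgrind_verdict(argv, data: bytes):
        """Run one test case under Valgrind's memcheck so memory corruption
        is reported when it develops, not only when it later crashes."""
        cmd = ["valgrind", "--quiet", "--error-exitcode=99"] + argv
        proc = subprocess.run(cmd, input=data, capture_output=True)
        if proc.returncode == 99:
            return proc.stderr.decode(errors="replace")   # the memcheck report
        return None

    # report = valgrind_verdict(["./parser"], test_case(42))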

Various nontechnical factors such as budget and deadlines can impose limitations on fuzz testing. These factors must be kept in mind during the design and planning stage. You could, for example, find yourself in a last-minute, prelaunch panic because no one has even briefly examined the security of your $50-million product investment. Security is all too often an afterthought in the software development lifecycle (SDLC), taking a backseat to adding new features and meeting production deadlines. Security must be "baked in" as opposed to "brushed on" if we ever hope to produce secure software. That involves making fundamental changes to the SDLC to ensure that security is considered at every stage of development. That said, we recognize that software is developed in the real world, not in a utopia where resources are plentiful and defects are scarce. As you progress through this book, it is therefore equally important to mentally classify techniques that can be applied when time and finances are limited as it is to dream up the "ultimate" fuzzing suite. Consider also where you would implement such tools in the SDLC and who would be responsible for owning the processes.
