Dr. Dobb's | Red-Team Application Security Testing

Red-Team Application Security Testing

Red-team security testing demands focused application security testing that is independent of the development group and usually falls outside normal application-testing channels.

November 01, 2003
URL:http://www.drdobbs.com/security/red-team-application-security-testing/184405475

Nov03: Red-Team Application Security Testing

Testing techniques designed to expose security bugs

Herbert is director of security technology and Scott is director of testing technology at Security Innovation LLC. They can be contacted at [email protected], and [email protected], respectively.

Feature Scoring
Testing Techniques and Tools

Testing for software security usually means simulating attacks through the network against entire systems. This is evident by the volume of penetration testing tools that have popped up, including SATAN, SAINT, and Retina, among others. However, one of the biggest problems in network security is that intruders might exploit a buffer overflow in an application that is accessible through the network. An industry has emerged to fix these problems at the network level using hardware devices such as firewalls and intrusion-detection systems. But the truth is that, if the underlying software that runs on target systems were more secure, the need for these types of patching measures would be reduced or eliminated.

Using firewalls and testing at the network layer is not the answer. One problem with it is that network-penetration testing turns security testers into librarians who expose well-known, reemergent vulnerabilities with no hope of finding new ones. Often, we have seen so-called "penetration tests" that basically correlate to a few hundred automated scripts representing known exploits. This paradigm has become the standard not just for network security testing (where it is arguably more effective), but for application security testing as well. To thoroughly test applications for security though, you need to test like detectives not librarians. In this article, we describe a methodology for finding the underlying causes of these vulnerabilities—bugs in software. This method helps organize application-penetration testing through decomposition of an application, ranking of features for potential vulnerabilities, and allocation of resources.

The Security Testing Problem

Why do you need application security testing? Isn't it covered by functional testing, specification-based testing, regression testing, and all the other types of standard verification procedures that software-development organizations use? Unfortunately, the answer is a resounding "no!" We realized this several years ago when security bugs came into the limelight. We found that the underlying flaws in software that let attackers exploit applications or networks were rarely flaws that violated some requirement or rule in the specification. Instead, the flaw turned out to be some side effect of normal application behavior. Consider, for instance, a notorious bug in the Pine 4.3 e-mail client for Linux. Under certain configurations, Pine 4.3 creates a temporary file for messages being edited through its user interface in a file in the /tmp directory, which is globally accessible (see http://www.securityfocus.com/archive/1/150150/ for details). This means that attackers could read any message from any user on the system while it was being composed. Is this a bug? Well, certainly it's a security issue of the highest severity, but it doesn't fit the model of a traditional functional bug. Mail could be successfully composed and sent, and such test cases were likely executed thousands of times. The side effect of writing out to temporary, unprotected storage just wasn't noticed by testers and developers. For more information on the side-effect nature of security vulnerabilities, see "Testing for Software Security" (DDJ, November 2002) or How to Break Software Security: Effective Techniques for Security Testing, by James Whittaker and Herbert H. Thompson (Addison-Wesley, 2003). The hidden nature of most security bugs is the reason applications need specific, focused security testing. This is testing that defies the traditional model of verifying the specification and, instead, hunts down unspecified, insecure side effects of "correct" application functionality.

"Red-teaming," "penetration testing," and "security testing" are all terms that express the same basic idea—short, focused, intense security testing of applications. This testing is independent of the development group and usually falls outside of normal application-testing channels—that's the point; it's independent. Red-teaming lets testers attack an application in ways an intruder is likely to. But this still isn't effective enough. An application opens itself up to potentially thousands of man hours worth of attacker effort once it is released. Security testers must work more efficiently and with greater accuracy than intruders do in order to have any hope of catching the majority of security defects in an application. This is what red-teaming is all about and why the need for it is so acute.

The Methodology

One of the key needs in creating a short, focused security assessment of applications is to quickly identify which areas of the software are most likely to be vulnerable. For this, we decompose an application into features and score these features for insecurity. During this process, we show how to identify the testing strategies and attacks that are likely to be bug-revealing for that feature. From this information, we develop a plan and assign people to roles: people to investigate components, people to execute tests, and people to develop or acquire tools. This feature-based testing lets us draw conclusions about component strengths and weaknesses in very specific terms that give developers the information they need to fix the problems. This model (see Figure 1) has been used to successfully conduct penetration tests for large software companies and find vulnerabilities that have stopped shipment on many sizeable commercial products.

Decomposition of the application means partitioning the application's features into manageable testing areas. The method of partition can vary, but ideally it is guided by two questions:

Is this feature of manageable size for a single individual, operating alone or with a small team, to explore its functionality and conduct tests in a relatively short time?
Does this feature form a natural partition in that most functionality is contained within this feature and there are few interfaces between it and the rest of the application?

Imagine, for example, a music player that plays both streaming media from the Web and files stored either locally or on remote machines. One simple partition of the application may be:

Reading of files from the local filesystem.
Communication through the network with streaming media.
The GUI.
Storing of favorites and other user-preference data.

There could be other possible divisions of the application. For large applications, there are likely to be dozens of features. To cope with this, you must then decide how to allocate testing resources to these features. There are several criteria you could use based on the number of inputs, proportion of users that are likely to use that feature, or lines of code. For functional testing, these are certainly reasonable criteria since the goal would be coverage and the likelihood that users encounter the bug (that is, use that feature). For security testing, your reasoning is different. Since the focus is short, intense testing, you should allocate more resources to the components that are more likely to contain vulnerabilities. See the accompanying text box entitled "Feature Scoring" for more information.

Once features are scored, they are assigned to testers who manage the evaluation of the components. Testers have two primary responsibilities at the onset of component testing:

Determine what tools are necessary or would be helpful in executing tests. Requirements for these tools are then passed on to developers within the test organization who search for a low cost or free tool that can be used; if such tools cannot be located, they develop the tool. Bear in mind these are small, focused, special-purpose tools likely to have a short development time.
Identify testing techniques that would be useful in exposing vulnerabilities in the component (see the text box entitled "Testing Techniques and Tools").

As more is understood about the component during the test-execution process, there may be changing requirements for testing tools. For this reason, test developers and testers work hand-in-hand to produce new tools as needed. When vulnerabilities are found, problem reports are created that send information including reproduction steps, hardware configuration, operating-system details, tools needed to reproduce the failure, and any other relevant information to the stakeholders in the testing effort (the internal product development group). Once the project is over, these reports form the basis for postmortem bug evaluations.

Postmortems

Bugs are corporate assets. There is no better way to understand what your organization is doing wrong than to thoughtfully analyze bugs that escaped the normal development and testing processes through a postmortem evaluation. This analysis helps refine the testing process so that those types of bugs are found sooner in future security-testing endeavors. Postmortems are best done soon after the security-testing project has ended, when bugs are still fresh in the minds of the testers who found them.

Conclusion

Ideally, development and testing practices in the industry will move to accommodate the need for security-aware measures. Until then, red-teaming is perhaps the best practice to use.

Acknowledgment

Thanks to Matthew Oertle of Security Innovation for providing code excerpts of our in-house network-corruption tool.

DDJ

Listing One

/* Network Corruption excerpt 
  By Matthew Oertle
  This is the callback function for libpcap <http://www.tcpdump.org>
  u_char *data is a pointer to the incoming packet
*/
void Callback( u_char *user, const struct pcap_pkthdr *header, 
                                                     const u_char *data ) {
    // Structures for packet fields
    EthHdr ethOut;
    IpHdr  ipOut;
    TcpHdr tcpOut;
    offset = 0;
    ethOut = (EthHdr)data;
    offset += ETH_H;

    // Take care of Layer 2 addressing
    memcpy(ethOut->src_mac, externalMAC, 6);

    // Look at IP packets
    if(ethOut->protocol == 0x0800) {
        ipOut = (IpHdr)(data + offset);
        offset += ipOut->hlen * 4;

        // Look at TCP packets
        if(ipOut->protocol == 0x06) {
            tcpOut = (TcpHdr)(data + offset);
            offset += tcpOut->hlen * 4;
            // Check if it is the port we are interested in
            if(tcpOut->dest_port == TEST_PORT) {
                // Call the corruption function
                corrupt_payload(data + offset, data_len - offset);
                // Re-compute the checksum
            }
        }
    }
    // Inject the modified packet onto the wire
    libnet_write_link_layer(iface, device, data, data_len);
}

Back to Article

Listing Two

/* This function takes a pointer to the packet data and the length hi and lo 
are global functions that initialized to 0xff and 0x00. The function corrupts 
a single byte each time the match string is found in the packet
*/
int corrupt_payload(u_char *data, int len) {
    if(memmem(data, len, match, match_len)) {
        data[lo] = hi;
        hi--;
        if(hi == 0xff) {
            lo++;
        }
    }
   return len;
}

Back to Article

Nov03: Red-Team Application Security Testing

Figure 1: The red-teaming process.

Nov03: Feature Scoring

Feature Scoring

To allocate testing resources effectively, applications must be broken up into features. Looking at the number of features in many modern software applications, it's clear that all features cannot be tested in a finite amount of time. Therefore, to hone in on features most likely to contain security vulnerabilities, we use a scoring mechanism that assigns weights to features to help allocate testing resources. Application scoring is not new. Michael Howard and David LeBlanc present an excellent feature-scoring scheme based on source-code access in their book Writing Secure Code, Second Edition (Microsoft Press, 2002). Our scoring has the requirement that features must be assigned values quickly—without access to, or analysis of, the application's source. In our approach, testers simply mark-off against a list of properties that may have security implications.

Is it a security feature? [+4]
Is it a new feature? [+3]
Does it use the network? [+3]
Does it use permissions? [+2]
Does it run as a privileged user? [+1]
Are there known vulnerabilities in previous versions? [+1]
Does it use a Database Management System (DBMS) such as SQL? [+2]
Does the feature use structured data? [+2]
Are there severe consequences of failure of this component to the rest of the application? [+3]
Does this feature use concepts of persistent state? [+3]
Are there any dynamic conversions? [+2]
Are certain categories of inputs/actions blocked (as malicious)? [+2]

The weightings have been derived through iterative refinement on multiple testing projects. The score an application receives as a result is highly correlated to the number and severity of security vulnerabilities discovered, not just during our testing, but also by the user community after software release.

—H.H.T. and S.G.C.

Nov03: Testing Techniques and Tools

Testing Techniques and Tools

We studied thousands of postrelease security bugs from sources such as CERT (http://www.cert.org/) and bugtraq (http://www.securityfocus.com/) and asked three primary questions about each bug:

What underlying software fault or design decision caused this bug?
What were the symptoms of failure that should have alerted the tester to the presence of the bug?
What test-design strategy would have enabled testers to expose the vulnerability before the software was released?

Our method was to analyze these bugs, then generalize the patterns of behavior and invent potentially useful test techniques. The result was How to Break Software Security: Effective Techniques for Security Testing, by James Whittaker and Herbert H. Thompson (Addison-Wesley, 2003), which describes focused testing techniques to expose security bugs. In this study, we identified four main classes of security bugs.

Dependency failures. Many security bugs are not caused directly by the application under test; rather, they are inherited from libraries, files, and other resources external to the application. Either the application depends on an external resource to provide security and that resource becomes unavailable, or the security provided by a resource is faulty. Three key resources that cause many of these types of vulnerabilities are the file system, the registry, and dynamically loaded code libraries. Testers must ask when, where, and how the application access data stored in the registry and file system, and whether this is appropriate based on the sensitivity of the data stored there. For libraries, we must ensure that the application responds securely to library failures.

One good example of a dependency failure is the corruption of network data. Applications may be able to securely handle complete network failure, but may not have anticipated a user manipulating data contained in the payload of a packet manually. One reason for this is that applications communicating over the network often assume that remote machines sending packets will send the packets in the proper form and constrain the length and value of certain data. Listing One shows the structure of a basic network-packet corrupter that intercepts and manipulates packet-payload data on the wire. Listing Two shows a specific corruption function that searches the payload of a packet for a string, and if that packet contains that string, manipulates some of the payload data.

Unanticipated user input. Unanticipated input scenarios are a second major source of security bugs. Applications are normally programmed to deflect illegal input, but considering all such possibilities is an enormous task. The best example of this is the notorious buffer overrun. Buffer overruns are caused by entering long strings into input fields; the strings must be longer than the internal buffer assigned to hold them. In certain situations, the overflow can actually cause arbitrary (and sometimes malicious) code to be executed by the host computer.

Design vulnerabilities. A third source of security vulnerabilities are those that have been designed into the software inadvertently. Obviously, no ethical developer would purposefully design insecure code but often legacy code or improper assumptions about user behavior will cause design decisions to be made that are not in the best interest of security.

At the top of the list are flaws that allow easy access to the application. If the application ships with user accounts, make sure that all administrative or root accounts force the users to change the default password. Make sure that the default network port configuration is minimal; that is, that no ports are left open without security mechanisms in place. Make sure that the product ships with any security settings tuned to maximum.

Implementation vulnerabilities. Even perfect designs can be made insecure through imperfect implementation. Security can be spelled out meticulously in a specification and yet be implemented in such a way as to cause insecurity. The best example of this is the so-called "man in the middle" attack, which takes advantage of good security that is implemented improperly.

The attack takes advantage of time discrepancies between the code that implements security around some piece of sensitive information and the code that actually uses that same information.

In studying security bugs, we found that we needed a tool that lets us both observe all the application's interactions with its environment—the registry, filesystem, kernel, and other applications—and also control these applications. The result of our efforts is a freely available tool called "Holodeck," which is available from DDJ (see "Resource Center," page 5) or at http://www.sisecure.com/.

The hacking and security communities also offer a wealth of tools for security testers. Two of the best sites we've found are http://astalavista.box.sk/ and http://www.sysinternals.com/.

—H.H.T. and S.G.C.