Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Security

String-Based Attacks Demystified


June, 2004: String-Based Attacks Demystified

Long strings and escape characters usher attacks

Herbert is Director of Security Technology at Security Innovation and James is a professor of computer science at the Florida Institute of Technology. They are also the coauthors of How To Break Software Security (Addison-Wesley, 2003). They can be contacted at hthompsonsisecure.com and jwse.fit.edu, respectively.


In April, we quizzed you on some basic questions about software security, which filled our e-mailboxes with questions about buffer overruns. "Buffer overruns are dead," quipped one reader. Another asked about other attacks that dealt with input strings. So we decided to address these seemingly universal concerns.

String input is an issue for software security. The problem is that developers often trust string input without checking for validity. That's foolish.

For their part, software testers often do not understand which strings can cause problems and, therefore, fail to test those strings before the software is released. That's unfortunate.

So what strings are interesting? Here are a few to watch out for.

Long Strings

When a character array is declared, it is assigned a series of memory locations based on the size it is given. For example, char mystring[256] is a string data type of 256 bytes. When a string is entered for this variable, it is placed into the memory location (a buffer) set aside for mystring. Developers, though, are responsible for writing code that ensures the string never becomes longer than 256 characters. You do this (at least, we sincerely hope you do) by truncating the input either at the interface where it is entered or before it gets written to memory.

What happens when you fail to truncate overlong strings? Well, the operating system doesn't care—it writes the first 256 bytes where they belong, and any more bytes are written in adjacent memory locations, obviously overwriting whatever was there before. The term that is generally accepted for this is "overflow." Thus, we have the event of overflowing the buffer, commonly known as the "buffer overflow."

Buffer overflows can result in any number of odd behaviors. Sometimes the tail-end of a long string is written over other memory locations reserved for data, causing type mismatch errors and invalid computation. This can crash an application or cause it to produce wrong answers. Exploitable behavior occurs when the tail-end of the string overwrites memory set aside for code.

There are three places in memory that are set aside for code:

  • The stack, which is memory containing pointers to the next instruction to be executed and its parameters.
  • The heap, which contains data that is often transferred to the stack.
  • The code segment, which is the address space that stores instructions from the binary that are awaiting their turn to be executed.

Overflowing onto memory locations that are (or could be) executed creates the potential for parts of an input string to be interpreted as an executable instruction. It is as if the person supplying input to your program can choose what instructions your program can execute. This is obviously a very bad thing.

But what interfaces to an application are susceptible to such rogue input strings? Obviously, the user interface is not particularly interesting. If someone has control of the keyboard, then they are unlikely to need a buffer overflow to do damage to your computer. After all, they already possess local access.

In fact, there are three interfaces you should check for long string input:

  • The network interface. If your application processes data that comes from a remote source via some communication protocol, then that input stream needs to be carefully checked for long strings. The Nimda, Code Red, and Slammer worms were all enabled over a network interface by developers who failed to constrain strings embedded in the protocols used to communicate with remote machines.
  • The filesystem interface. Files can act as proxies for remote users. Many viruses have spread through executable files and script files, but surprisingly few have taken advantage of a buffer overflow in the application used to create them. Our own testing has found that an amazing number of widely used applications are susceptible to buffer overflows in the files they read. Bugtraq is loaded with examples as are many other public vulnerability databases. The reason that buffer overflows are so important here is that it would be difficult to identify a file as potentially malicious until it's too late.
  • Long strings can be delivered to an application through its programmatic interfaces. API calls to the application can contain long strings in parameters; thus, these arguments must be validated by the application under test. Also, API calls made by your application can contain long strings through in parameters, return values, and data pointed to by parameters. All such data needs to be validated before it is used.

Escape Characters

There's more to string vulnerabilities than buffer overflows, though. Often, you can force an application to change how it behaves by inserting special characters or commands into input strings. There are many examples of this in practice and some common vulnerabilities like cross-site scripting, embedded commands, and SQL injection are the result of user-supplied data being used before it is validated. One of the most interesting—and arguably most dangerous—instances of this is SQL injection.

Applications that operate on the Web need to interact with a database in some form to persistently store data. The way web-application developers typically accomplish this is by constructing queries that manipulate or retrieve data stored in the database. The dominant language that these database queries are written in is SQL.

By embedding certain escape characters, attackers may be able to manipulate the query and tamper with data, retrieve sensitive database information, or take control of the database server entirely. The key problem here is that user input might be unvalidated on the server that processes the query. Many web developers rely on client-side validation to protect against escape characters being entered by users.

Typically, a web form is validated using JavaScript where characters such as single quotes are either stripped or replaced using client-side routines before the form is passed to the server. Listing One is a typical example of client-side validation in JavaScript. In this example, if illegal characters are included in either of the input fields, users are presented with a warning and forced to replace the offending characters. The critical flaw is that this validation against SQL injection is done on the client. Malicious users can easily circumvent such protection by saving the web page and commenting out the call to the validation routine. Attackers would then need to include the full URL to the page that processes this data and simply reload the saved page in a browser. The modified HTML source follows (note the missing "onsubmit"):

<FORM name="form1" action=
"http://www.si-hackedbank.com/process.asp">

One of an attacker's first clues that a web site is vulnerable to SQL injection is an error message, like that in Figure 1. Once attackers see a screen such as this, they know it is likely that a SQL injection vulnerability exists. From this error message, attackers gain a significant amount of information:

  • On the client or the server, there is some string validation or error-handling routine that is either missing or incorrectly written. Conclusion: This weakness can be exploited to manipulate the SQL queries being made on the server.
  • ODBC errors can reveal too much information about the query string that is being executed on the server. By changing input to the form, we can eventually discover most or all of the query string on the server. The query string is likely to include key table names and field names, which can then be leveraged to mount more complex SQL injection attacks.
  • Such errors reveal which database server is running on the back end, thus making an attacker's job easier by narrowing the techniques he/she uses. In this case, we see that Microsoft's SQL server is processing user data.

Consider the web page in Figure 2 that demonstrates a typical username and password field. Entering the string "'and" into the username field results in the error message in Figure 1. There are several ways attackers could proceed. By default, SQL Server 2000 allows the execution of many stored procedures; therefore, one option is to execute one of these by appending an additional command to a SQL statement. By entering the following string in the Account field, users could potentially do a directory listing of the C: drive and save it to a text file in a typically accessible location.

'; EXEC master.dbo.xp_cmdshell 'cmd.exe /c dir c:\ >
c:\inetpub\wwwroot\dir_c.txt'--

All that would be required to view this output is to browse to the file http://www.si-hackedbank.com/dir_c.txt. Obviously, much more destructive commands could be executed here as well, such as the deletion of files or the launching of a remote shell.

If stored procedures are not enabled or if the target were a different database server, then the next step would be to do some reconnaissance on the tables/fields of interest. If a legitimate username/pin is entered, then all of the records are displayed from that user (see Figure 3).

Now, instead, we want this page to return some data from the sysobjects table that, in SQL Server, contains information about other tables in the database (note that other database servers have similar tables). The names of the tables are in the name column of the sysobjects table.

Attackers can reason that a SELECT statement is being used to retrieve the account data from the database. Listing Two presents the actual query. To execute our new query on sysobjects, you must use the UNION command to join two SELECT commands—the one present on the page originally and our new query on sysobjects. To use the UNION operator, you have to select the same number and type of elements in each query. For example, if you enter the string:

' union all select name,0,0,0,0 from sysobjects--

into the Account field, the error message in Figure 4 is returned, indicating that our new SELECT statement does not have the same number of parameters as the original SELECT statement. Through trial-and-error, you can determine both the number and type of parameters in the original query. There are many things attackers may be interested in. One of the most likely is a list of the tables and whether they were created by the user. The name column of sysobjects contains table names, and the xtype column reveals who created the tables. In the running example, we can list the tables using the following input in the Account field:

' union all select 0,name,xtype,0,0,0 from sysobjects--

The result (Figure 5) is a list of all tables and their corresponding types. Of particular interest is the Records table because it is the only one created by a user (indicated by an xtype of U).

By applying techniques similar to this, attackers could view all data contained in the database. Using the INSERT and UPDATE commands, attackers can also make alterations to the data.

String attacks in general can be very formidable. The key is to always validate data that is coming from an external source before using it. This includes information directly entered by the user and data from other external entities like the filesystem or API parameters.

The Answers to April's Questions

Now that we've talked about strings, here are the answers to the questions we posed in April's installment:

Q: Can you name at least three web sites where security bulletins are published?
A: To be a conscientious developer or an effective security tester, you need to understand how other software has failed. One of the best ways to do this is to read about vulnerabilities in other released applications. Here are some good sites that list software vulnerabilities: http://www.securityfocus.com/, an excellent site now owned by the folks at Symantec; http://www.cert.org/, the Computer Emergency Response Team (CERT) is a federally funded center at Carnegie Mellon University; and http://cve.mitre.org/, the Common Vulnerabilities and Exposures (CVE) site is hosted by the MITRE Corporation and funded by the U.S. Department of Homeland Security.

Q: Can you name three security tools for network penetration?
A: There are a lot of commercial network penetration tools that are now on the market to help systems administrators and software testers keep their networks and applications secure. There is also a wealth of free tools available on the Internet. Some of the more well known free tools are: SATAN, one of the oldest and most famous network scanning tools is SATAN, the Security Administrator Tool for Analyzing Networks (http://www.fish.com/satan/); Nessus, one of the more powerful and modern free network vulnerability scanners (http://www.nessus.org/); Microsoft's Baseline Security Analyzer, MBSA runs on Windows Server 2003, Windows 2000, and Windows XP systems and scans for common security misconfigurations in many Microsoft platforms and applications (http://www.microsoft.com/). Nmap is less of a vulnerability scanner and more of a network mapper that is very good at—among other things—fingerprinting an operating system. It is one of the best tools in its class and it supports multiple platforms (http://www.insecure.org/nmap/).

Q: Can you name three tools that help you secure your application?
A: There are tools available to developers to help in fortifying an application. Instead of naming particular tools, we'll list some major classes: Source scanning tools can help you weed out common coding errors that lead to security vulnerabilities. The majority of these tools are for the C and C++ programming languages. Compile-time protection tools exist to help instrument your code to fortify the resulting application. The purposes of these tools range from buffer overflow protection to antidebugging and antireverse engineering. Add-on protection software can be used to protect an already compiled binary. The typical tools in this category guard against runtime application inspection and many leave the binary encrypted until runtime (attempting to foil static-analysis reverse engineering attacks).

Q: Name three places in memory that an exploitable buffer overrun can occur.
A: The most well-known and the most exploited buffer overflows are in the stack. Many buffer overruns in the heap, however, can be exploited using slightly different techniques. The third is the code segment that is the address space for storing instructions from the binary that are awaiting their turn to be executed

Q: Name three functions prone to the buffer overflow condition.
A: Arguably, the three most common C and C++ functions that can result in a buffer overflow vulnerability are strcopy, strcat, and sprintf. These functions should be replaced with the safer strncopy, strncat, and snprintf, respectively.

Q: List three remote entry points to an application to test for the presence of a buffer overflow.
A: Actually, there are several depending on your definition of remote. Certainly, the first are ports that your application has opened and receives data on. These are the most direct routes to your application by a remote attacker. Next are other software components that your application interacts with that may be network enabled. Here, the entry points are through API calls made to your application and the return values and parameters from API calls made by your application. Another interface that is often neglected during testing is the filesystem. Tests must be carried out to ensure that your application is resilient to file-based attacks.

DDJ



Listing One

<HTML>
<SCRIPT>
checkval=new RegExp("[\-\'\;]");
function validate(){
    if (checkval.test(form1.Acct.value)){ 
        alert("Account names and passwords should only
            contain numbers and letters");
        event.returnValue=false;
    }
    if (checkval.test(form1.Pin.value)){ 
        alert("Account names and passwords should only
            contain numbers and letters");
        event.returnValue=false;
    }
}
  ...
<FORM name="form1" action="process.asp" method="post" onsubmit="validate();">
 ...
<TD>Account</TD>
<TD><INPUT type="text" name="Acct" size="20"></TD>
</TR><TR>
<TD>Pin #</TD>
<TD><INPUT type="password" name="Pin" size="20"></TD>
 ...
</FORM>
</HTML>
Back to article


Listing Two
<% LANGUAGE = VBScript %>
<% Option Explicit  %>
<%
 ...
    QueryName = "SELECT * FROM Records WHERE Username = '" 
    QueryName = QueryName & Request.Form("Acct") & "' and   Pin = '" & 
    Request.Form("Pin") & "'"       
    Set oRs = oConn.Execute(QueryName)
%> Your Records</p>
<TABLE border = 1>
<%  
    Do while (Not oRs.eof) %>
        <tr>
        <% For Index=0 to (oRs.fields.count-1) %>
        <TD VAlign=top><% = oRs(Index)%> </TD>
        <% Next %>
        </tr>
    <% oRs.MoveNext 
Loop 
%>
 ...
Back to article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.