
Safe Coding Practices


Gwyn, Chief Technology Officer at Klocwork, has over 20 years of global technology experience. At Klocwork, Gwyn focuses on his original passion, compiler theory, to move static source code analysis to the next level.


Security is becoming more and more critical to developers in all types of environments—even those such as embedded systems that have until recently considered security a non-issue. In this article, I examine several types of coding vulnerabilities, pointing out what the vulnerability is, how you can mitigate the risk of exploit within your code, and how to best find these types of flaws in your code.

Injection Flaws

When attempting to inject information into a running process, attackers are trying to alter the running state of that process toward some end the developer never anticipated or protected against. For example, attackers could be trying to inject code into the process via stack corruption, giving them the ability to execute code of their choice. Alternatively, attackers could be trying to inject data into a database for future use, or unguarded strings into a database query to extract more information than the original developer intended. Injection for any purpose is a bad thing and needs careful consideration at all times.
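
To make the database case concrete, here is a minimal sketch, using the SQLite C API purely for illustration (the attack examined in this article does not involve a database), that contrasts splicing an untrusted string into a query with binding it as a parameter:


#include <string>
#include <sqlite3.h>

// VULNERABLE: the untrusted name becomes part of the SQL text itself,
// so input such as  x' OR '1'='1  changes the meaning of the query.
void findUserUnsafe(sqlite3* db, const std::string& name)
{
  std::string sql = "SELECT * FROM users WHERE name = '" + name + "';";
  sqlite3_exec(db, sql.c_str(), NULL, NULL, NULL);
}

// SAFER: a prepared statement treats the input strictly as data.
void findUserSafe(sqlite3* db, const std::string& name)
{
  sqlite3_stmt* stmt = NULL;
  if (sqlite3_prepare_v2(db, "SELECT * FROM users WHERE name = ?;",
                         -1, &stmt, NULL) != SQLITE_OK)
    return;
  sqlite3_bind_text(stmt, 1, name.c_str(), -1, SQLITE_TRANSIENT);
  while (sqlite3_step(stmt) == SQLITE_ROW)
    ;  // consume rows
  sqlite3_finalize(stmt);
}


Because the prepared statement never re-parses the user's input as SQL, the structure of the query cannot be altered by whatever string arrives.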

Perhaps the most malicious form of injection attack is code injection—placing new code into the memory space of the running process and then directing the running process to execute it. Successful attacks of this type can do almost anything, as the running process is totally hijacked and compromised to perform whatever the attacker desires.

One of the most famous instances of this type of attack is the Windows animated cursor attack, and it's this pattern that I examine here. Using a simple webpage, attackers can cause a malformed animated cursor file to be downloaded to the viewer's PC, cause that animated cursor to be invoked by the browser, and upon invocation cause arbitrary code injection to take place. In essence, it is a perfect attack vector, given that it requires zero physical access to the machine being attacked, zero end-user knowledge that anything untoward might be happening, and zero outward impact to end users if the payload of the attack is suitably malicious.

Consider Example 1(a), which is paraphrased, of course, from the Windows exploit that forms the basis for this type of attack vector. The developer here is making a basic assumption about the trustworthiness of the incoming stream: trust the stream and everything is fine. Call that function with a stack-based type to be deserialized and an unknown stream of data, and code injection is bound to happen at some point.


(a)
void LoadTypeFromStream(unsigned char* stream, SOMETYPE* typtr)
{
  int len;
  // Get the size of our type's serialized form
  memcpy(&len, stream, sizeof(int));
  // De-serialize the type
  memcpy(typtr, stream + sizeof(int), len);
}

(b)

void foo(unsigned char* stream)
{
  SOMETYPE ty;
  LoadTypeFromStream(stream, &ty);
}

(c)
void LoadTypeFromStream(unsigned char* stream, SOMETYPE* typtr)
{
    int len;
    // Get the size of our type's serialized form
    memcpy(&len, stream, sizeof(int));
    // GUARD
    if( len < 0 || len > sizeof(SOMETYPE) )
        throw TaintedDataException();
    // De-serialize the type
    memcpy(typtr, stream + sizeof(int), len);
}

Example 1: Injection attacks.

So how does it happen? Assume you call the function as in Example 1(b). Now we have an attack vector that is wide open to exploit. The problem is that SOMETYPE has a defined size at compile time; assume it occupies 128 bytes in memory. Now assume the incoming stream is constructed so that the lead 4 bytes (the length of what will get deserialized) read 256. Without checking the validity of what you're doing, you copy 256 bytes into a stack area that reserves only 128.
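
To see the mismatch concretely, consider a hostile caller that simply lies in the length prefix. The fragment below is purely illustrative (the real exploit arrives as a crafted animated cursor file, not as code you compile); it reuses foo() from Example 1(b) and assumes SOMETYPE occupies 128 bytes:


unsigned char evil[sizeof(int) + 256];
int claimed = 256;                       // lies: twice the size of SOMETYPE
memcpy(evil, &claimed, sizeof(int));     // forge the 4-byte length prefix
memset(evil + sizeof(int), 0x41, 256);   // attacker-controlled payload bytes
foo(evil);                               // 256 bytes land in a 128-byte local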

Given the typical layout of a release-mode stack, you're in trouble. Take a look at the stack to see why this is. Each function that is called lays out its local data in a frame on the stack, typically by subtracting the known size of that local data from the stack pointer on entry (plus any management data required to deal with the call chain itself). An idealized (pseudocode) function prolog emitted by the compiler reads something like:


 .foo
 sub sp, 128  ; sizeof SOMETYPE


The call to our exploitable function then reads something like:


push sp   ; push the SOMETYPE local variable
push ap   ; push the stream pointer (comes from 1st argument)
call LoadTypeFromStream
ret


On calling foo(), the caller pushes the stream address onto the stack, along with the return address (pushed as an implicit side effect of using the call directive, or whatever platform equivalent is made available), so that the stack contents have the 128 bytes that are reserved for our type abutted directly against the return address back to the caller of foo(); see Figure 1.


Figure 1: Calling functions.
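
Absent the original figure, the idealized frame laid out by foo() looks roughly like this (assuming a 32-bit stack with no frame pointer or padding, and placing the local at address 0x1000 for the discussion that follows):


0x1000 .. 0x107F   SOMETYPE ty      ; 128 bytes reserved by the prolog
0x1080 .. 0x1083   return address   ; back to foo()'s caller
0x1084 .. 0x1087   stream argument  ; pushed by foo()'s caller
0x1088 ...         caller's frame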

Now LoadTypeFromStream executes and writes 256 bytes into the address provided; that is, the value of the stack pointer (sp) before we called the function. This effectively overwrites the 128 bytes that are supposed to be used (at address 0x1000 in our example), plus the ensuing 128 bytes, including the incoming argument pointer, the return address, and whatever other information is stored in the next 128 bytes of the stack.

So how do attackers exploit this vulnerability? Well, it's not simple, and requires a tremendous amount of trial and error. In essence, attackers arrange the payload of the attack so that the overwritten return address transfers control to the attacker's payload rather than to the expected calling function. The attacker therefore needs to know exactly what data structure is being exploited, how big it is on whatever version of the operating system or application that is being attacked, what surrounds it (so that the bogus return address can be placed correctly), and how to meaningfully insert enough information so that the return address plus the rest of the payload can do something harmful.

Not easy things to do, but as many different attacks have shown, some people have way too much time on their hands!

How should you defend against this type of attack? Is it one attack, or several? Does the code being written really have to be as dumb as that shown here? And don't modern compilers do weird things to stack frame layout to get around this problem?

In summary, obfuscation is no defense. We all realize that the easier the programmer makes the attack, the more certain it is that the attack will come. Yet even complex code that isn't suitably defensive can (and will) be attacked sooner or later. This attack vector, which leverages both tainted data flow and a very basic buffer overflow vulnerability, has been the subject of continuous and heated research for years, but it still yields a significant number of exploits every year.

Defense against this flaw is as trivial as the attack is complex: guard your data assumptions. The addition of one simple guard to Example 1(a) makes it solid; see Example 1(c). Obviously, as the stream interaction becomes more complex, so does the guarding requirement, but at its most basic, code injection qualifies as an "unforgivable" sin in coding, as the known defenses are so prevalent and so simple.
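
If the code cannot even trust that the stream holds as many bytes as its length prefix claims (a buffer read from a file or socket, for example), one reasonable extension of Example 1(c) is a sketch along the following lines, which passes the stream's own length so that both ends of the copy are guarded. This is an illustration of the principle, not the actual Windows fix:


void LoadTypeFromStream(const unsigned char* stream, size_t streamLen, SOMETYPE* typtr)
{
  int len;
  // Refuse streams too short to hold even the length prefix
  if (streamLen < sizeof(int))
    throw TaintedDataException();
  // Get the size of our type's serialized form
  memcpy(&len, stream, sizeof(int));
  // GUARD: the claimed size must fit both the destination type
  // and the bytes actually present in the stream
  if (len < 0 || (size_t)len > sizeof(SOMETYPE) ||
      (size_t)len > streamLen - sizeof(int))
    throw TaintedDataException();
  // De-serialize the type
  memcpy(typtr, stream + sizeof(int), (size_t)len);
}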

