Practical View
gcc's Stack Smashing Protector (SSP) was developed by IBM and originally called ProPolice. It is available as a patch for gcc 3.x and has been included by default in gcc since version 4.1. The command line flag -fstack-protector switches SSP on. -fno-stack-protector switches it off. If nothing is specified at the invocation of gcc, whether SSP is switched on or off by default depends on the system and the configuration of gcc. Some systems (like OpenBSD) switch SSP on. Others (like Debian GNU/Linux) switch it off.
For a practical view, I will use Debian GNU/Linux 6.0.1 with gcc 4.4.5 on an x86_64 system. The test program is vuln.c (Listing One). It suffers from a vulnerability in function vuln(), which calls strcpy() without any prior boundary check. Because strcpy() also copies the terminating NUL byte, the buffer buf overflows as soon as the string passed via parameter str is 20 or more characters long.
#include <string.h>
void vuln(const char *str)
{
char buf[20];
strcpy(buf, str);
}
int main(int argc, char *argv[])
{
vuln(argv[1]);
return 0;
}
Listing One
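For contrast, the missing boundary check could look like the following sketch. The function name checked_copy() and its return convention are assumptions made for illustration. They are not part of vuln.c.

```c
#include <string.h>

/* Hypothetical bounds-checked counterpart to vuln().
   Returns 0 on success, -1 if str does not fit into buf. */
int checked_copy(const char *str)
{
    char buf[20];

    /* strcpy() copies the terminating NUL too, so the string
       itself may be at most sizeof buf - 1 characters long. */
    if (strlen(str) >= sizeof buf)
        return -1;

    strcpy(buf, str);
    return 0;
}
```

With this check, a string of 19 characters is still accepted, while a string of 20 characters is rejected before strcpy() is ever called.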
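When the program is built with SSP and run with an overlong argument, the outcome depends on the extent of the damage: the program may print a backtrace and perhaps a memory dump. But in all cases where the canary has been damaged, the program detects the stack smashing and terminates itself.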
Compiler-Added Code
Let's look at the generated assembly code without and with SSP enabled. The following commands generate the assembly language code:
gcc -S -o vuln-without-ssp.s -fno-stack-protector vuln.c
gcc -S -o vuln-with-ssp.s -fstack-protector vuln.c
Listings Two and Three show commented versions of the generated assembly code for function vuln().
vuln:
.LFB0:
.cfi_startproc
pushq %rbp ; current base pointer onto stack
.cfi_def_cfa_offset 16
movq %rsp, %rbp ; stack pointer becomes new base pointer
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $48, %rsp ; reserve space for
; local variables on stack
; bring arguments from registers onto stack
movq %rdi, -40(%rbp) ; 1st argument from rdi to stack
; prepare parameters for strcpy()
movq -40(%rbp), %rdx ; str (vuln()'s argument) to rdx
leaq -32(%rbp), %rax ; address of buf to rax
; call strcpy()
movq %rdx, %rsi ; source address from rdx to rsi
movq %rax, %rdi ; destination address from rax to rdi
call strcpy ; call strcpy()
leave ; clean-up stack
ret ; return
.cfi_endproc
Listing Two
Listing Two shows no surprises. After the mandatory preparations, such as saving the current content of the base pointer register, the program reserves stack space for local variables. On x86_64 Linux, arguments arrive in registers, and the unoptimized code first copies them onto the stack, which makes the code look a bit more complex. Before calling strcpy(), the source and destination addresses go into rsi and rdi, where the calling convention expects strcpy()'s arguments. When strcpy() returns, the program cleans up the stack and jumps back to the return address. Listing Two is a typical assembly listing for code generated by a C compiler on Linux.
vuln:
.LFB0:
.cfi_startproc
pushq %rbp ; current base pointer onto stack
.cfi_def_cfa_offset 16
movq %rsp, %rbp ; stack pointer becomes new base pointer
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $48, %rsp ; reserve space for
; local variables on stack
; bring arguments from registers onto stack
movq %rdi, -40(%rbp) ; 1st argument from rdi to stack
; SSP's prolog: put canary onto stack
movq %fs:40, %rax ; canary from %fs:40 to rax
movq %rax, -8(%rbp) ; canary from rax onto stack
xorl %eax, %eax ; set rax to zero
; prepare parameters for strcpy()
movq -40(%rbp), %rdx ; str (vuln()'s argument) to rdx
leaq -32(%rbp), %rax ; address of buf to rax
; call strcpy()
movq %rdx, %rsi ; source address from rdx to rsi
movq %rax, %rdi ; destination address from rax to rdi
call strcpy ; call strcpy()
; SSP's epilog: check canary
movq -8(%rbp), %rax ; canary from stack to rax
xorq %fs:40, %rax ; original canary XOR rax
je .L3 ; if no overflow -> XOR results in zero
; => jump to label .L3
; if overflow -> XOR results in non-zero
call __stack_chk_fail ; => call __stack_chk_fail()
.L3:
leave ; clean-up stack
ret ; return
.cfi_endproc
Listing Three
Listing Three is similar to Listing Two. The function implemented is the same, but Listing Three adds SSP to the code. SSP's prolog starts with the movq statement that reads from fs:40. It loads a canary prototype from address fs:40 into register rax and copies it from there onto the stack at -8(%rbp). After these steps, register rax is set to zero (by XORing eax with itself), so that no copy of the canary remains in the register.
After the subroutine's body comes SSP's epilog. It reads the canary from the stack and stores it in register rax. Then it calculates fs:40 XOR rax. In other words: It XORs the canary prototype with the canary from the stack. If this operation results in zero, everything is fine. fs:40 and the value in rax are then known to be identical. No buffer overflow has occurred that destroyed the canary or the return address. If the XOR operation returns a non-zero result, the canary on the stack has been changed and a buffer overflow has indeed occurred.
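The comparison itself can be written down in a few lines of C. This is only a sketch of the arithmetic: the function name canary_intact() is hypothetical, and the real check is the xorq/je pair emitted by the compiler, not a function call.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the epilog's test:
   XORing two equal values yields zero, any difference
   leaves at least one bit set. */
int canary_intact(uint64_t stored, uint64_t reference)
{
    return (stored ^ reference) == 0;
}
```

A single flipped bit in the stored canary is enough to make the XOR result non-zero, so canary_intact() returns 0 for any modification of the stored value.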
The XOR operation sets the zero flag in the CPU's status register if its result is zero and clears the flag otherwise. From there, it is simple to use a conditional jump. In the case of gcc's SSP, it's a je (jump if equal/zero), which does the trick. If the result is zero, the program jumps to label .L3 and subroutine __stack_chk_fail won't be called. The program performs the stack clean-up and jumps back to the return address.
On a non-zero result of the XOR operation, no jump is executed. __stack_chk_fail is called. This subroutine handles the detected overflow and terminates the process. This scheme of prolog/epilog and canary is very simple but effective.
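The whole scheme can be modelled in portable C. The sketch below is an assumption-laden model, not what gcc generates: struct frame_model and overflow_detected() are hypothetical names, the reference canary is passed in as a parameter instead of being read from fs:40, and the frame is a plain struct whose layout follows the offsets in Listing Three (buf at -32(%rbp), canary at -8(%rbp)).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vuln()'s frame in Listing Three. */
struct frame_model {
    char     buf[20];   /* starts at -32(%rbp)                    */
    char     pad[4];    /* gap up to the canary: 32 - 8 - 20 = 4  */
    uint64_t canary;    /* stored at -8(%rbp)                     */
};

/* Returns 1 if copying str into the modelled frame changes the
   canary, 0 otherwise. */
int overflow_detected(const char *str, uint64_t reference)
{
    struct frame_model frame;
    size_t n = strlen(str) + 1;     /* strcpy() copies the NUL too */

    memset(&frame, 0, sizeof frame);
    frame.canary = reference;       /* prolog: place the canary    */

    if (n > sizeof frame)           /* keep the model itself safe  */
        n = sizeof frame;
    memcpy(&frame, str, n);         /* body: the unchecked copy    */

    return frame.canary != reference;   /* epilog: compare canaries */
}
```

The model also shows a detail of the layout: a string that overruns buf by no more than the four padding bytes never reaches the canary, so such a small overflow goes unnoticed, while a longer one changes the canary and is detected.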


