# Tail Call Optimization and Java

I was recently asked whether Java SE 8 contained an optimization for tail calls, which are a special kind of function call. This optimization is most relevant for recursive calls, which are particularly important for functional languages, as they often rely on recursive design and coding patterns. In this article, I examine exactly what a tail call is, how it can be optimized effectively, and where Java 8 stands in terms of implementing this optimization.

Before we dive in, let's define a tail call.

### What is A Tail Call?

A tail call is a fancy term for a situation in which a method or function call is the last instruction inside another method or function (for simplicity, I'll refer to all calls as function calls from now on). Andrew Koenig touched on the topic in his blog series on optimizations. As an example, take the function `foo()` as defined here:

```
int foo(int a) {
    a = a + 1;
    return func(a);
}
```

The call to function `func()` is the last statement in function `foo()`, hence it's a tail call. If `foo()` executed any further instructions after the call to `func()` (other than returning its result), then the call to `func()` would no longer be a tail call.
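To make the distinction concrete in Java terms, here is a minimal sketch (the class and method names are my own, chosen for illustration):

```java
class TailCallExamples {
    static int helper(int a) {
        return a * 2;
    }

    // A tail call: the call to helper() is the last thing tail() does,
    // and its result is returned unchanged.
    static int tail(int a) {
        a = a + 1;
        return helper(a);
    }

    // Not a tail call: notTail() still has work to do (the + 1)
    // after helper() returns, so its stack frame must be kept alive.
    static int notTail(int a) {
        return helper(a) + 1;
    }
}
```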

You might be wondering why these semantics matter. They're really not very interesting until you consider recursive programming, where the tail call is a call to the function itself. For example:

```
int fact(int i, int acc) {
    if (i == 0)
        return acc;
    else
        return fact(i - 1, acc * i);
}

int factorial(int n) {
    if (n == 0)
        return 1;
    else
        return fact(n, 1);
}
```

The assembly language for `fact()` is shown in Figure 1, to the right of the C source. The red arrow points to the last two instructions for the function `fact()`, which are the call to `fact()` itself and a `MOV` for the `return` statement. This fits the definition of a tail call.

Figure 1: Function `fact` is a tail call, as seen in the assembly code.

Although I chose to illustrate this in C (because it was easier to show the associated assembly code), the same holds true for Java. In fact, you can copy the function's source code into a Java class verbatim and it will compile and work the same. So, what exactly can be optimized?
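For instance, here are the same two functions as Java methods (wrapped in a class of my own naming, with the initial call written as `fact(n, 1)` so that `factorial(3)` yields 6):

```java
class Factorial {
    // Tail-recursive helper: 'acc' accumulates the product so that the
    // recursive call is the last action the method performs.
    static int fact(int i, int acc) {
        if (i == 0)
            return acc;
        else
            return fact(i - 1, acc * i);
    }

    static int factorial(int n) {
        if (n == 0)
            return 1;
        else
            return fact(n, 1);
    }
}
```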

### Tail Call Optimization Described

With any tail call, not just a recursive one, the function call itself can be optimized away and turned into what is effectively a `goto`. This means that the work to set up the stack before the function call and restore it afterward (the prolog and epilog, respectively) can all be removed. For example, take the code below:

```
int func_a(int data) {
    data = do_this(data);
    return do_that(data);
}
```

The function `do_that()` is a tail call. The unoptimized assembly code might look something like this:

```
...          ! executing inside func_a()
push EIP     ! push current instruction pointer on stack
push data    ! push variable 'data' on the stack
jmp do_this  ! call do_this() by jumping to its address
...          ! executing inside do_this()
pop data     ! prepare to return value of 'data'
pop EIP      ! return to func_a()
push EIP     ! push current instruction pointer on stack
push data    ! push variable 'data' on the stack
jmp do_that  ! call do_that() by jumping to its address
...          ! executing inside do_that()
pop data     ! prepare to return value of 'data'
pop EIP      ! return to func_a()
pop data     ! func_a()'s own epilog
pop EIP      ! return to func_a()'s caller
...
```

Notice how multiple `POP` instructions for both data and the EIP register (to return the value of data and restore the instruction pointer) occur in succession. One set of these epilogs and the associated prolog (where data and the EIP register are saved on the stack) can be eliminated and replaced by a simple `JMP` assembly instruction. This can be done because function `do_that()` will execute the epilog code for the calling function, `func_a()`.

This optimization is safe because `func_a()` doesn't perform any meaningful instructions after `do_that()` returns. (Note: If something were done with the return value after the call to `do_that()` returns, then this optimization could not be performed, because `do_that()` would no longer be a tail call.)

Here's the result of tail call optimization:

```
...          ! executing inside func_a()
push EIP     ! push current instruction pointer on stack
push data    ! push variable 'data' on the stack
jmp do_this  ! call do_this() by jumping to its address
...          ! executing inside do_this()
pop data     ! prepare to return value of 'data'
pop EIP      ! return to func_a()
jmp do_that  ! the tail call becomes a simple goto
...          ! executing inside do_that()
pop data     ! prepare to return value of 'data'
pop EIP      ! return directly to func_a()'s caller
...
```

Shown slightly differently, with each C statement alongside the assembly it generates, this is all of the code before the optimization:

```
int func_a(int data) {
    data = do_this(data);    ! push EIP    ! prolog
                             ! push data   ! prolog
                             ! jmp do_this ! call
                             ! ...
                             ! pop data    ! epilog
                             ! pop EIP     ! epilog
    return do_that(data);    ! push EIP    ! prolog
                             ! push data   ! prolog
                             ! jmp do_that ! call
                             ! ...
                             ! pop data    ! epilog
                             ! pop EIP     ! epilog
}                            ! pop data    ! epilog
                             ! pop EIP     ! epilog
```

Here it is again after tail call optimization:

```
int func_a(int data) {
    data = do_this(data);    ! push EIP    ! prolog
                             ! push data   ! prolog
                             ! jmp do_this ! call
                             ! ...
                             ! pop data    ! epilog
                             ! pop EIP     ! epilog
    return do_that(data);    ! jmp do_that ! goto
}                            ! pop data    ! epilog
                             ! pop EIP     ! epilog
```

Comparing the `return do_that(data);` row in both listings, you can see how much less machine code there is to run in the optimized version. However, this alone does not represent the true value of the change. This optimization really shines when you combine it with recursion.

If you review the recursive factorial example shown earlier, you can see that when combined with this optimization, all the prolog and epilog assembly code that would be needed repeatedly can be removed. This effectively turns this recursive pseudocode here (taken from a Wikipedia example):

```call factorial(3)
call fact(3,1)
call fact(2,3)
call fact(1,6)
call fact(0,6)
return 6
return 6
return 6
return 6
return 6
```

into this iterative pseudocode:

```call factorial(3)
call fact(3,1)
update variables with (2,3)
update variables with (1,6)
update variables with (0,6)
return 6
return 6
```

In other words, it replaces the recursive code with a loop, which is what most C, C++, or Java programmers would use here. Not only does this optimization remove the function call prolog and epilog assembly for numerous recursive calls, it greatly reduces pressure on the stack. Without this optimization, calculating the factorial of very large numbers (or doing other deeply recursive operations) would likely blow the stack, that is, use up all the memory allocated for the call stack. Reducing the code to a loop removes that risk. Because functional language programmers use recursion often, most functional language interpreters perform tail call optimization under the covers.
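The transformation can also be written out by hand. Here is a sketch, in Java with names of my own choosing, of what the optimizer effectively turns the tail-recursive `fact()` into:

```java
class FactLoop {
    // What tail call optimization effectively makes of fact(i, acc):
    // the recursive call becomes a parameter update plus a jump back
    // to the top of the function -- in other words, a loop.
    static int fact(int i, int acc) {
        while (i != 0) {
            acc = acc * i;  // "update variables with (i - 1, acc * i)"
            i = i - 1;
        }
        return acc;
    }
}
```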

### How Is Java Involved?

As I stated above, Java programmers tend to avoid recursion through the use of loops. For instance, here's an iterative implementation of factorial in Java:

```
int factorial(int n) {
    int result = 1;
    for (int t = n; t > 1; t--)
        result *= t;
    return result;
}
```

Therefore, it isn't a big deal to most Java developers that Java doesn't perform tail call optimization. In fact, my guess is that many Java developers have never even heard of this optimization. But it does become an issue when you run any of the related functional languages on top of the JVM. All of a sudden, recursive code that ran fine with that language's interpreter may begin blowing the stack when executed on the JVM.
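A small (and admittedly artificial) Java sketch illustrates the problem; the class and method names here are my own, and the exact depth at which the stack overflows depends on the JVM's thread stack size:

```java
class DeepRecursion {
    // A tail-recursive countdown. The JVM does not optimize tail calls,
    // so every call consumes a new stack frame.
    static long countDown(long n) {
        if (n == 0)
            return 0;
        return countDown(n - 1);  // a tail call, but still a real call on the JVM
    }

    // Returns true if recursing to the given depth exhausts the stack.
    static boolean blowsStack(long depth) {
        try {
            countDown(depth);
            return false;
        } catch (StackOverflowError e) {
            return true;  // ran out of stack frames
        }
    }
}
```

With a default thread stack, a depth of 100 completes easily, while a depth of 100 million overflows; the equivalent loop would handle any depth.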

It's important to note that this isn't a bug in the JVM. It's an optimization that can be implemented to help functional programmers, who use recursion far more heavily than Java programmers do. I recently spoke with Brian Goetz at Oracle about this optimization, and he said that it's on the list of things to be added to the JVM, but it's just not a high-priority item. For now, it's best to make this optimization yourself, if you can, by avoiding deeply recursive functions when coding a functional language on the JVM.

For more information on the assembly language details behind function calls, see this computer-science lecture [PDF].

Eric Bruno is the Java blogger for Dr. Dobb's.
