Channels ▼
RSS

JVM Languages

Tail Call Optimization and Java


I was recently asked whether Java SE 8 contained an optimization for tail calls, which are a special kind of function call. This optimization is most relevant for recursive calls, which are particularly important for functional languages, as they often rely on recursive design and coding patterns. In this article, I examine exactly what a tail call is, how it can be optimized effectively, and where Java 8 stands in terms of implementing this optimization.

Before we dive in, let's define a tail call.

What is A Tail Call?

A tail call is a fancy term that refers to a situation in which a method or function call is the last instruction inside of another method or function (for simplicity, I'll refer to all calls as function calls from now on). Andrew Koenig touched on the topic in his blog series on optimizations. As an example, take the function foo() as defined here:

int foo(int a) {
  a = a + 1;
  return func(a);
}

The call to function func() is the last statement in function foo(), hence it's a tail call. If foo() executed any instructions other than return after the call to func(), then func() it would no longer be a tail call.

You might be wondering why all of these semantics are important. It's really not very important or interesting until you consider recursive programming, where the tail call is a recursive function. For example:

int fact(int i, int acc) {
  if ( i == 0 ) 
    return acc;
  else
    return fact( i – 1, acc * i);
}

int factorial(int n) {
  if ( n == 0 )
    return 1;
  else
    return fact( n – 1, 1 );
}

The assembly language for fact() is shown in Figure 1, to the right of the C source. The red arrow points to the last two instructions for the function fact(), which are the call to fact() itself and a MOV for the return statement. This fits the definition of a tail call.

JavaTailCall
Figure 1: Function fact is a tail call, as seen in the assembly code.

Although I chose to illustrate this in C (because it was easier to show the associated assembly code), the same holds true for Java. In fact, you can copy the function's source code into a Java class verbatim and it will compile and work the same. So, what exactly can be optimized?

Tail Call Optimization Described

With any tail call, not just a recursive one, the function call itself can be optimized away and turned into what is effectively a goto. This means that the work to setup the stack before the function call and restore it afterwards (the prolog and epilog, respectively) can all be removed. For example, take the code below:

int func_a(int data) {
  data = do_this(data);
  return do_that(data);
}

The function do_that() is a tail call. The unoptimized assembly code might look something like this:

...		! executing inside func_a()
push EIP	! push current instruction pointer on stack
push data	! push variable 'data' on the stack
jmp do_this	! call do_this() by jumping to its address
...		! executing inside do_this()
push EIP	! push current instruction pointer on stack
push data	! push variable 'data' on the stack
jmp do_that	! call do_that() by jumping to its address
...		! executing inside do_that()
pop data	! prepare to return value of 'data'
pop EIP	! return to do_this()
pop data	! prepare to return value of 'data'
pop EIP	! return to func_a()
pop data	! prepare to return value of 'data'
pop EIP	! return to func_a() caller
...

Notice that multiple POP instructions for both data and the EIP register (to return the value of data and restore the instruction pointer) occur in succession? One set of these epilogs and the associated prolog (where data and the EIP register are saved on the stack) can be eliminated and replaced by a simple JMP assembly instruction. This can be done because function do_that() will execute the epilog code for the calling function, func_a().

This optimization is is safe because func_a() doesn't perform any meaningful instructions after do_that() returns. (Note: If something were done with the return value after the call to do_that() returns, then this optimization cannot be performed because do_that() would no longer be a tail call.)

Here's the result of tail call optimization:

...         ! executing inside func_a()
push EIP    ! push current instruction pointer on stack
push data   ! push variable ‘data’ on the stack
jmp do_this ! call do_this() by jumping to its address
...         ! executing inside do_this()
push EIP    ! push current instruction pointer on stack
push data   ! push variable ‘data’ on the stack

jmp do_that ! call do_that() by jumping to its address
...         ! executing inside do_that()
pop data    ! prepare to return value of ‘data’
pop EIP     ! return to do_this()
pop data    ! prepare to return value of ‘data’
pop EIP     ! return to func_a() caller
pop data    ! prepare to return value of ‘data’
pop EIP     ! return to func_a() caller...

Shown slightly differently, this is all of the code before the optimization:

int func_a(int data) {

data = do_this(data);

push EIP ! prolog
push data ! prolog
jmp do_this ! call
...
pop data ! epilog
pop EIP ! epilog

return do_that(data);

push EIP ! prolog
push data ! prolog
jmp do_that ! call
...
pop data ! epilog
pop EIP ! epilog

}

pop data ! epilog
pop EIP ! epilog

Here it is again after tail call optimization:

int func_a(int data) {

data = do_this(data);

push EIP ! prolog
push data ! prolog
jmp do_this ! call
...
pop data ! epilog
pop EIP ! epilog

return do_that(data);

jmp do_that ! goto
...

}

pop data ! epilog
pop EIP ! epilog

Comparing the shaded row in both, you can see how much less machine code there is to run in the optimized version. However, this alone does not represent the true value of the change. This optimization really shines when you combine it with recursion.

If you review the recursive factorial example shown earlier, you can see that when combined with this optimization, all the prolog and epilog assembly code that would be needed repeatedly can be removed. This effectively turns this recursive pseudocode here (taken from a Wikipedia example):

call factorial(3)
  call fact(3,1)
    call fact(2,3)
      call fact(1 6)
        call fact(0,6)
        return 6
      return 6
    return 6
  return 6
return 6

into this iterative pseudocode:

call factorial(3)
  call fact(3,1)
  update variables with (2,3)
  update variables with (1,6)
  update variables with (0,6)
  return 6
return 6

In other words, it replaces the recursive code with a loop, which is what most C, C++, or Java programmers would use here. Not only does this optimization reduce the function call prolog and epilog assembly for numerous recursive calls, it greatly reduces pressure on the stack. Without this optimization, calculating the factorial of very large numbers (or doing other large recursive operations) would likely blow the stack — that is, use up all the RAM allocated to the local stack. Reducing the code to a loop removes that risk. Because functional language programmers use recursion often, most functional language interpreters perform tail call optimization under the covers.

How Is Java Involved?

As I stated above, Java programmers tend to avoid recursion through the use of loops. For instance, here's an iterative implementation of factorial in Java:

    int factorial(int n) {
       int result = 1;
       for (int t=n; t > 1; t--)
           result *= t;
       return result;
     }

Therefore, it isn't a big deal to most Java developers that Java doesn't perform tail call optimization. In fact, my guess is that many Java developers have never even heard of this optimization. But it does become an issue when you run any of the related functional languages on top of the JVM. All of sudden, recursive code that ran fine with that language's interpreter may begin blowing the stack when executed on the JVM.

It's important to note that this isn't a bug in the JVM. It's an optimization that can be implemented to help functional programmers who use recursion, which is much more common and normal in those languages. I recently spoke to Brian Goetz at Oracle about this optimization, and he said that it's on a list of things to be added to the JVM, but it's just not a high-priority item. For now, it's best to make this optimization yourself, if you can, by avoid deeply recursive functions when coding a functional language on the JVM.

Further Reading

For more information on the assembly language details behind function calls, see this computer-science lecture. [PDF]

Wikipedia has additional information on tail calls and related optimization.


Eric Bruno is the Java blogger for Dr. Dobb's.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video