Channels ▼
RSS

JVM Languages

Tail Call Optimization and Java


I was recently asked whether Java SE 8 contained an optimization for tail calls, which are a special kind of function call. This optimization is most relevant for recursive calls, which are particularly important for functional languages, as they often rely on recursive design and coding patterns. In this article, I examine exactly what a tail call is, how it can be optimized effectively, and where Java 8 stands in terms of implementing this optimization.

Before we dive in, let's define a tail call.

What is A Tail Call?

A tail call is a fancy term that refers to a situation in which a method or function call is the last instruction inside of another method or function (for simplicity, I'll refer to all calls as function calls from now on). Andrew Koenig touched on the topic in his blog series on optimizations. As an example, take the function foo() as defined here:

int foo(int a) {
  a = a + 1;
  return func(a);
}

The call to function func() is the last statement in function foo(), hence it's a tail call. If foo() executed any instructions other than return after the call to func(), then func() it would no longer be a tail call.

You might be wondering why all of these semantics are important. It's really not very important or interesting until you consider recursive programming, where the tail call is a recursive function. For example:

int fact(int i, int acc) {
  if ( i == 0 ) 
    return acc;
  else
    return fact( i – 1, acc * i);
}

int factorial(int n) {
  if ( n == 0 )
    return 1;
  else
    return fact( n – 1, 1 );
}

The assembly language for fact() is shown in Figure 1, to the right of the C source. The red arrow points to the last two instructions for the function fact(), which are the call to fact() itself and a MOV for the return statement. This fits the definition of a tail call.

JavaTailCall
Figure 1: Function fact is a tail call, as seen in the assembly code.

Although I chose to illustrate this in C (because it was easier to show the associated assembly code), the same holds true for Java. In fact, you can copy the function's source code into a Java class verbatim and it will compile and work the same. So, what exactly can be optimized?

Tail Call Optimization Described

With any tail call, not just a recursive one, the function call itself can be optimized away and turned into what is effectively a goto. This means that the work to setup the stack before the function call and restore it afterwards (the prolog and epilog, respectively) can all be removed. For example, take the code below:

int func_a(int data) {
  data = do_this(data);
  return do_that(data);
}

The function do_that() is a tail call. The unoptimized assembly code might look something like this:

...		! executing inside func_a()
push EIP	! push current instruction pointer on stack
push data	! push variable 'data' on the stack
jmp do_this	! call do_this() by jumping to its address
...		! executing inside do_this()
push EIP	! push current instruction pointer on stack
push data	! push variable 'data' on the stack
jmp do_that	! call do_that() by jumping to its address
...		! executing inside do_that()
pop data	! prepare to return value of 'data'
pop EIP	! return to do_this()
pop data	! prepare to return value of 'data'
pop EIP	! return to func_a()
pop data	! prepare to return value of 'data'
pop EIP	! return to func_a() caller
...

Notice that multiple POP instructions for both data and the EIP register (to return the value of data and restore the instruction pointer) occur in succession? One set of these epilogs and the associated prolog (where data and the EIP register are saved on the stack) can be eliminated and replaced by a simple JMP assembly instruction. This can be done because function do_that() will execute the epilog code for the calling function, func_a().

This optimization is is safe because func_a() doesn't perform any meaningful instructions after do_that() returns. (Note: If something were done with the return value after the call to do_that() returns, then this optimization cannot be performed because do_that() would no longer be a tail call.)

Here's the result of tail call optimization:

...         ! executing inside func_a()
push EIP    ! push current instruction pointer on stack
push data   ! push variable ‘data’ on the stack
jmp do_this ! call do_this() by jumping to its address
...         ! executing inside do_this()
push EIP    ! push current instruction pointer on stack
push data   ! push variable ‘data’ on the stack

jmp do_that ! call do_that() by jumping to its address
...         ! executing inside do_that()
pop data    ! prepare to return value of ‘data’
pop EIP     ! return to do_this()
pop data    ! prepare to return value of ‘data’
pop EIP     ! return to func_a() caller
pop data    ! prepare to return value of ‘data’
pop EIP     ! return to func_a() caller...

Shown slightly differently, this is all of the code before the optimization:

int func_a(int data) {

data = do_this(data);

push EIP ! prolog
push data ! prolog
jmp do_this ! call
...
pop data ! epilog
pop EIP ! epilog

return do_that(data);

push EIP ! prolog
push data ! prolog
jmp do_that ! call
...
pop data ! epilog
pop EIP ! epilog

}

pop data ! epilog
pop EIP ! epilog

Here it is again after tail call optimization:

int func_a(int data) {

data = do_this(data);

push EIP ! prolog
push data ! prolog
jmp do_this ! call
...
pop data ! epilog
pop EIP ! epilog

return do_that(data);

jmp do_that ! goto
...

}

pop data ! epilog
pop EIP ! epilog

Comparing the shaded row in both, you can see how much less machine code there is to run in the optimized version. However, this alone does not represent the true value of the change. This optimization really shines when you combine it with recursion.

If you review the recursive factorial example shown earlier, you can see that when combined with this optimization, all the prolog and epilog assembly code that would be needed repeatedly can be removed. This effectively turns this recursive pseudocode here (taken from a Wikipedia example):

call factorial(3)
  call fact(3,1)
    call fact(2,3)
      call fact(1 6)
        call fact(0,6)
        return 6
      return 6
    return 6
  return 6
return 6

into this iterative pseudocode:

call factorial(3)
  call fact(3,1)
  update variables with (2,3)
  update variables with (1,6)
  update variables with (0,6)
  return 6
return 6

In other words, it replaces the recursive code with a loop, which is what most C, C++, or Java programmers would use here. Not only does this optimization reduce the function call prolog and epilog assembly for numerous recursive calls, it greatly reduces pressure on the stack. Without this optimization, calculating the factorial of very large numbers (or doing other large recursive operations) would likely blow the stack — that is, use up all the RAM allocated to the local stack. Reducing the code to a loop removes that risk. Because functional language programmers use recursion often, most functional language interpreters perform tail call optimization under the covers.

How Is Java Involved?

As I stated above, Java programmers tend to avoid recursion through the use of loops. For instance, here's an iterative implementation of factorial in Java:

    int factorial(int n) {
       int result = 1;
       for (int t=n; t > 1; t--)
           result *= t;
       return result;
     }

Therefore, it isn't a big deal to most Java developers that Java doesn't perform tail call optimization. In fact, my guess is that many Java developers have never even heard of this optimization. But it does become an issue when you run any of the related functional languages on top of the JVM. All of sudden, recursive code that ran fine with that language's interpreter may begin blowing the stack when executed on the JVM.

It's important to note that this isn't a bug in the JVM. It's an optimization that can be implemented to help functional programmers who use recursion, which is much more common and normal in those languages. I recently spoke to Brian Goetz at Oracle about this optimization, and he said that it's on a list of things to be added to the JVM, but it's just not a high-priority item. For now, it's best to make this optimization yourself, if you can, by avoid deeply recursive functions when coding a functional language on the JVM.

Further Reading

For more information on the assembly language details behind function calls, see this computer-science lecture. [PDF]

Wikipedia has additional information on tail calls and related optimization.


Eric Bruno is the Java blogger for Dr. Dobb's.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Comments:

ubm_techweb_disqus_sso_-98660dccf733221fd5cb507420d627ea
2014-05-06T15:12:08

This all looks fine now. Feel free to delete this thread.


Permalink
ubm_techweb_disqus_sso_-b7d7fcc88ce4bed88ba02c94095f21ef
2014-04-22T03:46:32

I'm surprised Brian didn't let you know that OO programming languages need tail call optimisation too. http://www.eighty-twenty.org/i...


Permalink
ubm_techweb_disqus_sso_-06a6faddc430162ab6c827d900667643
2014-04-18T18:52:31

I've been asked which tool I used for the side-by-side C / Assembly code pictured in the article. It's actually NetBeans 8.0 with the C++ plugin installed, where I chose the "Disassembly" window. I did this on a Mac running Mavericks, however, which required I build and install gdb (the GNU debugger), for which I used Homebrew.


Permalink
AndrewBinstock
2014-04-17T19:02:22

Note that Groovy, while not a functional language, does do tail-call optimization in its most recent release.


Permalink
ubm_techweb_disqus_sso_-06a6faddc430162ab6c827d900667643
2014-04-17T19:00:27

I haven't worked with it myself, but a little research into the issue reveals that Scala has the Trampolines library specifically to help in this area. Scala 2.8.0 introduced the object scala.util.control.TailCalls specifically to help with the lack of JVM tail call optimization.


Permalink
ubm_techweb_disqus_sso_-06a6faddc430162ab6c827d900667643
2014-04-17T18:46:58

I can understand the disappointment about the lack of this optimization, but saying that the JVM isn't good for functional languages is an overstatement in my opinion. Scala itself is built in Java, relies on the JVM, and from what I hear, doesn't suffer at all from this decision. There are many benefits from running on the JVM, overall performance is one of the largest. Portability is another. I'd love to hear how other programmers have worked around the lack of tail call optimization in the JVM.


Permalink
ubm_techweb_disqus_sso_-214d2be5cd7efd5605d4e24cc2508747
2014-04-17T16:46:48

JVM is not good for functional languages; languages like scala tend to suffer because of this. Some functional idioms such as mutually recursive functions cannot be easily optimized in this way


Permalink
dblake950
2014-04-16T14:59:20

The code presentation for the optimization code has been fixed (so you should now be able to see what was removed as strikethrough text). And the typo in the tables has also been addressed. Sorry for the snafus, folks!


Permalink
ubm_techweb_disqus_sso_-06a6faddc430162ab6c827d900667643
2014-04-16T14:41:31

Yes, that's truly a typo on my part. Thanks again.


Permalink
ubm_techweb_disqus_sso_-06a6faddc430162ab6c827d900667643
2014-04-16T14:40:45

Thanks Walter. I originally used strike throughs to demonstrate what was removed as part of the optimization, but this didn't translate to the online source display, so they look the same. We'll fix this. Thanks for brining this to my attention.


Permalink
waltersouffle
2014-04-16T11:13:56

In the "shown slightly differently section" should the "return do_that()" disassembly contain "jump do_that" rather than "jump do_this" ?


Permalink
waltersouffle
2014-04-16T11:10:28

I think I'm missing something but the disassembly after "Here's the result of tail call optimization:" looks identical to the unoptimized version.


Permalink

Video