Channels ▼
RSS

Tackling C++ Tail Calls


February 04:

A tail call is a function call invoked from the tail position of the caller. For instance, in Examples 1(a) and 1(b), foo returns after issuing a function call to bar with at least one argument being pushed on a newly opened stack frame for bar. Here, the question arises as to whether it is sensible to reserve the new stack frame for the callee because, at this point, foo has already finished its computation (except for having to return to its own caller). The contents of foo's stack frame have, indeed, become redundant at this point if no address of a local was previously passed.

Wouldn't it be more efficient to abandon foo's stack frame totally, reuse the memory, and let bar return to foo's caller instead? The answer is yes, especially if bar creates a mutually recursive dependency with foo, possibly resulting in a massively growing runtime stack while executing that code. The information built up on the stack during that process is hardly of any use at all; it is merely necessary to trace the original entry point of the recursion.

The general idea of optimization is obvious — tail calls have to be mapped to simple jump instructions at the assembly level. More precisely, the compiler has to detect suitable tail calls and turn the built-in call command into a restore-stack-and-jump sequence. By doing so, the callee can "recycle" the current function's stack frame, instead of using a new one. This not only prevents the runtime stack from overflowing, but also saves valuable time at the end of, say, a deeply recursive function when all frames would normally be popped off the stack — one after another — by following the logical links to the initial caller via the individual return addresses inside each frame.

Applications

This kind of compiler optimization is great for a variety of applications. For instance, in his article "State Machine Design in C++" (C/C++ Users Journal, May 2000), David Lafreniere proposed the implementation of finite state machines where each state is a C++ function. An additional method, called "state engine," repeatedly has to call the state functions indirectly, according to the current state, event, and transition map entry:

<b>while (...)
{
  ...    
(this->*pStateMap[currentState].pStateFunc) (pDataTemp);
}</b>

However, tail call optimization lets you omit the concept of state engines. Instead, each state function could directly tail call the function of the succeeding state without causing an unpredictable growth of the runtime stack. Besides reduced memory consumption and increased speed, this provides for enhanced encapsulation because there is no more need for a global transition table. Instead, each state can be regarded as a separate object comprising state and operation.

Another example is the translation of functional and logic languages (such as Haskell or Mercury) to high-level C. Functional programs make intense use of recursive calls and, not surprisingly, a high percentage of these are tail calls. Without optimization, it is not possible to map them directly to C function calls because the high frequency of tail calls would badly degrade performance and eventually break the stack. Consequently, common implementations of functional languages abstain from direct calls between functions.

Further crucial applications of tail call optimization can typically be encountered in the efficient implementation of Just-In-Time compilers and virtual machines.

Why Tail Call Optimization Is Difficult

In theory, tail call optimization is intuitive. However, it is difficult to implement in already existing compiler suites for a number of different reasons. To illustrate this point, I assume the behavior and layout of a Linux/UNIX ix86 runtime stack; see Figure 1.

A first major restriction is directly related to the C calling convention that, unlike in Pascal, assigns the responsibility for cleaning up the locals of a stack frame to the callee. The argument space, however, is normally freed by the caller. In other words, when foo issues a tail call, it would have to break the convention by deleting, or replacing, its own incoming argument space; see Figure 2.

As Figure 3 illustrates, the issue gets even more complicated when handling variable argument functions that are possible in C, but not in Pascal. If b->b' was a tail call, then the next tail call from b'->b" would be impossible to realize, because b' has absolutely no information about how many arguments it has been invoked with. (In C, only the caller knows about the number of arguments!) However, this is important information because straightforward tail call optimization requires the callee's arguments to consume as much stack space as the caller's; otherwise, the function's return address, base pointer, and potentially saved registers would have to be shifted downwards to free additional slots for bigger, or additional, arguments; see Figure 4. Of course, this also accounts for return values, because the optimization must be absolutely transparent to the top-most caller.

Another limiting aspect is a pathological feature of C/C++ itself. It is possible to assign a local's address to a global variable and use it deeper down the call stack as it happens here:

<b>  int* global;
  bar ()
  {
    ...
    *global = 42;
  }
  foo ()
  {
    ...
    global = &local;
    ...
    bar ();
  }</b>

In fact, the caller's stack frame must not be deleted in such cases, even if the function finishes by issuing a tail call. Live references like that could only be detected by performing a sophisticated liveness analysis on the code which has, until now, not been a necessity in popular C/C++ compiler suites.

Equally challenging is the handling of indirect function calls as they appear, for example, in continuation passing (CP). In CP, the program flow (that is, the computation) is based upon an implicit parameter, the continuation, which points to the next function being executed. These calls via pointer arguments usually demand register indirect addressing and, thus, require the machine to have one extra register available to hold the target function's address. However, when issuing a tail call it is necessary to have all callee-saved registers restored to the state the caller expects them in, and all argument registers loaded as the called function requires. Sometimes even scratch registers are involved to accomplish all this.

Unfortunately, the ABIs (Application Binary Interface) of many platforms (ARM-based ones, for instance) define all call-clobbered registers to hold function arguments during a call. Consequently, not much space is left for the target address. As a result, complex stack-shifting operations would be necessary to optimize those targets, or they simply have to be disregarded.

There are other reasons that make tail call optimization in C/C++ compilers difficult to implement, including position-independent code and (in GNU C) volatile functions that are hard to identify as being actual tail calls [1].

Tail Call Optimization in GCC

To address optimization needs, GCC has introduced the concept called "sibcalls" (short for "sibling calls"). Basically, a sibcall is an optimized tail call, but restricted to functions that share a similar signature. In other words, the compiler considers two functions as being siblings if they share the same structural equivalence of return types, as well as matching space requirements of their arguments.

For example, again assuming the ABI of ix86 Linux/UNIX, a tail call from function foo to bar would be a potential optimization candidate, because both share the same return type. Two arguments of type short are represented internally by using four bytes altogether, which is the same size as one long long argument:

<b>int foo (long long a);
int bar (short a, short b);</b>

This restriction is necessary because in a chain of sibcalls, the top-most caller who is calling a tail-recursive function (and being unaware of it) attempts to clean up the callee's arguments when computation has finished. However, if this callee is allowed to exceed its own incoming argument space to perform a sibcall to a function requiring more argument stack space, you would end up with a memory leak when the top-most caller attempts to free the stack slots.

Another reason, related to this, is the shifting of the return address; see Figure 3. Apart from being a technical challenge, it would also break binary compatibility with other programs and libraries that do not support this notion of stack handling. Unaware third-party procedures would not be prepared to perform stack-shifting operations or, alternatively, to let the callee worry about the necessary memory clean ups.

Therefore, successful sibcall optimization is restricted to perform the following operations in the given order:

  • Restore callee-saved registers.
  • Overwrite argument space with new arguments.

  • Replace return address for the tail-called function.

  • Jump to target address and begin computation.

This is already useful for simple tail-recursive functions that do not allocate stack space for locals (as in Listing 1, for example). However, more flexibility is necessary to support indirect calls. Additionally, the compiler needs to make sure that the target platform has an extra register to store the target function's address; otherwise, all optimization attempts have to be aborted.

The implementation of this kind of extension is described in detail in [1] and is available in GCC 3.4 or higher. It basically works like this: The compiler allows indirect sibcalls where possible and disregards all target platforms with a too-restrictive ABI per se. That is, it checks the platform-specific predicates HAVE_sibcall_epilogue and (*targetm.function_ok_for_sibcall) (fndecl, exp), where fndecl is the target function's declaration (empty for indirect calls) and exp represents the function expression node. function_ok_for_sibcall is hooked with the machine description and — with fndecl being empty — is defined only when the target platform can guarantee a spare register to hold an address for the indirect sibcall.

More Sophisticated Examples

The logic programming language compiler, Mercury, maintains its own internal data structure for handling parameter passing between functions. However, it is bound to use the target's runtime stack for implementing a function call sequence [2]. Since not all Mercury predicate calls can be mapped directly to C calls and returns, Henderson and Somogyi invented a "driver loop" that would allow some sort of continuation passing:

<b>typedef void* Func (void);
void driver (Func* entry)
{
  register Func* fp = entry;
  while (fp != NULL)
    fp = (Func*) (*fp)();
}</b>

Here, each C function returns the address of the next function that has to be executed to a pointer variable, fp. This indirection is necessary to prevent the stack from overflowing as the authors expected it to happen, for instance, with ordinary continuation passing.

However, with indirect sibcalls, there is no actual need to return before branching off to a subroutine. Instead, a C function could pass on the continuation without having to temporarily issue control back to a driver or, alternatively, risking a stack overflow:

<b>typedef void* Func ();
void any_func (Func* entry)
{
  ...  /* Reassign entry */
  (*entry) (continuation_ptr);
}</b>

Ertl has presented a similar example [3] in which he proposes the following code as one possible way to implement a "next-function" for a threaded Forth virtual machine interpreter:

</b><b>typedef void (* Inst)();
void inst1 (Inst *ip)
{
  ...
  (*ip) (ip + 1);
}</b>

He restrains from this solution, though, because of the lack of support for optimized indirect tail calls in a C compiler. Indirect sibcalls, as they are implemented in GCC 3.4, however, support this notion of a "next-function" perfectly.

Conclusion

With the support for indirect calls, tail call optimization as is available in GCC 3.4 has made an important step forward. However, there are still a number of obstacles to overcome before GCC fully offers optimization for general tail calls. Moreover, at this writing, the indirect sibcalls are only fully functional on ix86 platforms. Porting the deployed mechanism to targets such as SPARC or PowerPC should be relatively straightforward since the current implementation merely adds to the previously defined sibcall patterns (for direct calls). Of course, this approach is much more portable and maintainable than (say) introducing an entirely new calling convention to the back end, which has to work around all existing standards on all supported platforms likewise. However, the downside of this approach is that targets such as ARM-based platforms are bound to miss out because of restrictions imposed by their ABI definition. That said, tail call optimization has already become a very powerful and flexible feature of the GCC suite that lets you rely on it in various situations and applications.

References

[1] Bauer, A. "Compilation of Functional Programming Languages Using GCC — Tail Calls," Master's Thesis, Technische Universitt München, Germany, 2002.

[2] Henderson F. and Z. Somogyi. "Compiling Mercury to High-Level C Code," CC'02, Grenoble, France, 2002.

[3] Ertl, M.A. "Threaded Code," Position paper, Technische Universität Wien, Austria, http://www.complang.tuwien.ac.at/forth/threaded-code.html.


Andreas Bauer is a Ph.D. student and Markus Pizka is an Assistant Professor at the Technische Universität München, Germany. They can be contacted at baueran@in.tum.de and pizka@in.tum.de, respectively.



Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video