Tail recursion has come up in a few conversations this week. This post explores a generalization of tail call optimization that I wasn't aware of until Doug described it to me.
If you've ever written a few programs in a functional programming language, you're probably familiar with tail recursion. Recursion is the idiomatic way of repeating an action in most functional languages, but it potentially allocates space on the stack for each recursive call. Modern compilers optimize certain kinds of recursive calls to eliminate this overhead. In particular, if the function directly returns the result of the recursive call, the current stack frame can be reused. In other words, instead of calling ourselves recursively and then returning the answer, we just jump to the top of the function, eliminating the call and return sequence, as well as the additional stack frame.
This is all old news. What's new to me is that there are other kinds of recursive calls that can be optimized to eliminate call/return sequences and stack frames. One such example (also described on the wiki page linked above) is "tail recursion modulo
cons," which allows post-processing the result of the recursive call with a call to
cons. More precisely, we are allowed to
cons a known value (that is, a value that is independent of the recursive call) onto the result of the recursive call. For example, the following implementation of
map is not tail recursive, but it is tail recursive modulo
(define (my-map f l) (if (null? l) '() (cons (f (car l)) (my-map f (cdr l)))))
It's not tail recursive because the recursive call to
my-map appears inside a call to another function. But it is tail recursive modulo
cons because the only thing standing between the recursive call and the final result is a call to
cons with a known value, namely
(f (car l)), which is computed before the recursive call is made.
It turns out that functions of this form can be optimized to use constant stack space, just like tail recursive functions. When Doug told me about this, he couldn't remember the details, but we had fun figuring it out for ourselves. Perhaps you'd like to try to do the same before reading on.
Welcome back. One answer is to transform the code using the typical tail recursive trick, introducing an accumulator argument that we
cons onto in each recursive call and then reverse in the base case. This is perfectly fine, but there's an even better answer: we can build up the list from front to back instead of back to front.
The basic idea is that you do "most" of the
cons before making the recursive call. We don't yet know what the result of the recursive call, but we can still allocate the pair and fill out the
car. We then expect the recursive call to fill out the
cdr with its result.
In code, we first define a helper function that implements this recursive behavior, side-effecting the
cdr of the accumulator with the new
(define (help f l acc) (if (null? l) (void) (let ([p (mcons (f (car l)) '())]) (set-mcdr! acc p) (help f (cdr l) p))))
All that remains is a little initial bookkeeping to set up the recursive behavior. We allocate the initial
cons cell and then call the helper function.
(define (my-map-append f l) (if (null? l) '() (let ([result (mcons (f (car l)) '())]) (help f (cdr l) result) result)))
This code is ugly, especially because it repeats itself between the setup and recursive code. I lack the scheme-fu to clean it up, but I'd be interested in seeing any more elegant implementations you come up with.
Very preliminary and probably completely unreliable performance measurements indicate that this implementation closes the gap (around 15%) between the user-level reverse-based tail recursive implementation and the built-in implementation in Racket.
What is it about
cons that makes this optimization possible? What other functions can post-process the result of a recursive call while still enable tail-call-like optimization? Even more generally, what expression contexts can a recursive call appear in?
The property of
cons that seems key here is that it can "process" its first argument without access to the final value of the second argument, which we can slot into place later. This seems vaguely related to the notion of strictness from the Haskell community: if a function is strict in an argument, then recursive calls passed in that argument are amenable to the optimization. Maybe you know how to make this correspondence precise?
Another function that can be optimized is computing the sum of a list. We introduce an accumulator argument and add each element to that as we recurse. The fact that addition is associative makes this optimization possible. This seems unrelated to the strictness criterion proposed above. Is there a more general property that encompasses both examples?
I'd be interested to hear your thoughts on these questions. Here's a random paper from the '70s that might be related (I haven't finished it yet).