The Y combinator is a somewhat magical aspect of the untyped lambda calculus. Many people have tried to explain this magic, but I find most of the explanations somewhat unsatisfactory.

  • Most explanations show the combinator in the beginning and then simply demonstrate that the given combinator is correct.
  • Some other explanations contain lines like “Why don’t we try passing it to itself? And look, this turns out to be exactly what we want!”

The problem with these is that they don’t give an intuition for how one could constructively derive the combinator. What on earth makes you think that passing it to itself is a good idea? As such, this post, intended for those who already know the Y combinator but are not satisfied with the existing explanations, attempts to show how one could constructively derive the Y combinator.

Introduction #

In lambda calculus, there is no primitive support for defining recursive functions. Can we nevertheless define recursive functions in lambda calculus? And how?

While our question focuses on lambda calculus, to make it easier to follow, we will use an actual programming language. Here, I choose to use Racket
The reasons for choosing Racket are:
  • It is not statically typed, so we can write the combinator, which would have been prohibited had we used a language with a standard type system like Hindley–Milner.
  • Its scoping rules are simple and close to lambda calculus. Unlike, say, JavaScript and Python, there’s no variable hoisting, so we won’t encounter a situation where a simple, innocent piece of code like const f = () => f(); unintentionally allows recursion.
, but other programming languages might work as well.

As mentioned above, this post is intended for those who know basic programming language theory. However, I will try to provide as many links as possible to accommodate readers outside the target audience.

Enabling recursion #

In Racket, we can write code to compute $3! + 4!$ easily
This post will mostly use #lang racket, so I will elide the hash lang line from now on unless I want to highlight it.
.

#lang racket

(letrec ([fact (λ (n)
                 (case n
                   [(0) 1]
                   [else (* n (fact (- n 1)))]))])
  (+ (fact 3) (fact 4)))
30

Defining this recursive fact function is possible because letrec
Read (letrec ([‹x› ‹v›]) ‹e›) as “let ‹x› be a recursive function ‹v› and then return ‹e›”.
allows us to refer to fact inside the lambda function that will be fact itself.

Of course, lambda calculus has no letrec. We might ask how Racket implements letrec, and the short answer is that it does so by using mutation
For instance:
(let ([fact "dummy-value"])
  (begin
    (set! fact (λ (n)
                 (case n
                   [(0) 1]
                   [else (* n (fact (- n 1)))])))
    (+ (fact 3) (fact 4))))
, which lambda calculus doesn’t have either. Therefore, to simulate lambda calculus, we will write the factorial function in Racket without using letrec or mutation.

Here’s our first straightforward attempt:

(let ([fact (λ (n)
              (case n
                [(0) 1]
                [else (* n (fact (- n 1)))]))])
  (+ (fact 3) (fact 4)))
fact: unbound identifier in: fact

We simply change letrec to let
Read (let ([‹x› ‹v›]) ‹e›) as “let ‹x› be ‹v› and then return ‹e›”.
. While let and letrec are very similar, let doesn’t give us the ability to refer to fact inside that lambda function. In fact, (let ([‹x› ‹v›]) ‹e›) doesn’t add any expressive power to lambda calculus at all, because its semantics is completely equivalent to ((λ (‹x›) ‹e›) ‹v›), a lambda term. Some programming languages simply treat let as syntactic sugar and desugar (let ([‹x› ‹v›]) ‹e›) to ((λ (‹x›) ‹e›) ‹v›).
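For instance, the following two expressions behave identically; the first is the sugared form and the second is its desugared, immediately applied lambda (a tiny sanity check, not necessarily how Racket implements let internally):

(let ([x 1]) (+ x 2))   ;; evaluates to 3
((λ (x) (+ x 2)) 1)     ;; also evaluates to 3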

As expected, our straightforward attempt doesn’t work because the fact inside the lambda function is unbound. How can we fix it? Let’s take a look at another example:

(let ([f (λ (n) (+ n c))])
  (let ([c 1])
    (f 99)))
c: unbound identifier in: c

The above code has a similar problem: the c inside the lambda function is unbound. One way to fix the problem is to explicitly pass c to the lambda function, so that the occurrence of c becomes bound:

(let ([f (λ (c n) (+ n c))])
  (let ([c 1])
    (f c 99)))
100

Our solution to the unbound fact problem is going to be the same. While fact is unbound inside the lambda function, it is bound in the body of let, so we can pass fact explicitly to the lambda function!

(let ([fact (λ (fact n)
              (case n
                [(0) 1]
                [else (* n (fact (- n 1)))]))])
  (+ (fact fact 3) (fact fact 4)))
fact: arity mismatch;
the expected number of arguments does not match the given number
 expected: 2
 given: 1
 arguments...:

There’s still a problem: fact is now a function of arity 2, so the recursive call (fact (- n 1)) inside the lambda function results in an arity mismatch error. What does fact need as its first argument? The answer is fact itself! So we can fix the code accordingly:

(let ([fact (λ (fact n)
              (case n
                [(0) 1]
                [else (* n (fact fact (- n 1)))]))])
  (+ (fact fact 3) (fact fact 4)))
30

With this trick, we seem to be able to write any recursive function $f$ in a language without letrec by the following method, which we will call method A:

  1. Initially write the recursive function $f$ with letrec.
  2. Change letrec to let.
  3. At every callsite of $f$, prepend the arguments with $f$’s name.
  4. Prepend the parameters of $f$ with $f$’s name.
Method A

But is it always correct? Consider the following code:

(letrec ([fact (λ (n)
                 (case n
                   [(0) 1]
                   [else (* n (fact (- n 1)))]))])
  (let ([fact2 fact])
    (+ (fact2 3) (fact2 4))))
30

is transformed to:

(let ([fact (λ (fact n)
              (case n
                [(0) 1]
                [else (* n (fact fact (- n 1)))]))])
  (let ([fact2 fact])
    (+ (fact2 3) (fact2 4))))
fact: arity mismatch;
 the expected number of arguments does not match the given number
  expected: 2
  given: 1
  arguments...:

As we can see, method A is not always correct, because when $f$ is used in a non-function position, no change is made. Fortunately, it is possible to force every occurrence of the identifier $f$ to be in a function position! If $f$ is a function of arity $n$, then $f$ is observationally equivalent to $\lam{x_1 x_2 \dots x_n}{f x_1 x_2 \dots x_n}$ (when none of $x_1, x_2, \dots, x_n$ is equal to $f$)
This equivalence is a restricted form of $\eta$-equivalence. In lambda calculus, every term denotes a lambda function, so it is possible to use $\eta$-conversion anywhere. In Racket, we have to be a little more conservative since a term might denote, say, a number instead.
. Hence, if we insert another step between method A’s step 2 and step 3 in order to replace all variable occurrences that are bound to $f$ with $\lam{x_1 x_2 \dots x_n}{f x_1 x_2 \dots x_n}$, we get method B, which works for any program.

  1. Initially write the recursive function $f$ with letrec.
  2. Change letrec to let.
  3. Replace all variable occurrences that are bound to $f$ with $\lam{x_1 x_2 \dots x_n}{f x_1 x_2 \dots x_n}$.
  4. At every callsite of $f$, prepend the arguments with $f$’s name.
  5. Prepend the parameters of $f$ with $f$’s name.
Method B

Let’s try using it!

(letrec ([fact (λ (n)
                 (case n
                   [(0) 1]
                   [else (* n (fact (- n 1)))]))])
  (let ([fact2 fact])
    (+ (fact2 3) (fact2 4))))

is transformed to:

(let ([fact (λ (fact n)
              (case n
                [(0) 1]
                [else (* n ((λ (v) (fact fact v)) (- n 1)))]))])
  (let ([fact2 (λ (v) (fact fact v))])
    (+ (fact2 3) (fact2 4))))
30

Cool, it works now!

Omega #

Let’s digress a little bit here. What happens if we apply the syntactic transformation to this function, which loops forever?

(letrec ([diverge (λ () (diverge))])
  (diverge))

Because every bound occurrence of diverge is in a function position, we can use method A. The above code is transformed to:

(let ([diverge (λ (diverge) (diverge diverge))])
  (diverge diverge))

To obtain a lambda term, we simply desugar let as described above:

((λ (diverge) (diverge diverge))
 (λ (diverge) (diverge diverge)))

This term is known as the $\Omega$ combinator, a classic example of a lambda term whose evaluation doesn’t terminate.
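We can see the divergence directly: $\beta$-reducing the application substitutes the argument for diverge and reproduces exactly the same term, so evaluation never makes progress.

((λ (diverge) (diverge diverge))
 (λ (diverge) (diverge diverge)))
==> ((λ (diverge) (diverge diverge))
     (λ (diverge) (diverge diverge)))
==> ...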

Deriving a family of Y combinators #

While we now know how to write recursive functions in lambda calculus, the transformed function doesn’t look like the original function that we would write in the letrec version. It would be nice if there were a term ‹make-recursion› such that:

(‹make-recursion›
  (λ (n)
    (case n
      [(0) 1]
      [else (* n (fact (- n 1)))])))

produces the actual factorial function. However, this clearly won’t work, because the occurrence of fact in the body is unbound. To fix this, we simply make it bound by wrapping a lambda function around the body.

(‹make-recursion›
  (λ (fact)
    (λ (n)
      (case n
        [(0) 1]
        [else (* n (fact (- n 1)))]))))

We will call the lambda function that is the argument of ‹make-recursion› a recursion maker; the one above, for the factorial function, is ‹fact-maker›. What we want, then, is to find ‹make-recursion› so that for any ‹recursion-maker›, (‹make-recursion› ‹recursion-maker›) produces the actual recursive function.

How should we approach this? We don’t know what ‹make-recursion› looks like, but for the factorial function, we do know what ‹fact-maker› looks like. The strategy, then, is to morph a piece of code that uses method B to implement the factorial function:

(let ([fact (λ (fact n)
              (case n
                [(0) 1]
                [else (* n ((λ (v) (fact fact v)) (- n 1)))]))])
  (+ ((λ (v) (fact fact v)) 3) ((λ (v) (fact fact v)) 4)))

to a piece of code that uses ‹fact-maker›:

(let ([fact-maker (λ (fact)
                    (λ (n)
                      (case n
                        [(0) 1]
                        [else (* n (fact (- n 1)))])))])
  (let ([fact (‹???› fact-maker)])
    (+ (fact 3) (fact 4))))

If we can do this, the hole ‹???› is going to be ‹make-recursion› that we are looking for.

As a first step, notice that fact and n are both formal parameters of the same lambda function in the transformed code, whereas in ‹fact-maker› they are separated. We can straightforwardly use currying to separate the two. Of course, we also need to change how we call the function. That is, we change:

(let ([fact (λ (fact n)
              (case n
                [(0) 1]
                [else (* n ((λ (v) (fact fact v)) (- n 1)))]))])
  (+ ((λ (v) (fact fact v)) 3) ((λ (v) (fact fact v)) 4)))

to

(let ([fact (λ (fact)
              (λ (n)
                (case n
                  [(0) 1]
                  [else (* n ((λ (v) ((fact fact) v)) (- n 1)))])))])
  (+ ((λ (v) ((fact fact) v)) 3) ((λ (v) ((fact fact) v)) 4)))

For the next step, notice that we would like to use fact naturally at callsites, but in our current code we have (λ (v) ((fact fact) v)) instead. What we will do is shadow fact with that value so that it appears that we can use fact normally.

(let ([fact (λ (fact)
              (let ([fact (λ (v) ((fact fact) v))])
                (λ (n)
                  (case n
                    [(0) 1]
                    [else (* n (fact (- n 1)))]))))])
  (let ([fact (λ (v) ((fact fact) v))])
    (+ (fact 3) (fact 4))))

The inner (λ (n) ...) now looks just like the body of ‹fact-maker›, and the final (+ (fact 3) (fact 4)) exactly matches the client-side code. What we want to do next is to somehow convert (let ([fact (λ (v) ((fact fact) v))]) ‹body›) into (λ (fact) ‹body›), where ‹body› is that inner (λ (n) ...), and also move (λ (v) ((fact fact) v)) somewhere else. That’s simply desugaring let!

(let ([fact (λ (fact)
              ((λ (fact)
                 (λ (n)
                   (case n
                     [(0) 1]
                     [else (* n (fact (- n 1)))])))
               (λ (v) ((fact fact) v))))])
  (let ([fact (λ (v) ((fact fact) v))])
    (+ (fact 3) (fact 4))))

We managed to make ‹fact-maker› appear! Next, we can abstract ‹fact-maker› out as a variable named fact-maker:

(let ([fact-maker (λ (fact)
                    (λ (n)
                      (case n
                        [(0) 1]
                        [else (* n (fact (- n 1)))])))])
  (let ([fact (λ (fact) (fact-maker (λ (v) ((fact fact) v))))])
    (let ([fact (λ (v) ((fact fact) v))])
      (+ (fact 3) (fact 4)))))

Eventually, we will want ‹make-recursion› as a lambda term. However, there’s no let in lambda calculus, so we will reduce the number of lets by fusing the two nested “let fact be ...” bindings together in a straightforward way:

(let ([fact-maker (λ (fact)
                    (λ (n)
                      (case n
                        [(0) 1]
                        [else (* n (fact (- n 1)))])))])
  (let ([fact (λ (v)
                (((λ (fact) (fact-maker (λ (v) ((fact fact) v))))
                  (λ (fact) (fact-maker (λ (v) ((fact fact) v))))) v))])
    (+ (fact 3) (fact 4))))

As a final step, ‹make-recursion› should consume fact-maker as an argument, so we can abstract fact-maker out:

(let ([fact-maker (λ (fact)
                    (λ (n)
                      (case n
                        [(0) 1]
                        [else (* n (fact (- n 1)))])))])
  (let ([fact ((λ (fact-maker)
                 (λ (v)
                   (((λ (fact) (fact-maker (λ (v) ((fact fact) v))))
                     (λ (fact) (fact-maker (λ (v) ((fact fact) v))))) v)))
               fact-maker)])
    (+ (fact 3) (fact 4))))
30

And we are done! The (λ (fact-maker) ...) term is the ‹make-recursion› that we are looking for.

Notice that there’s nothing inherent about the factorial function here. Had we used any other single-argument recursive function as a starting point, we would have ended up with a term that is $\alpha$-equivalent to this ‹make-recursion› as well. We will highlight this fact by $\alpha$-converting the variable names to remove all references to the factorial function:

(λ (f)
  (λ (k)
    (((λ (x) (f (λ (v) ((x x) v))))
      (λ (x) (f (λ (v) ((x x) v))))) k)))

Actually, we can simplify our ‹make-recursion› a little bit further. Recall from Enabling recursion that if $e$ evaluates to a function of one argument, then $e$ is observationally equivalent to $\lam{x}{e x}$ (provided that $x$ does not freely occur in $e$). The transformation back and forth between $e$ and $\lam{x}{e x}$ is known as $\eta$-conversion. Here are some terms that we could derive from the above ‹make-recursion›:

  • From ‹make-recursion›, by $\eta$-converting the (λ (k) ... k) wrapper in the following code:

    (λ (f)
      (λ (k)
        (((λ (x) (f (λ (v) ((x x) v))))
          (λ (x) (f (λ (v) ((x x) v))))) k)))
    

    we obtain:

    (λ (f)
      ((λ (x) (f (λ (v) ((x x) v))))
       (λ (x) (f (λ (v) ((x x) v))))))
    

    which is formally known as the Z combinator ($Z$).

  • From $Z$, by $\eta$-converting each (λ (v) ((x x) v)) into (x x) in the following code:

    (λ (f)
      ((λ (x) (f (λ (v) ((x x) v))))
       (λ (x) (f (λ (v) ((x x) v))))))
    

    we obtain:

    (λ (f)
      ((λ (x) (f (x x)))
       (λ (x) (f (x x)))))
    

    which is formally known as the Y combinator ($Y$).

The original ‹make-recursion›, $Z$, and $Y$ are fixed-point combinators. The word combinator means that there are no free variables in these terms, and the word fixed-point means that applying the combinator to $f$ yields a solution to the equation $x = f(x)$ for any function $f$. For example, this property says that $Z(f) = f(Z(f))$. Why is this fixed-point identity important? And how are ‹make-recursion›, $Z$, and $Y$ different? These questions will be answered in the next sections.

Fixed point #

Why is a fixed-point identity such as $Z(f) = f(Z(f))$ true, and why is it important? This section attempts to answer these questions. We will use the Z combinator as the primary example.

First, we will take a look at recursion makers from another perspective. In Deriving a family of Y combinators, recursion makers arise naturally, but all we know so far is that (Z ‹recursion-maker›) results in a recursive function. What exactly are their roles and properties? Let’s take a look at ‹fact-maker› as a concrete example:

(λ (fact)
  (λ (n)
    (case n
      [(0) 1]
      [else (* n (fact (- n 1)))])))

What can we pass into $\text{fact-maker}$? It obviously consumes the factorial function! But given that we don’t have the real factorial function yet, the best we can do is provide some other value instead. Say we pass in a nonsensical value, $42$. This results in:

(λ (n)
  (case n
    [(0) 1]
    [else (* n (42 (- n 1)))]))

What we have here is a factorial function that only works on $n = 0$. For $n > 0$, the $42$ would be applied as a function, which results in an error. We will call this function $\text{fact}_0$.

What else can we pass into $\text{fact-maker}$? We just obtained $\text{fact}_0 = \text{fact-maker}(42)$, so we can use it. This results in:

(λ (n)
  (case n
    [(0) 1]
    [else (* n (‹fact0› (- n 1)))]))

Again, what we have here is a factorial function that works on $n \le 1$. Let’s call it $\text{fact}_1$. Next, we can pass $\text{fact}_1 = \text{fact-maker}(\text{fact-maker}(42))$ into $\text{fact-maker}$ to get $\text{fact}_2$. We can clearly see the trend here: $\text{fact-maker}^k(42) = \text{fact}_{k-1}$, the factorial function that works on $n \le k - 1$.
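We can check this trend in Racket. Here is a small sketch; the names fact0, fact1, and fact2 are ad-hoc names for this experiment:

(let ([fact-maker (λ (fact)
                    (λ (n)
                      (case n
                        [(0) 1]
                        [else (* n (fact (- n 1)))])))])
  (let* ([fact0 (fact-maker 42)]      ;; works only for n = 0
         [fact1 (fact-maker fact0)]   ;; works for n <= 1
         [fact2 (fact-maker fact1)])  ;; works for n <= 2
    (list (fact0 0) (fact1 1) (fact2 2))))
'(1 1 2)

Calling, say, (fact2 3) would eventually try to apply 42 as a function and raise an error, as the trend predicts.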

In general, any recursion maker $f$ propels the computation one more level deeper.

To get the actual factorial function, which should work for any natural number $n$, we evidently need an infinitely high tower of $\text{fact-maker}$. That is, $\text{fact} = \text{fact}_\infty = \text{fact-maker}(\text{fact-maker}(\text{fact-maker}(\dots)))$.

Note that:

$$\begin{align*}
\text{fact} &= \text{fact-maker}(\text{fact-maker}(\dots))\\
\text{fact-maker}(\text{fact}) &= \text{fact-maker}(\text{fact-maker}(\text{fact-maker}(\dots))) & \text{applying fact-maker}\\
&= \text{fact}\\
\end{align*}$$

which agrees with our definition of $\text{fact-maker}$: applying $\text{fact-maker}$ to $\text{fact}$ produces the actual $\text{fact}$ function.

(λ (fact)
  (λ (n)
    (case n
      [(0) 1]
      [else (* n (fact (- n 1)))])))

Recall that the Z combinator makes the recursion possible: $\text{fact} = Z(\text{fact-maker})$. Substituting this in the above equation, we obtain that $\text{fact-maker}(Z(\text{fact-maker})) = Z(\text{fact-maker})$.

In general, for any recursion maker $f$, $Z(f) = f(Z(f))$. The importance of this identity is that $Z(f) = f(Z(f)) = f(f(Z(f))) = f(f(f(Z(f)))) = \dots$ which creates an infinite tower of the recursion maker $f$, making recursion possible.
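We can also check this identity empirically in Racket. The following is just a spot check at a single input, not a proof:

(let ([Z (λ (f)
           ((λ (x) (f (λ (v) ((x x) v))))
            (λ (x) (f (λ (v) ((x x) v))))))]
      [fact-maker (λ (fact)
                    (λ (n)
                      (case n
                        [(0) 1]
                        [else (* n (fact (- n 1)))])))])
  ;; Z(f) and f(Z(f)) should behave the same; compare them at n = 5
  (list ((Z fact-maker) 5)
        ((fact-maker (Z fact-maker)) 5)))
'(120 120)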

Yet another proof

We can also show that $Z(f) = f(Z(f))$ directly using the definition of $Z$:

$$\begin{align*}
Z &= \lam{f}{(\lam{x}{f (\lam{v}{(x x) v})})(\lam{x}{f (\lam{v}{(x x) v})})} & \\
Z(f) &= (\lam{\color{red}{x}}{f (\lam{v}{(\color{red}{x x}) v})})(\lam{x}{f (\lam{v}{(x x) v})}) & \\
&= f (\lam{\color{blue}{v}}{((\lam{x}{f (\lam{v}{(x x) v})}) (\lam{x}{f (\lam{v}{(x x) v})})) \color{blue}{v}}) & \beta \text{-reduction}\\
&= f ((\lam{x}{f (\lam{v}{(x x) v})}) (\lam{x}{f (\lam{v}{(x x) v})})) & \eta \text{-conversion}\\
&= f (Z(f)) & \\
\end{align*}$$

Y vs. Z #

How are ‹make-recursion›, $Z$, and $Y$ different? We can empirically test them:

  • The original ‹make-recursion›:

    (let ([make-recursion
           (λ (f)
             (λ (k) (((λ (x) (f (λ (v) ((x x) v))))
                      (λ (x) (f (λ (v) ((x x) v))))) k)))])
      (let ([fact-maker (λ (fact)
                          (λ (n)
                            (case n
                              [(0) 1]
                              [else (* n (fact (- n 1)))])))])
        (let ([fact (make-recursion fact-maker)])
          (+ (fact 3) (fact 4)))))
    
    30
  • The Z combinator:

    (let ([Z (λ (f)
               ((λ (x) (f (λ (v) ((x x) v))))
                (λ (x) (f (λ (v) ((x x) v))))))])
      (let ([fact-maker (λ (fact)
                          (λ (n)
                            (case n
                              [(0) 1]
                              [else (* n (fact (- n 1)))])))])
        (let ([fact (Z fact-maker)])
          (+ (fact 3) (fact 4)))))
    
    30
  • The Y combinator:

    (let ([Y (λ (f)
               ((λ (x) (f (x x)))
                (λ (x) (f (x x)))))])
      (let ([fact-maker (λ (fact)
                          (λ (n)
                            (case n
                              [(0) 1]
                              [else (* n (fact (- n 1)))])))])
        (let ([fact (Y fact-maker)])
          (+ (fact 3) (fact 4)))))
    
    Non-terminating

Uh oh.

We derived the Y combinator from the Z combinator via $\eta$-conversion, which intuitively should preserve semantics. Yet something clearly goes wrong when we try to use the Y combinator to compute factorials: the computation doesn’t terminate (while the Z combinator works as expected). In fact, the problem is much worse: any attempt to use Y in Racket will result in an infinite loop!

Recall the following:

if $e$ evaluates to a function of one argument, then $e$ is observationally equivalent to $\lam{x}{e x}$ (provided that $x$ does not freely occur in $e$).

and how we convert $Z$:

(λ (f)
  ((λ (x) (f (λ (v) ((x x) v))))
   (λ (x) (f (λ (v) ((x x) v))))))

to $Y$:

(λ (f)
  ((λ (x) (f (x x)))
   (λ (x) (f (x x)))))

In Racket, when we evaluate, say, (Y fact-maker), does (x x) evaluate to a function of one argument? We can manually step through the evaluation to see what’s going on:

1.  (Y fact-maker)
==> ((λ (f) ((λ (x) (f (x x))) (λ (x) (f (x x))))) fact-maker)

2.  ((λ (f) ((λ (x) (f (x x))) (λ (x) (f (x x))))) fact-maker)
==> ((λ (f) ((λ (x) (f (x x))) (λ (x) (f (x x))))) (λ (fact) ...))

3.  ((λ (f) ((λ (x) (f (x x))) (λ (x) (f (x x))))) (λ (fact) ...))
==> ((λ (x) ((λ (fact) ...) (x x))) (λ (x) ((λ (fact) ...) (x x))))

4.  ((λ (x) ((λ (fact) ...) (x x))) (λ (x) ((λ (fact) ...) (x x))))
==> ((λ (fact) ...)
     ((λ (x) ((λ (fact) ...) (x x))) (λ (x) ((λ (fact) ...) (x x)))))

5.  ((λ (fact) ...)
     ((λ (x) ((λ (fact) ...) (x x))) (λ (x) ((λ (fact) ...) (x x)))))
==> ((λ (fact) ...)
     ((λ (fact) ...)
      ((λ (x) ((λ (fact) ...) (x x))) (λ (x) ((λ (fact) ...) (x x))))))

6.  ((λ (fact) ...)
     ((λ (fact) ...)
      ((λ (x) ((λ (fact) ...) (x x))) (λ (x) ((λ (fact) ...) (x x))))))
==> ((λ (fact) ...)
     ((λ (fact) ...)
      ((λ (fact) ...)
       ((λ (x) ((λ (fact) ...) (x x))) (λ (x) ((λ (fact) ...) (x x)))))))

...

The answer is no! Hence, in Racket, the $\eta$-conversion that turns $Z$ into $Y$ is unsound!

While Racket evaluates the term (Y fact-maker) exactly as described above, we can imagine another way to evaluate the term:

1.  (Y fact-maker)
==> ((λ (f) ((λ (x) (f (x x))) (λ (x) (f (x x))))) fact-maker)

2.  ((λ (f) ((λ (x) (f (x x))) (λ (x) (f (x x))))) fact-maker)
==> ((λ (x) (fact-maker (x x))) (λ (x) (fact-maker (x x))))

3.  ((λ (x) (fact-maker (x x))) (λ (x) (fact-maker (x x))))
==> (fact-maker ((λ (x) (fact-maker (x x))) (λ (x) (fact-maker (x x)))))

4.  (fact-maker ((λ (x) (fact-maker (x x))) (λ (x) (fact-maker (x x)))))
==> ((λ (fact) ...) ((λ (x) (fact-maker (x x))) (λ (x) (fact-maker (x x)))))

5.  ((λ (fact) ...) ((λ (x) (fact-maker (x x))) (λ (x) (fact-maker (x x)))))
==> (λ (n)
      (case n
        [(0) 1]
        [else (* n (((λ (x) (fact-maker (x x))) (λ (x) (fact-maker (x x))))
                    (- n 1)))]))

In this evaluation trace, (Y fact-maker) successfully evaluates to the factorial function! The difference between these two evaluation traces is due to the evaluation strategy, which determines which reducible expression (redex) should be $\beta$-reduced next.

Most programming languages, including Racket, employ the first evaluation strategy presented above, which always evaluates the arguments of a function application before performing the application itself. This strategy is known as call-by-value
The call-by-value strategy is in fact a family of strategies, as it still leaves some room for flexibility. For example, it does not specify which of ((λ (x) x) 1) and ((λ (x) x) 2) should be reduced first when evaluating (+ ((λ (x) x) 1) ((λ (x) x) 2)). Racket in particular always evaluates the leftmost-innermost redex (outside of lambda functions) first.
. As another example of how this strategy works, it would evaluate (double (double 1)), where double is defined as (λ (x) (+ x x)), in the following way:

(double (double 1)) ==> (double (+ 1 1))
(double (+ 1 1))    ==> (double 2)
(double 2)          ==> (+ 2 2)
(+ 2 2)             ==> 4

On the other hand, the second strategy presented above performs the function application without waiting for the arguments to be evaluated first. This strategy is known as call-by-name. It would evaluate (double (double 1)) in the following way:

(double (double 1))       ==> (+ (double 1) (double 1))
(+ (double 1) (double 1)) ==> (+ (+ 1 1) (double 1))
(+ (+ 1 1) (double 1))    ==> (+ 2 (double 1))
(+ 2 (double 1))          ==> (+ 2 (+ 1 1))
(+ 2 (+ 1 1))             ==> (+ 2 2)
(+ 2 2)                   ==> 4

We have seen that different evaluation strategies might not evaluate a term to the same result. This is disconcerting. Fortunately, the Church–Rosser theorem guarantees that, given a term $t$, if two strategies successfully evaluate $t$ to values, then the two values are equal, as we can see from the (double (double 1)) example. The only case in which two evaluation strategies evaluate a term to different results is when one of them doesn’t terminate, which is exactly what happens with the Y combinator under the call-by-value strategy.

There are advantages and disadvantages to each evaluation strategy. For instance, the call-by-name strategy has a beautiful property: if a term can be evaluated to a value by some evaluation strategy, then call-by-name will evaluate the term to that value as well. This is the reason why the strategy successfully evaluates (Y fact-maker). The disadvantages are that (1) it’s not very efficient, taking 6 steps to evaluate (double (double 1)) while the call-by-value strategy evaluates the term in only 4 steps, and (2) it doesn’t interact well with mutation, which is a feature that most programming languages have. This is the reason why most programming languages use call-by-value instead.

Due to the call-by-value strategy, we can’t run the Y combinator in the standard Racket language. One cool feature of Racket, however, is its ability to create and use a new programming language via the hash lang line. In fact, the creators of Racket even call Racket a Programmable Programming Language. It is very easy to create a call-by-name language with Racket. Even better, someone has already done it, and we can just use it!

#lang lazy

(let ([Y (λ (f)
           ((λ (x) (f (x x)))
            (λ (x) (f (x x)))))])
  (let ([fact-maker (λ (fact)
                      (λ (n)
                        (case n
                          [(0) 1]
                          [else (* n (fact (- n 1)))])))])
    (let ([fact (Y fact-maker)])
      (! (+ (fact 3) (fact 4)))))) ;; we need ! to force the value for printing
30

The final remark in this section is that the Z combinator is also known as the call-by-value Y combinator, since it’s the Y combinator for call-by-value languages.

Y and me #

I learned the Y combinator two years ago from the PL class taught by Shriram Krishnamurthi. One great explanation (which Shriram pointed us to) is Matthias Felleisen’s A Lecture on the Why of Y, which similarly attempts to derive the Y combinator. However, the explanation that corresponds to the “Enabling recursion” section goes at a much slower pace and implicitly contains some elements from the “Fixed point” section. The strength of that approach, in my opinion, is that it crucially uses the fixed-point identity $Y(f) = f(Y(f))$ to justify the “self-application trick” in the derivation right away, so it motivates really well why $Y(f) = f(Y(f))$ is important. The approach that I use, on the other hand, simply focuses on where things are bound and how we can direct values to the appropriate places, which results in a simpler explanation in my opinion. I also make use of let a lot, which I think helps with readability. The weakness is that I also need to talk about desugaring.

I didn’t discover this approach myself. The core insight comes from a student’s homework that I graded in the current iteration of the PL class. A part of the homework asks students to write random programs in a language that is very similar to Racket but without letrec and mutation. I didn’t expect that students would be able to write a really meaningful program unless they already knew a fixed-point combinator, ... or so I thought. It turns out that one student wrote a program to calculate 1 + 2 + ... + n for arbitrary n without knowing a fixed-point combinator! Here’s the program in Racket:

(let ([triangle (λ (triangle n)
                  (case n
                    [(0) 0]
                    [else (+ n (triangle triangle (- n 1)))]))])
  (triangle triangle 10))
55

Granted, this is simply a rewrite of one of Matthias’s slides into the let form with uncurrying, but it really opened up my eyes.

Acknowledgements #

Since our grading policy dictates that we can’t look at students’ names, I don’t know who this student is. Regardless, I would like to thank them for this incredible insight. I would also like to thank Shriram and Matthias for their teaching. Lastly, I thank James Wilcox for his various suggestions for this post.