Existential Types in Practical Terms

It turns out that pattern matching on Python-style tuples requires existential types. This is the second time I encounter them, so I figured it is a good opportunity to show how they arise in practise.

Existentials in match patterns

Let’s say you want to apply a function to all items in a tuple, and you want to do it using recursion instead of a for loop. Something like this:

def map(fun, items):
    match items:
        case Tuple():
            return Tuple()

        case Tuple(first, *rest):
            return Tuple(fun(first), *map(fun, rest))

The first pattern matches the empty tuple and the second all tuples with one or more elements.

This little example gives a language designer a lot to worry about. For instance, what types do the pattern expressions have? Both types should unify with items, but tuples are heterogeneous and can contain anything.

The reason they must unify with the scrutinee (i.e. items) is that they cannot possibly match if the scrutinee has an entirely different type. But unifying them with the scrutinee also unifies them with each other, which forces all patterns to have the exact same type, including the type parameter.

This leaves us with no choice but to instantiate Tuple with an unconstrained unification variable, so that both patterns have type Tuple[*R] in the surrounding scope. In case you haven’t seen it before, the asterisk before R makes it rest-kinded.

But what type does first have? Something has to be there if the pattern matched.

Existentials for closure environments

Closure conversion is a technique for eliminating free variables from functions by moving them to a record instead.

When performing closure conversion on a typed language, we may want the resulting form to be typed too. The record holding the free variables–the environment–will then need a type. But this can cause problems where the functions are used. Let’s look at an example:

let b = 1
let fun = (lambda a: a + 1) if condition else (lambda a: a + b)

Both sides of the if expression are functions from Int to Int, so we can safely assign fun the type Int -> Int. So far everything is great.

Now let’s perform closure conversion. We will need two environments, one for each function. The first lambda does not have any free variables, so its environment is an empty record. The other has a field for b.

record Env1()
record Env2(b: Int)

We will also need a representation for the closures. They have two parts: an environment and a function. Let’s assume this representation:

record Closure[E](env: E, code: (E, Int) -> Int)

After transformation to explicit closures the example looks like this:

let b = 1
let fun = Closure(Env1(), lambda env, a: a + 1) if condition else Closure(Env2(b), lambda env, a: a + env.b)

The free variable in the lambda to the right has been eliminated, so the compiler is free to move it to another scope. In an untyped representation our goal would have been accomplished now.

But in a typed representation we run into trouble–the if expression no longer type checks! Both branches should agree on a type, but one is a Closure[Env1] and the other a Closure[Env2].

Notice the similarity with the previous section, where allowing the precise pattern types to flow into the scrutinee would cause a conflict.

Quantification

We need a bit of theory for the rest of the discussion. Let’s first dissect the record definition above. Under the hood, it generates two definitions: a data constructor and a type sharing the same name. If the data constructor was a regular function it would have this signature:

def Closure[E](env: E, code: (E, Int) -> Int) -> Closure[E]:
    ..

The list in square brackets corresponds to universal quantification in predicate logic. If we write the type using a quantifier instead, we get:

∀E.(E, (E, Int) → Int) → Closure[E]

This notation is more powerful, because it lets us nest quantified expressions. Like this:

∀A.∀E. (E, (E, Int) → Int) → Closure[A]

The ∀A part extends all the way to the right, but ∀E only covers the parameter list, not the return type.

The result is now a parametrized type Closure[A] that does not depend on any of the arguments to the data constructor. It is completely detached from the expression that created it, because A does not appear in the parameter list.

This solves the problem in the previous section. At every call to Closure, the return type is instantiated with a new unification variable, that is not constrained by the type of env. Each branch of the if expression thus starts out with a different type, say Closure[T1] and Closure[T2]; But being unification variables, T1 and T2 can be unified. When the constraint solver encounters this equality, it just replaces one variable with the other and moves on.

When a universally quantified expression is nested like above, the inner quantifier is equivalent to an existential quantifier. In other words, the above can also be written:

∀A.∃E. (E, (E, Int) → Int) → Closure[A]

That is what makes E an existential type! It just means a type variable bound by an existential quantifier; And that, in turn, makes it opaque and unspecific.

Conclusion

Even though the examples look very different, the underlying problems turned out to be similar.

Stefan