Existential Types in Practical Terms

It turns out that pattern matching on Python-style tuples requires existential types. This is the second time I encounter them, so I figured it is a good opportunity to show how they arise in practise.

Existentials in match patterns

Let’s say you want to apply a function to all items in a tuple, and you want to do it using recursion instead of a for loop. Something like this:

def map(fun, items):
    match items:
        case Tuple():
            return Tuple()

        case Tuple(first, *rest):
            return Tuple(fun(first), *map(fun, rest))

The first pattern matches the empty tuple and the second all tuples with one or more elements.

This little example gives a language designer a lot to worry about. For instance, what types do the pattern expressions have? Both types should unify with items, but they clearly have different size.

The reason they must unify with the scrutinee (i.e. items) is that they cannot possibly match if the scrutinee has an entirely different type. But unifying them with the scrutinee also unifies them with each other, which forces all patterns to have the exact same type, including the type parameter. This leaves us with no choice but to instantiate them with fresh unification variables. Let’s say the first pattern is instantiated to Tuple[*α], the other to Tuple[*β], and item has type Tuple[*γ]. All three types α, β, and γ will then be unified.

The point is that by using α and β instead of the actual pattern types we are able to avoid the conflict. The asterisk before the variables just makes them rest-kinded.

But what type does first have? Something has to be there if the pattern matched.

Existentials for closure environments

Another example is a compiler transformation known as closure conversion. It is used for eliminating free variables from nested functions, so that they can be moved to the top level. The variables are replaced with fields in a record instead.

When performing closure conversion on a typed language, we may want the resulting form to be typed too. The record holding the free variables–usually called the environment–will then need a type. But this can cause problems where the functions are used. Let’s look at an example:

let b = 1
let fun = (lambda a: a + 1) if condition else (lambda a: a + b)

Both sides of the if expression are functions from Int to Int, so we can safely assign fun the type Int -> Int. So far everything is great.

Now let’s perform closure conversion by hand. We will need two environments, one for each function. The first lambda does not have any free variables, so its environment is an empty record. The other has a field for b.

record Env1()
record Env2(b: Int)

We will also need a representation for the closures. They have two parts: an environment and a function. Let’s assume this representation:

record Closure[E](env: E, code: (E, Int) -> Int)

After transformation to explicit closures the example looks like this:

let b = 1
let fun = Closure(Env1(), lambda env, a: a + 1) if condition else Closure(Env2(b), lambda env, a: a + env.b)

The free variable in the lambda to the right has been eliminated, so the compiler can move it to another scope. In an untyped representation our goal would have been accomplished now.

But in a typed representation we run into trouble–the if expression no longer type checks! Both branches should agree on a type, but one is a Closure[Env1] and the other a Closure[Env2].

Notice the similarity with the previous section, where allowing the precise pattern types to flow into the scrutinee would cause a conflict.

Quantification

We need a bit of theory for the rest of the discussion. Let’s first dissect the record definition above. Under the hood, it generates two definitions: a data constructor and a type sharing the same name. If the data constructor was a regular function it would have this signature:

def Closure[E](env: E, code: (E, Int) -> Int) -> Closure[E]:
    ..

The list in square brackets corresponds to universal quantification in predicate logic. If we write the type using a quantifier instead, we get:

∀E.(E, (E, Int) → Int) → Closure[E]

This notation is more powerful, because it lets us nest quantified expressions:

∀A.(∀E.(E, (E, Int) → Int) → Closure[A])

When a universally quantified expression is nested like this, the inner quantifier is equivalent to an existential quantifier. In other words, the above could also be written like this, without changing the meaning:

∀A.(∃E.(E, (E, Int) → Int) → Closure[A])

The inner type parameter is now clearly marked as existential. Existentials represent some type, but we don’t know which. Universally quantified types, in contrast, can be replaced by any type.

Above, I added a second type parameter A and changed the result type to Closure[A]. Note that the result no longer depends on any of the arguments to the data constructor. It is completely detached from the expression that created it, because A does not appear in the parameter list.

At first glance, this appears to solve the problem in the previous section. Every call to Closure instantiates the return type with a new unification variable, that is not constrained by the actual environment type. Each branch of the if expression thus starts out with a different type, say Closure[α] and Closure[β]; But being unification variables, α and β can be unified. When the constraint solver encounters this equality, it just replaces one variable with the other and moves on. No conflict arises.

There is a problem with this solution, though. Being unification variables, α and β may unify with other types too, and that could cause an arbitrary environment to propagate to the Closure parameter. Instantiating the return type with a unification variable was the right choice for example one, but it is not for example two.

For the Closure we actually want A to be existential, not universal. Existentials are instantiated to skolem variables instead, to prevent unification. Since the branches will no longer unify if the skolems are different, the compiler needs to have a separate analysis that instantiates the parameter to the same skolem for both closures (but not elsewhere).

Conclusion

Even though the examples look very different, the underlying problems turned out to be similar.

Stefan