Type Checking Variadic Functions

I was not planning on extending the type system further before 1.0, but I ran into a problem I think warrants it. That problem is assigning an appropriate type to Python’s print function. What makes the print function difficult to analyze is that it takes an arbitrary number of arguments, and those arguments can have any type. So we have an unknown number of unknowns!

Typing a homogeneous list is much easier in comparison, because although the number of elements is unknown, there is only one type parameter.

Syntax for Variadic Functions

First of all we need a syntax for variadic functions. As usual, I will take Python as the starting point.

def foo(*args: *Ts)
def foo(*args: tuple[*Ts])

Both definitions mean the same thing in Python. The asterisk in front of the parameter name means you can pass the function an arbitrary number of arguments, accumulating in args. We will need two minor adjustments for Newton:

  • Types are always capitalized, so tuple[*Ts] becomes Tuple[*Ts]
  • Any type variables mentioned in the definition must be declared in square brackets.

Below is the corresponding function in Newton. Both definitons take an arbitrary number of arguments, packed in a Tuple:

def foo(*args)
def foo[*Ts](*args: Tuple[*Ts])

I will have to add tuples to the language as well, since Newton does not have heterogeneous lists yet.

Rows and rest parameters

Now, the question is how to classify the *Ts part of the type annotation. It clearly doesn’t represent a single type, but it’s not a term either, so what is it? The closest thing I could think of was a row type, which is a kind of type used for extensible records in Koka and other languages. Rows consist of an open-ended number of (label: type) pairs, so they contain an arbitrary number of type parameters, just like rest parameters. But unlike them, rest parameters (tuples) don’t have labels.

Kinds and kind checking

Before I clarify what rests are exactly, let’s recap briefly how Hindley Milner type systems work.

The syntax for type annotations forms a little sub language that we can call the type language. It consists of type constructors, type applications, and type variables. For instance, Pair[Int, Bool] means “apply the type constructor Pair to the type arguments Int and Bool”. In other words, Pair is a type-level function that is “called” with types as arguments, producing a composite type as result. Constants such as Int are just type functions of zero arguments.

However, not all combinations of type functions and arguments make sense. For instance, Pair takes exactly two arguments–anything else would be an error. Turns out some of the mistakes possible in the term language can occur on the type level too!

To prevent errors on the type level, we classify types into different kinds and use a kind checker to enforce the kind rules. You can think of kinds as types of types. For instance, the kind of types that describe values (such as Int) is often called * (pronounced star), and List has kind * → * meaning a function from types to types.

Kind of *Ts

Ok, back to rests again. *Ts can be passed as an argument to the Tuple type constructor, just like type constants such as Bool. But unlike Bool, it cannot be used as the return type of a function.

Clearly the *Ts fragment must participate in type inference somehow, but we need a way to categorize and mark it. As you may have guessed, the solution is to give it a distinct kind that I am calling rest, since I could not find a suitable name for rows without labels in the literature.

More precisely, *Ts is a type variable of kind rest. The compiler will check how it is used in the body of the function and infer the type that satisfies all constraints. Consider this example:

def foo(*args):
	return first(args) + second(args) + 10

Foo expects at least two arguments, both of which must be integers. It would be an error to call it with a single Int, or with two Bools. However, calling it with three integers would not break any rule, since none of the constraints implicit in the function body preclude passing more than the required number of arguments.

We encode the lack of constraints on the rest of the tuple by letting it remain a variable. After type inference foo will thus have this signature:

def foo[*Ts](*args: Tuple[Int, Int, *Ts]):

To avoid unifying rests with type variables of kind *, we add a kind field to the internal representation of type variables, and check that they are equal when solving the constraint system.

There is More

I glossed over important details above, but I hope you at least got an intuition for how the problem can be modeled. There are actually crucial problems left that I did not discuss at all in this article. Part of the reason is I am still working on it and do not yet know what the end result will be. I do think I have a pretty good grasp of the remaining challenges though, so I’m hopeful it will work out.

I found the book Typing Haskell in Haskell helpful, if you want to know more about kind systems. It is available for free online.

Stefan