Compiling Pattern Matching

I haven’t had much time for Newton since the last blog post, but I was able to get past something I have been working on for a while recently. I can now lower match statements to decision trees.

Case patterns in Newton and similar languages can be nested (i.e. contain other patterns) and this has to be removed before we generate target code, since the target is usually a low-level language or machine code. For instance, one of the targets I plan to support is C, and switch statements in C can only compare atomic values.

Flattening the patterns sounds easy but there is actually a great deal of nuance and complexity. For instance, you may need to duplicate branches as you go along, but you don’t want to inflate the code size. You also want to avoid testing the same condition more than once, handle match failure, default cases, and generate bindings.

Here is an example with integer patterns inside constructor patterns:

def test2(pair):
    match pair:
        case Pair(1, 0):
            return 0
        case Pair(2, n):
            return n
        case anything:
            return 2

The compiler can choose the order in which individual sub patterns are checked freely as long as the top-most matching case is selected in the end. The order affects the size of the resulting decision tree, but unfortunately finding the optimal order is not tractable so we have to rely on heuristics.

Here is the resulting tree when the fields are compared right to left, i.e. the second field of the Pair before the first.

def test2(pair):
    match pair:
        case Pair(first, second):
            let n = second

            match second:
                case 0:
                    match first:
                        case 1:
                            return 0

                        case 2:
                            return n

                        case _:
                            return 2
                case _:
                    match first:
                        case 2:
                            return n

                        case _:
                            return 2

        case _:
            return 2

Notice how every comparison is now a single constructor or integer literal, and that the Pair constructor is only tested once. The nesting has been replaced by wildcards and explicit control flow.

Before I started I looked around for an approach that is simple to implement but still generates reasonable code. I found a paper by David MacQueen that has a pragmatic focus, accompanied by a talk.

My implementation takes a lot of inspiration from the talk but ended up quite different. For instance, before I generate “and-or trees” I group all patterns by the outermost data constructor and use a matrix to represent the sub patterns inside. Many other authors use matricies for pattern compilation, so I’m not claiming novelty.

Stefan