Was writing the compiler in Python a good idea?
It’s been more than three years now since I started writing a compiler in Python. Looking back, did I choose the right language?
Main reasons
I already knew Python well and was productive in it, so that was one reason. It also lowered the threshold to eventually self-hosting the compiler, since the languages are so similar. Newton is not complete or mature enough for that yet, but the compiler is written in a style that I hope will be easy to port to Newton one day. Being a compiler, it’s using algebraic data types and pattern matching all over the place.
Python doesn’t have algebraic data types, but it does have pattern matching, and that let’s me simulate it to some extent. Here is an artifical example to give you the idea:
# All variant types inherit from the ADT class, which implements __repr__, __eq__, and __hash__
# define a variant type Expr
class Expr(ADT):
pass
# define an Expr case named Add
class Add(Expr):
__match_args__ = ("left", "right")
def __init__(self, left, right):
self.left = left
self.right = right
class Sub(Expr):
__match_args__ = ("left", "right")
def __init__(self, left, right):
self.left = left
self.right = right
def const_eval(expr):
match expr:
case Add(left, LitInt(0)):
return left
case Add(LitInt(0), right):
return right
# other valid patterns here
case _:
raise Exception(f'no case matching {expr}')I believe the functional style minimizes the time wasted on debugging. Pattern matching is not a good example of what I mean though. It’s more important to restrict mutation.
The monad mistake
I tried pushing the functional style quite far, actually. Too far, it turned out. I experimented with a monadic style of Python, and it failed miserably. It was supposed to help me move faster, but instead it slowed me down and made the code worse.
When I started out I had never used monads before, so part of the problem was lack of experience. I still haven’t used Haskell actually, I just like admiring it from a distance. But I did learn a lot. I learned that monads need “do notation” to be practical, and that do notation doesn’t fit in a language with single-line lambdas.
Other reasons
There are also less personal reasons that make Python a good language for prototyping a compiler. It’s just a very productive language, with rapid feedback, reliable stack traces, and a good standard library. The last point is not only useful for the compiler itself–you can also delay writing your own standard lib by wrapping Python’s data types instead.
I still cannot think of a better alternative, not for my goals. The one thing that bugs me a bit is I can only distribute it in source form, without a third party tool like Nuitka. But once the compiler is ready to be distributed, it will hopefully be ready to be self hosted as well. To be able to produce binaries I will also need a suitable backend, such as C or WASM, so it’s a bit of work though.
Verdict
Knowing what I know now, I would 100% choose Python again.
So here is my advice to people who are trying to decide:
- Pick the language you are most familiar with
- If you are planning to self host the compiler one day, pick an implementation language that resembles the source language
- Don’t do monads in Python
Stefan