The four layers of Python

One way to think about Python is that it’s built up from progressively larger sublanguages, like the layers of an onion. It turns out the language is still useful after we peel off the top layer, maybe even more useful for some purposes.

Four layers

Viewed this way, Python has four layers wrapped around a core of built-in primitives:

  • The expression layer
  • The statement layer
  • The definition layer
  • The library layer

The expression layer consists of literals, arithmetic, function calls, comprehensions, and so on. Everyone uses this subset.

The statement layer adds assignments and control flow, such as conditionals, loops, and exception handling.

The definition layer lets you define new functions and classes, and overload operators.

The library layer is mostly used for writing packages of reusable functionality. It lets you do metaprogramming, load foreign functions, and customise how classes are created. I would put reflection in this category too.

Assignments are a bit special. I would put assignments that bind new variables in the expression layer, and those that modify an existing variable in the statement layer. But assignments can also add methods to classes and be used for other very dynamic behavior. I would classify that as the library layer.
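The three kinds of assignment can be sketched like this. The names total, Point and norm1 are made up for illustration, and the layer assigned to each is my reading of the classification above.

```python
# Binding: introduces a new name (expression layer in the classification above).
total = 0

# Rebinding: modifies an existing variable (statement layer).
for n in (1, 2, 3):
    total += n


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def norm1(self):
    """Manhattan distance from the origin."""
    return abs(self.x) + abs(self.y)


# Dynamic extension: an assignment that adds a method to an
# already-defined class (library layer).
Point.norm1 = norm1

p = Point(3, -4)
distance = p.norm1()
```

The last assignment looks just like the first one, but it changes what every Point instance can do, which is why it belongs with the most dynamic features.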

Each layer is useful

Most Python programmers use only the first three layers, and it is interesting to note that each layer is useful on its own, working from the inside out.

For instance, the expression subset could be used for a configuration language, or as the formula language of a spreadsheet. You don’t need the other layers for this purpose.
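As a rough sketch of the spreadsheet idea, the built-in compile() can be asked to accept a single expression and nothing else by passing mode "eval". The helper names is_expression and evaluate_formula, and the cell names a1 to a3, are hypothetical. Emptying __builtins__ only hides the usual built-ins; it is not a security sandbox.

```python
def is_expression(source):
    """Return True if source parses as a single Python expression."""
    try:
        compile(source, "<formula>", "eval")
    except SyntaxError:
        return False
    return True


def evaluate_formula(source, cells):
    """Evaluate an expression-only formula against a dict of cell values."""
    code = compile(source, "<formula>", "eval")
    # Only `sum` and the cell values are visible to the formula.
    # This hides the other built-ins but is NOT a safe sandbox.
    namespace = {"__builtins__": {}, "sum": sum, **cells}
    return eval(code, namespace)


cells = {"a1": 2, "a2": 3, "a3": 5}
result = evaluate_formula("sum(x * x for x in (a1, a2, a3))", cells)

accepts_formula = is_expression("a1 + a2 * 2")
accepts_statement = is_expression("a1 = 1")  # an assignment is not an expression
```

Literals, arithmetic, calls and comprehensions all pass through, while anything from the statement layer is rejected at parse time.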

Add the statement layer and you can do simple scripting, as an alternative to shell scripts or batch files. You can stay in the inner two layers and still do meaningful work using only core functions and data types.

The library layer

Definitions from outer layers appear in inner layers as identifiers, whose implementation can be replaced as long as they are compatible. The expression layer doesn’t care if a function was defined in the third layer, generated by a decorator in the fourth layer, or was built into the core.
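A small sketch of that substitutability: the same expression is evaluated with the name total bound to a built-in, to a plain definition, and to a function produced by a decorator. The names total_def, counted and total_decorated are invented for this example.

```python
import functools


def total_def(values):
    """Definition layer: an ordinary function written with statements."""
    result = 0
    for v in values:
        result += v
    return result


def counted(func):
    """Library layer: a decorator that generates a new, call-counting function."""
    @functools.wraps(func)
    def wrapper(values):
        wrapper.calls += 1
        return func(values)
    wrapper.calls = 0
    return wrapper


total_decorated = counted(total_def)

# One expression, three interchangeable bindings of the identifier `total`:
# the built-in sum, the plain definition, and the decorator-generated wrapper.
expression = "total([1, 2, 3]) * 2"
results = [
    eval(expression, {"total": impl})
    for impl in (sum, total_def, total_decorated)
]
```

All three evaluations give the same value. The expression only sees an identifier that can be called with a list.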

This goes beyond just functions. In most cases it does not matter whether decorators are a load-time construct, as they are today, or compile-time macros.

This means we may be able to peel off the outer layer with limited impact on the layers below. The value of existing skills and source code could be preserved.

This matters, because the library layer is where most of the problems with Python appear, from a compiler writer’s point of view.

Not all languages

You could argue that all languages are layered like this, but I don’t think that’s true. The expression subset of C, for instance, is too bare-bones to be useful on its own. You need collection literals and comprehensions to be able to do more than just arithmetic. There must be a written representation for composite values so that they can be typed in and displayed.
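To make the point about written representations concrete, here is a minimal sketch using only the expression layer plus the standard ast module. The inventory data is made up. A composite value is displayed with repr() and typed back in with ast.literal_eval(), which accepts literals only.

```python
import ast

# A composite value written entirely with collection literals.
inventory = {"apples": [3, 5], "pears": (1, 2.5), "tags": {"fresh"}}

# Display it as text, then read the text back in as a value.
text = repr(inventory)
restored = ast.literal_eval(text)

# A comprehension builds a new collection without any statements.
squares = {n: n * n for n in range(4)}
```

The round trip works because dictionaries, lists, tuples and sets all have a literal syntax that doubles as their display format.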

The various subsets could also be in competition with each other, instead of layered. Scala comes to mind.

Python without definitions

Leaving out the top layers has more benefits than just limiting the scope for the compiler engineer. Not giving the user any means of extension lets the compiler make stronger assumptions and rule out possibilities. For instance, field locations would be a simple base + offset calculation if fields could not be added at runtime.
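Today's Python has an opt-in feature that hints at what this would look like. A class that declares __slots__ fixes its set of fields when the class is created, and as far as I know CPython then stores those fields at fixed positions in the instance instead of in a per-instance dictionary. The class names and the can_add_field helper below are hypothetical.

```python
class FixedPoint:
    # The complete set of fields, fixed when the class is created.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class OpenPoint:
    # No __slots__: every instance carries a dictionary that can grow.
    def __init__(self, x, y):
        self.x = x
        self.y = y


def can_add_field(obj):
    """Try to add a field named `z` at runtime and report whether it worked."""
    try:
        obj.z = 0
    except AttributeError:
        return False
    return True


fixed_is_extensible = can_add_field(FixedPoint(1, 2))
open_is_extensible = can_add_field(OpenPoint(1, 2))
```

With slots the attempt to add a field fails, so the layout is known up front. A language without the most dynamic layer could make that the default rather than an opt-in.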

Prior art

There have been many projects that re-implement the core, for instance IronPython and Nuitka. The best example of peeling off a layer that I am aware of is Starlark, a build language with Python syntax. There is also GDScript, but it seems to differ from Python a lot more, and in every layer.

Conclusion

Succeeding as a new Pythonic language looks more feasible once we realize not all layers have to be delivered together. The inner layers could be wrapped around a new core and the most dynamic means of extension omitted, and the language would still be useful and familiar.

Stefan