A Functional Imperative Mix

How does one encourage the use of immutable data structures when the established idioms center on imperative for loops, without creating a rift in the language? I have been mulling over this question and think I have found a good answer.

Coexistence

Non-local mutation makes compiler analyses more difficult and can cause hard-to-find bugs, but we cannot ban it outright. So what do we do?

Newton was always going to have mutation, just like Python, but it should have first-class support for immutable data types as well. The idea was that objects created from classes would be mutable, while records and sum types would be deeply immutable. Unfortunately the “deeply immutable” part is problematic, because it splits the language into mutable and immutable islands. Records would not be able to contain regular lists or dictionaries, only other immutable types.

What we need is a way for mutable objects to become immutable when they are ready, so that they can be placed in records and other immutable objects. You should be able to mutate an object when populating it and freeze it when done. That would kill two birds with one stone: we could continue to use the same idioms for constructing objects, and keep using efficient data structures for the frozen form.

Persistence forces us to change

First, let me explain why I do not think persistent data structures à la Clojure are the solution. There are two reasons.

The first is that they require adopting a different, more functional style. Take this function:

def positive(numbers):
    res = []
	
    for num in numbers:
        if num >= 0:
            res.append(num)
	
    return res

Using immutable data structures, the most direct translation looks like this:

def positive(numbers):
    res = []
	
    for num in numbers:
        if num >= 0:
            res = res + [num]
	
    return res

The difference is just one line, but it’s very significant. Instead of modifying the object, we create a new one every time we want to add something. In this example we don’t actually gain anything, but the point is that by creating a copy instead we do not risk affecting other objects that may have a reference to it. With persistent data structures this style is relatively efficient, but still not as fast as direct mutation.
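To make the aliasing argument concrete, here is a small plain-Python illustration (the function names are mine, chosen for the example):

```python
def add_item_mutating(items):
    items.append("!")       # mutates the caller's list in place
    return items

def add_item_copying(items):
    return items + ["!"]    # builds a fresh list; the caller's is untouched

shared = ["a"]
add_item_mutating(shared)
assert shared == ["a", "!"]     # the caller observed the change

shared2 = ["a"]
out = add_item_copying(shared2)
assert shared2 == ["a"]         # the caller's list is unaffected
assert out == ["a", "!"]
```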

Efficiency is the other reason persistent data structures are not the answer. Even with structural sharing, each update pays for extra allocation and pointer chasing, and indexed access is typically logarithmic rather than constant time, costs that plain mutable arrays and dictionaries avoid.
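For intuition, here is a toy persistent structure, a cons list, showing how structural sharing avoids full copies; Clojure's real structures use balanced trees instead so that indexed updates are also cheap:

```python
# Minimal persistent list built from cons cells (a toy stand-in for
# Clojure-style structures). Prepending shares the old cells instead
# of copying them, so the old version stays valid and unchanged.
def prepend(x, xs):
    return (x, xs)              # new node points at the old list

def to_pylist(xs):
    out = []
    while xs is not None:
        head, xs = xs
        out.append(head)
    return out

base = prepend(2, prepend(3, None))   # the list [2, 3]
extended = prepend(1, base)           # [1, 2, 3], sharing base's cells

assert to_pylist(base) == [2, 3]      # base is unchanged
assert to_pylist(extended) == [1, 2, 3]
assert extended[1] is base            # structural sharing: no copy made
```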

Locked up in a monad

Haskell is famous for being purely functional, but you can in fact use mutable data structures through the ST monad.

Similar to the Python example above, a local object can be created, mutated through various steps, and returned when done. The monad makes sure the object can only be accessed safely, and freezes it before it is released. Since it cannot have external aliases, it can be frozen directly without making a copy.

The downside is that you have to accept inversion of control. The function that builds up the result is no longer driving the process; instead, it gets called by the monad step by step. This style would be too foreign to Pythonistas, so monads are not the solution either.

Scoped mutation

The JavaScript library Immer achieves a similar result without monads. A function called produce creates a new object to hold the result and passes it to another function that is responsible for setting it up. We know there are no other references to the object, since it was just allocated, so it is safe to mutate. The produce function then marks it as frozen before returning it.

import {produce} from "immer"

const nextState = produce(baseState, draft => {
    draft[1].done = true
    draft.push({title: "Tweet about it"})
})

It is a great fit for JavaScript, but since Python's lambda syntax is restricted to a single expression, it cannot be ported directly.
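A direct Python port would need a named function for the recipe. Here is a toy sketch of the idea (not Immer's actual implementation): it deep-copies instead of using structural sharing, and skips real freezing:

```python
from copy import deepcopy

def produce(base, recipe):
    """Toy Immer-style produce: copy the base, let the recipe
    mutate the fresh copy, then hand it back."""
    draft = deepcopy(base)   # just allocated, so no other references
    recipe(draft)
    return draft

def recipe(draft):           # must be a named function, not a lambda
    draft[1]["done"] = True
    draft.append({"title": "Tweet about it"})

base = [{"title": "Learn"}, {"title": "Try", "done": False}]
next_state = produce(base, recipe)

assert base[1]["done"] is False      # the base state is untouched
assert next_state[1]["done"] is True
assert len(next_state) == 3
```

The need for a separate, named recipe function is exactly the awkwardness that makes the pattern a poor fit for Python.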

Freezing regions

Another idea is to partition the heap into regions: separate groups of objects. If the object is constructed in a new region, the resulting object graph will be self-contained, and we can avoid copying it when we return the frozen result.

What makes it really interesting is that a recent paper suggested adding regions to Python itself. The paper is about concurrency, but it looks like it should be straightforward to support Immer-style object construction using the Region class in the paper. I imagine it would look something like this:

def positive(numbers):
    region = Region()
    region.res = []
	
    for num in numbers:
        if num >= 0:
            region.res.append(num)
	
    freeze(region)
	
    return region.res

This solution is nice and pythonic, just a bit verbose. The biggest drawback is that it can fail at runtime, for instance when a reference into the region has escaped before the freeze.

Freezing with types

For a region design to be acceptable for Python, it has to work with dynamic typing and the CPython interpreter. But Newton is statically typed, which I think will enable a much better solution. Maybe we can even avoid adding new syntax altogether. Instead we just build up the result the usual way using mutation, and call a function called frozen on it when done.

def positive(numbers):
    let res = []
	
    for num in numbers:
        if num >= 0:
            res.append(num)
	
    return frozen(res)

For this to be safe and efficient (frozen should not create a copy, just return the same object with a different type), the type system must be able to prove that res can only escape through the return statement, that is, that there are no other references to it. And it must do so without placing onerous restrictions on how res can be used: we still want to be able to pass it to other functions and methods, such as append in the example above.
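The shape of the API can be sketched in plain Python, where frozen has to pay for an actual conversion (list to tuple here); the point of the Newton design is that the type system would let the very same call return the original object, retagged as immutable, for free:

```python
def frozen(xs):
    # Plain-Python stand-in: copies into an immutable tuple.
    # In Newton this would be a no-op retyping of the same object.
    return tuple(xs)

def positive(numbers):
    res = []
    for num in numbers:
        if num >= 0:
            res.append(num)      # ordinary mutation while building
    return frozen(res)           # frozen from here on

assert positive([3, -1, 0, -7, 5]) == (3, 0, 5)
```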

I do not have a complete understanding of the theory yet, but it seems uniqueness typing and regions would fit the bill. The folks at Jane Street published an interesting paper on retrofitting OCaml with uniqueness and regions using modes, so that is probably where I should start reading.

Conclusion

In many situations, mutability is only needed during the construction of an object, after which it does not need to be modified again. By exploiting this phase distinction we can have deeply immutable data types without splitting the language in two, and with zero loss of efficiency.

Stefan