A Functional Imperative Mix

How does one add immutability to a language whose established idioms center on imperative for loops, without starting a cultural revolution? I have been mulling over this question and think I have found a good answer.

Coexistence

Non-local mutation causes hard-to-find bugs and makes compiler analyses more difficult, but we cannot ban it outright. The original plan was that classes would create mutable objects, just like in Python, and records and sum types would be deeply immutable.

Unfortunately the “deeply immutable” part is problematic, because it splits the world into mutable and immutable islands. Records would not mix with regular lists or dictionaries, only with other immutable types. Everyone would have to pick sides.
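To make the split concrete, here is a sketch in Python, using frozen dataclasses as a stand-in for the hypothetical deeply immutable records: a record may only contain other immutable values, so every boundary with ordinary mutable code requires a conversion.

```python
from dataclasses import dataclass

# Stand-in for a deeply immutable record type.
@dataclass(frozen=True)
class Point:
    x: int
    y: int

@dataclass(frozen=True)
class Polygon:
    points: tuple  # deep immutability forbids a plain list here

# Code holding ordinary lists must convert at the boundary...
raw = [Point(0, 0), Point(1, 0), Point(1, 1)]
poly = Polygon(points=tuple(raw))

# ...and convert back again whenever it needs to mutate.
editable = list(poly.points)
```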

What we need is a way for mutable objects to become immutable when they are ready. You should be able to mutate an object when populating it and freeze it when done. That would kill two birds with one stone: we could continue using the same idioms for constructing objects, and keep using efficient data structures for the frozen form.

Persistence forces us to change

First, let me explain why I do not think persistent data structures à la Clojure are the solution. There are two reasons.

The first is that it requires adopting a different and more functional style. This function:

def positive(numbers):
    res = []
	
    for num in numbers:
        if num >= 0:
            res.append(num)
	
    return res

would become:

def positive(numbers):
    res = []
	
    for num in numbers:
        if num >= 0:
            res = res + [num]
	
    return res

The difference is just one line, but it’s very significant. Instead of modifying the object, we create a new one every time we want to add something. In this example we don’t actually gain anything, but the point is that by creating a copy we do not risk affecting other code that may hold a reference to the old list. With persistent data structures this style is relatively efficient, though still not as fast as direct mutation.

Some readers may remark that the second snippet is not functional either, since there is an assignment in the loop body. To that I reply that assignment is unproblematic as long as it is local to a function. It is “spooky action at a distance” we want to eliminate.
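To make “spooky action at a distance” concrete, here is a small Python example (the function name is mine, for illustration): the callee filters by mutating the caller's list in place, and the caller's data changes underneath it.

```python
def positive_inplace(numbers):
    # Bug-prone: filters by mutating the caller's list instead of a copy.
    for i in reversed(range(len(numbers))):
        if numbers[i] < 0:
            del numbers[i]
    return numbers

data = [3, -1, 2]
result = positive_inplace(data)
# The caller's list changed too: data is now [3, 2],
# and result is the very same object as data.
```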

The second reason persistent data structures are not the answer is efficiency: structural sharing keeps each update cheap, but every update still allocates, and the constant factors cannot match in-place mutation of a contiguous array.
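For intuition, compare the two snippets above directly. With plain lists there is no structural sharing, so the copying style is quadratic; this overstates the gap a persistent vector would show, but the direction is the same.

```python
import timeit

def build_by_append(n):
    res = []
    for i in range(n):
        res.append(i)        # amortized O(1) per step
    return res

def build_by_copy(n):
    res = []
    for i in range(n):
        res = res + [i]      # copies all of res: O(len(res)) per step
    return res

n = 2000
t_append = timeit.timeit(lambda: build_by_append(n), number=10)
t_copy = timeit.timeit(lambda: build_by_copy(n), number=10)
# On CPython the copying version is dramatically slower for large n.
```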

Locking it up in a monad

Haskell is famous for being purely functional, but despite this you can in fact use mutable data structures through the ST (state transformer) monad.

As in the Python example above, a local object is created, mutated through various steps, and returned when done. The state transformer makes sure the mutation cannot be observed from the outside while the result is being constructed, and freezes the object before returning it. Since the object cannot be seen from the outside, we can freeze it without creating a copy.

The downside is that you have to accept inversion of control. The function that builds up the result is no longer driving the process; instead, it gets called by the state transformer monad step by step.

This indirect style is very foreign to how Python handles this kind of task, so we have to keep looking.

Scoped mutation

The JavaScript library Immer achieves a similar result without monads. A function called produce creates a new object to hold the result and passes it to another function that is responsible for setting it up. Since the object was just allocated, it is safe to mutate. It is then frozen before being passed on.

import {produce} from "immer"

const nextState = produce(baseState, draft => {
    draft[1].done = true
    draft.push({title: "Tweet about it"})
})

It is a great fit for JavaScript, but since Python’s lambda syntax is restricted to single-expression bodies, it cannot be ported directly.
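The idea itself ports fine if we accept a named function in place of the lambda; the ceremony is what feels un-Pythonic. A minimal sketch, where `produce` and `_freeze` are hypothetical helpers written for this post (not a real library) and the freezing only covers lists and dicts:

```python
import copy
from types import MappingProxyType

def produce(base, recipe):
    # Deep-copy so the recipe can mutate freely without touching `base`.
    draft = copy.deepcopy(base)
    recipe(draft)
    return _freeze(draft)

def _freeze(value):
    # Recursively freeze lists as tuples and dicts as read-only views;
    # a real implementation would cover more types.
    if isinstance(value, list):
        return tuple(_freeze(v) for v in value)
    if isinstance(value, dict):
        return MappingProxyType({k: _freeze(v) for k, v in value.items()})
    return value

base_state = [{"title": "Learn Immer", "done": False}]

def recipe(draft):
    draft[0]["done"] = True
    draft.append({"title": "Tweet about it", "done": False})

next_state = produce(base_state, recipe)
# base_state is untouched; next_state is a frozen copy.
```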

Freezing regions

Another idea is to partition the heap into regions: isolated groups of objects. If an object is constructed in a fresh region, the resulting object graph will be self-contained, and we can avoid copying it when we return the frozen result.

What makes it really interesting is that a recent paper suggested adding regions to Python itself. The authors focus on concurrency, but it looks like it should be straightforward to support Immer-style object construction using the Region class in the paper. I imagine it would look something like this:

def positive(numbers):
    region = Region()
    region.res = []
	
    for num in numbers:
        if num >= 0:
            region.res.append(num)
	
    freeze(region)
	
    return region.res

Freezing with types

The region design proposed for Python is hampered by that language's constraints, in particular dynamic typing. But Newton is statically typed, and that is an advantage we should exploit to the fullest.

I think we may actually be able to reach the goal without any new syntax at all. If we are lucky, the construct for building up values could just be plain functions returning frozen objects.

def positive(numbers):
    let res = []
	
    for num in numbers:
        if num >= 0:
            res.append(num)
	
    return frozen(res)

For this to be safe and efficient, the type system must be able to prove that res can only escape through the return statement; in other words, that there are no other references to it.
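Concretely, here is the kind of program the checker would have to reject, written as plain Python (the Newton constructs above are hypothetical): a second reference escapes before the freeze, so freezing in place would be unsound, and Python is forced to copy instead.

```python
leaked = None

def positive_leaky(numbers):
    global leaked
    res = []
    leaked = res          # a second reference escapes before the freeze;
                          # an in-place freeze would now be observable
    for num in numbers:
        if num >= 0:
            res.append(num)
    return tuple(res)     # stand-in for frozen(res): must copy here

out = positive_leaky([1, -2, 3])
leaked.append(99)         # the builder is still mutable through the alias
```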

I have not worked out the details yet, but it seems uniqueness typing and regions would fit the bill. The folks at Jane Street published an interesting paper on retrofitting OCaml with uniqueness and regions using modes, so that is probably where I should start.

Conclusion

In many situations, mutability is only needed during the construction of an object, after which it does not need to be modified again. By exploiting this phase distinction we can have deeply immutable data types without splitting the language in two.

Stefan