Handwritten Generators

April 27, 2024

Before I start generating C, I want to be able to loop over generator expressions. Having to iterate using C-style loops would be such an anticlimax. But there is a lot of work before I get there and I want to validate the approach first.

The destination

Since C does not have generators, they will have to be transformed into something else first. Each generator function will be chopped up into two new functions wherever a yield statement appears–One function returning before the yield statement, and the other starting at the next statement.

To resume the generator at some yield point, you invoke the function starting at that point. This means we will need a mechanism for remembering where it should resume, and the current contents of the local variables.

The shortcut

I will perform the transformation by hand first to make sure I understand what needs to be done. This will let me loop over generators before I have implemented them in the compiler. Sounds paradoxical, but it works because they translate into regular functions and records.

Generators as closures

The range function is a good first choice because it is so simple. Here is what a normal definition may look like in Python:

def range(start, stop):
    while start < stop:
        yield start
        start += 1

Using closures it may look like this instead:

Yield = namedtuple('Yield', 'item next')

def range(start, stop):
    def generator():
        if start < stop:
            return Just(Yield(start, range(start + 1, stop)))
        else:
            return Nothing()

    return generator

In this version, the range function returns a nested function with free variables start and stop, i.e. a closure. This function is the starting point of the generator. When called, it returns a Yield value wrapped in an option type.

The item field holds the yielded value and the next field a new generator that when invoked returns the next element.

Testing it

Let’s try it out. Here is how to loop over it:

gen = range(0, 3)

while True:
    match gen():
        case Just(Yield(item, next)):
            print(item)
            gen = next

        case Nothing():
            break

The output is 0, 1, 2. Yay!

Manual closure conversion

C does not have closures either though, so we have not gained much yet. We will need to eliminate them too. There are many ways to do that, but the technique I’m interested in for this post is called closure conversion. It entails moving the free variables to a record instead and passing it to the function.

Let’s use Newton this time, so we can see the types. Here is the same example after manual closure conversion:

record Yield[T](item: T, next: Closure[T])
record Closure[T](code: (Closure[T]) -> Maybe[Yield[T]], start: T, stop: T)

def range(start, stop):
    return Closure(generator, start, stop)

def generator(env):
    if env.start < env.stop:
        return Just(Yield(env.start, Closure(generator, env.start + 1, env.stop)))
    else:
        return Nothing

The start and stop variables have moved to a record and the function is no longer nested. We can loop over it the same way as before, except the call site has to be changed to gen.code(gen)

let gen = range(0, 3)

while True:
    match gen.code(gen):
        case Just(Yield(item, next)):
            print(item)
            gen = next

        case Nothing:
            break

Now everything has a direct correspondence in C! Well everything except the nested pattern in the case clause, but I covered that in an earlier blog post.

Stefan