badd10de.dev

BDL: Part 5 (Garbage collection and tail-call optimizations)


Introduction

In the previous article we implemented lambdas and closures. Now it’s time to collect the garbage, it’s a messy job but someone has to do it. I don’t have much experience writing garbage collectors (GCs), but for what I could gather a mark-and-sweep implementation is a simple way of get one off the ground. These types of GCs could also be tuned to potentially be used in different contexts, but before thinking about that we actually need to have a working implementation.

A mark-and-sweep method is very simple: on the “mark” phase, traverse all the objects linked from the root nodes. On the sweep phase all the objects that haven’t been marked can be reclaimed for further use. Normally there is a concept of a “free list” that tell us which nodes are available for further allocations. In our case we also would like to mark “environments” that are still in use, so that we can also reclaim them. For this implementation I’m going to start from the basic algorithmic idea and work on my own implementation. We will try to go one step at a time, like usual.

The garbage collector

Allocating objects

For objects, we will allocate a small heap as an array as well as a free-list of the same size. When we call the allocator function we follow this process:

  1. Check on the free-list if there are any slots available.
  2. If the list is full, trigger the GC algorithm.
  3. If the list is still full after that, grow the object and free-list heaps.
  4. Return the object in the next available slot.

Simple enough! Another candidate for using dynamic arrays. Our free list looks like this:

typedef struct FreeList {
    size_t *buf;
    size_t size;
    size_t cap;
    size_t position;
} FreeList;

The capacity of the FreeList will match the heap size for Objects. We will use the same free-list strategy for managing new Environments, since their memory also needs to be tracked. Before we continue with the implementation details, we need to briefly discuss…

The mark and sweep algorithm

One of the oldest and simplest garbage collection algorithms. We add a new boolean field to our objects and environments. Later we can encode this in a more efficient way (for example a singe bit in the type field), but we are not there yet.

typedef struct Object {
    ObjectType type;
    bool marked;
    union {
        // OBJ_TYPE_FIXNUM
        ssize_t fixnum;

        // OBJ_TYPE_STRING
        struct {
...

typedef struct Environment {
    struct Environment *parent;
    EnvEntry *buf;
    size_t size;
    size_t cap;
    bool marked;
} Environment;

The marking phase consists on traversing Objects and Environment nodes to test for “reachability”. The concept is pretty simple, but there are some problematic implications. The first question is “where do we start the traversal?”. We could use the global environment as a starting point but the fact we have “closures” means any environment could contain some state. We also need a way of protecting nodes that haven’t yet had the opportunity to be stored as a variable, for example, in the proc_list function:

Object *
proc_list(Environment *env, Object *obj) {
    if (obj == obj_nil) {
        return obj_nil;
    }
    Object *head = make_pair(eval(env, obj->car), obj_nil);
    Object *curr = head;
    obj = obj->cdr;
    while (obj != obj_nil) {
        curr->cdr = make_pair(eval(env, obj->car), obj_nil);
        curr = curr->cdr;
        obj = obj->cdr;
    }
    return head;
}

What happens if the object allocation inside the while loop triggers the garbage collector? The head object is not in an environment so it may potentially be overwritten! To address this issue, we create a stack of Object pointers that we can use to protect nodes. We call these the root nodes. Root nodes are, in addition to environments, starting points for the mark and sweep algorithm.

typedef struct RootNodes {
    Object **buf;
    size_t size;
    size_t cap;
} RootNodes;

void
push_root(Object *obj) {
    if (gc.roots.size == gc.roots.cap) {
        gc.roots.cap *= 2;
        gc.roots.buf = realloc(gc.roots.buf, gc.roots.cap * sizeof(Object *));
    }
    gc.roots.buf[gc.roots.size++] = obj;
}

Object *
pop_root(void) {
    return gc.roots.buf[gc.roots.size--];
}

We do the same thing to denote active environments that we want to protect.

typedef struct ActiveEnvs {
    Environment **buf;
    size_t size;
    size_t cap;
} ActiveEnvs;

void
push_active_env(Environment *env) {
    if (gc.active_envs.size == gc.active_envs.cap) {
        gc.active_envs.cap *= 2;
        gc.active_envs.buf = realloc(gc.active_envs.buf,
                gc.active_envs.cap * sizeof(Environment *));
    }
    gc.active_envs.buf[gc.active_envs.size++] = env;
}

Environment *
pop_active_env(void) {
    return gc.active_envs.buf[gc.active_envs.size--];
}

You can already see that we have a lot of dynamic arrays in our data and all the operations look very similar. It may be time to introduce a dedicated data structure to simplify our code, but that will have to wait until later. We have now all the necessary parts to build our memory manager:

typedef struct GC {
    RootNodes roots;
    Environments envs;
    Object *objects;
    size_t obj_cap;
    FreeList free_objects;
    FreeList free_envs;
    ActiveEnvs active_envs;
} GC;

#define GC_OBJS_CAP  1024 * 1024
#define GC_ROOTS_CAP 1024
#define GC_ACTIVE_ENVS_CAP 2
#define GC_ENVS_CAP  1024 * 4

void
init_gc(void) {
    gc = (GC){
        .objects = malloc(GC_OBJS_CAP * sizeof(Object)),
        .obj_cap = GC_OBJS_CAP,
        .envs = (Environments){
            .buf = malloc(GC_ENVS_CAP * sizeof(Environment)),
            .size = 0,
            .cap = GC_ENVS_CAP,
        },
        .roots = (RootNodes){
            .buf = malloc(GC_ROOTS_CAP * sizeof(Object*)),
            .size = 0,
            .cap = GC_ROOTS_CAP,
        },
        .free_objects = (FreeList){
            .buf = malloc(GC_OBJS_CAP * sizeof(size_t)),
            .size = GC_OBJS_CAP,
            .cap = GC_OBJS_CAP,
        },
        .free_envs = (FreeList){
            .buf = malloc(GC_ENVS_CAP * sizeof(size_t)),
            .size = GC_ENVS_CAP,
            .cap = GC_ENVS_CAP,
        },
        .active_envs = (ActiveEnvs){
            .buf = malloc(GC_ACTIVE_ENVS_CAP * sizeof(Environment*)),
            .size = 0,
            .cap = GC_ACTIVE_ENVS_CAP,
        },
    };

    // The free list stores the offset from the initial position for all
    // available slots.
    for (size_t i = 0; i < GC_OBJS_CAP; i++) {
        gc.free_objects.buf[i] = i;
    }
    for (size_t i = 0; i < GC_ENVS_CAP; i++) {
        gc.free_envs.buf[i] = i;
    }
}

The initialization function will be called from main at the start, and just sets up the initial heap allocations and points the free lists to every available slot.

Implementation

As previously mentioned, marking nodes and environments is pretty straightforward. Only the PAIR and LAMBDA types need to be treated specially:

void
mark_environment(Environment *env) {
    if (env == NULL || env->marked) {
        return;
    }
    env->marked = true;
    for (size_t i = 0; i < env->size; i++) {
        EnvEntry *entry = &env->buf[i];
        mark_obj(entry->symbol);
        mark_obj(entry->value);
    }
}

void
mark_obj(Object *obj) {
    if (obj->marked) {
        return;
    }
    obj->marked = true;
    if (obj->type == OBJ_TYPE_PAIR) {
        mark_obj(obj->car);
        mark_obj(obj->cdr);
    }
    if (obj->type == OBJ_TYPE_LAMBDA) {
        mark_obj(obj->params);
        mark_obj(obj->body);
        mark_environment(obj->env);
    }
}

void
mark_and_sweep(void) {
    // Mark.
    for (size_t i = 0; i < gc.active_envs.size; i++) {
        mark_environment(gc.active_envs.buf[i]);
    }
    for (size_t i = 0; i < gc.roots.size; i++) {
        mark_obj(gc.roots.buf[i]);
    }
...

The sweeping phase is tasked with cleaning the memory of unmarked objects and environments and recreating the free lists:

    // Reset the free list.
    gc.free_objects.position = 0;
    gc.free_objects.size = 0;
    gc.free_envs.position = 0;
    gc.free_envs.size = 0;

    // Sweep.
    for (size_t i = 0; i < gc.obj_cap; i++) {
        Object *obj = &gc.objects[i];
        if (!obj->marked) {
            // Free heap allocated memory for this object if needed.
            if (obj->type == OBJ_TYPE_SYMBOL) {
                free(obj->symbol);
                obj->symbol = NULL;
                obj->symbol_n = 0;
            } else if (obj->type == OBJ_TYPE_STRING) {
                free(obj->string);
                obj->string = NULL;
                obj->string_n = 0;
            }
            gc.free_objects.buf[gc.free_objects.size++] = i;
        }
        obj->marked = false;
    }
    for (size_t i = 0; i < gc.envs.cap; i++) {
        Environment *env = &gc.envs.buf[i];
        if (!env->marked) {
            if (env->buf != NULL) {
                free(env->buf);
                env->buf = NULL;
                env->size = 0;
                env->cap = 0;
            }
            gc.free_envs.buf[gc.free_envs.size++] = i;
        }
        env->marked = false;
    }

As simple as it is, the biggest burden is to check our entire code to ensure every time we allocate we are protected against the GC removing temporary objects. Even our parser is affected, since after all, we are allocating memory there. To avoid issues during the initial AST construction, we ensure every possible return object in our parse tree is initially marked as a root node:

Object *
parse_list(Visitor *vs) {
    Token tok = peek_token(vs);
    if (tok.type == TOKEN_EOF) {
        return obj_err;
    }
    Object *root = make_pair(obj_nil, obj_nil);
    push_root(root);
    Object *next_obj = parse_tree(vs);
    if (next_obj == obj_err) {
    ...

Object *
parse_fixnum(Token tok) {
    ssize_t num = 0;
    int sign = 1;
    for (size_t i = 0; i < tok.value.n; i++) {
        char c = tok.value.start[i];
        if (c == '-') {
            sign = -1;
            continue;
        }
        num = num * 10 + (c - '0');
    }

    Object *obj = make_fixnum(num * sign);
    push_root(obj);
    return obj;
}
...

Once we have an expression tree, we can restore the previous stack and push just the root node to the protected list, and we must also not forget to release the AST once we are done with it with pop_root():

        size_t root_stack_size = gc.roots.size;
        Object *root = parse_tree(&visitor);
        gc.roots.size = root_stack_size;
        if (root == obj_err || errors_n != 0) {
            break;
        }
        push_root(root);

        Object *result = eval(global_env, root);
        if (result != obj_nil) {
            display(result);
            printf("\n");
        }
        pop_root();

Just to be sure, in our initialization function we also protect our singleton objects:

void
init(void) {
    // Initialize garbage collector.
    init_gc();

    // Initialize singletons.
    obj_nil = alloc_object(OBJ_TYPE_NIL);
    obj_true = alloc_object(OBJ_TYPE_BOOL);
    obj_false = alloc_object(OBJ_TYPE_BOOL);
    obj_err = alloc_object(OBJ_TYPE_ERR);
    obj_quote = make_symbol((StringView){"quote", 5});
    push_root(obj_nil);
    push_root(obj_true);
    push_root(obj_false);
    push_root(obj_err);
    push_root(obj_quote);
...

Luckily we don’t allocate objects in many of our primitive functions, and so we only need to pay attention to proc_list and proc_cons for now. However, there is a big problem with our garbage collector right now.

The elephant in the room

I mentioned at the beginning that if the free lists are full, we need to allocate more memory and move our data there. The issue is that we are using pointers all over our data for environments and objects, and if we realloc our memory it is likely our pointers will be invalidated. As an example, think of this snippet:

Object *
proc_equal(Environment *env, Object *obj) {
    Object *a = eval(env, obj->car);
    Object *b = eval(env, obj->cdr->car);
    return obj_eq(a, b) ? obj_true : obj_false;
}

Anything that depends on evaluation may trigger allocations, and thus the GC can kick it. So if the first evaluation returns a valid a object pointer, but the second evaluation triggers the GC and the table has to grow, the first pointer will be invalidated. Nothing may happen at first, and things may work normal, since technically the data is still there, but the memory to which a was pointing is no longer valid! This can lead to nasty bugs hidden in our code and/or random crashes.

We could address the problem by not using pointers, for example, eval could return an offset to the objects array, which in turn we can transform into a concrete type or to modify the memory atomically. This, however requires significant refactoring so our current solution is to allocate heaps that we consider “big enough” for now (~1MB for objects and 4K for environments) and just crash with and out of memory error if we find ourselves without memory:

Object *
alloc_object(ObjectType type) {
    if (gc.free_objects.size == 0) {
        mark_and_sweep();
        if (gc.free_objects.size == 0) {
            fprintf(stderr, "NO MORE OBJ MEMORY AVAILABLE WHERE IS YOUR GOD NOW MWAHAHA\n");
            dump_gc();
            exit(EXIT_FAILURE);
            // TODO: grow heap tables.
        }
    }
    size_t slot = gc.free_objects.buf[gc.free_objects.position++];
    gc.free_objects.size--;
    Object *obj = get_obj(slot);
    obj->type = type;
    obj->marked = false;
    return obj;
}

Environment *
alloc_env(void) {
    if (gc.free_envs.size == 0) {
        mark_and_sweep();
        if (gc.free_envs.size == 0) {
            fprintf(stderr, "NO MORE ENV MEMORY AVAILABLE WHERE IS YOUR GOD NOW MWAHAHA\n");
            dump_gc();
            exit(EXIT_FAILURE);
            // TODO: grow heap tables.
        }
    }
    size_t slot = gc.free_envs.buf[gc.free_envs.position++];
    gc.free_envs.size--;
    return &gc.envs.buf[slot];
}

If we start crashing, we can of course increase the size of the alloted memory, but maybe we need to think of a long term solution. Meanwhile, since we have to touch the eval function, there is one more topic we must go through.

Evaluations and tail-call-optimizations

We can’t avoid it any longer, we must talk about tail-call-optimization (TCO). Our language, like Scheme, is based on the usage of recursion to create loops and iterative procedures. Because of that we need to guarantee that our functions don’t blow up the stack. For example, in the initial implementation of lambda functions and closures, we use the env_extend function to create a new environment for E-V-E-R-Y S-I-N-G-L-E recurring call. We don’t have the luxury of leaking memory or thinking we have an infinite heap anymore.

The good news is that implementing TCOs is actually pretty simple. We use a boolean tag recursion_active to indicate that we are in the middle of an iterative procedure. Furthermore, when we create the new extended environment, we also protect it from the GC:

eval_lambda:
            args = root->cdr;
            Object *params = lambda->params;
            if (!recursion_active) {
                recursion_active = true;
                // Protect current stack.
                Environment *tmp = env_create(lambda->env);
                push_active_env(tmp);
                // Extend environment.
                env = env_extend(tmp, env);
            }
            while (params != obj_nil) {
                if (args == obj_nil) {
                    error_push((Error){
                        .type = ERR_TYPE_RUNTIME,
                        .value = ERR_NOT_ENOUGH_ARGS,
                    });
                    return obj_err;
                }
                Object *symbol = params->car;

            ...

Not much else changes until the bottom part of this case:

            root = lambda->body;
            while (root->cdr != obj_nil) {
                if (eval(env, root->car) == obj_err) {
                    return obj_err;
                };
                root = root->cdr;
            }
            root = root->car;
            goto eval_start;

Instead of finishing up with return eval(env, root->car);, we explicitly jump to the beginning of the evaluation function. Note that sometimes, a C compiler may do this optimization for you, but we absolutely need to guarantee that the TCO happens. We can’t forget to release the environment once we are done, so we change all our return statements to point to the eval_success label, which will perform this cleanup:

...
eval_success:
    if (recursion_active) {
        // Remove stack protector.
        pop_active_env();
    }
    return ret;

There is one last piece of the puzzle. For now, this will only work for infinite recursion types. We also need to make the if procedure tail recursive. We can start by just copying the function into the appropriate area, and transform our proc_if primitive procedure into a singleton. After that, we can achieve TCO with:

    if (val == proc_if) {
        Object *obj = root->cdr;
        if (obj == obj_nil || obj->cdr == obj_nil) {
            error_push((Error){
                .type = ERR_TYPE_RUNTIME,
                .value = ERR_NOT_ENOUGH_ARGS,
            });
            return obj_err;
        }
        Object *car = obj->car;
        Object *cdr = obj->cdr;
        Object *condition = eval(env, car);
        if (condition == obj_err) {
            return obj_err;
        }
        if (condition == obj_true) {
            root = cdr->car;
        } else if (cdr->cdr != obj_nil) {
            root = cdr->cdr->car;
        } else {
            return obj_nil;
        }
        goto eval_start;
    }

Not too bad right? Don’t be afraid of the much maligned goto's, if used judiciously they can greatly help to avoid code repetition and allow for some nifty tricks like this one here. There are a couple of other minor changes here and there, but that is pretty much all we need to remark. Note that we may have to apply the same treatment to other tail-recursive functions (for example, cond or let), but this hopefully sets the ground for those additions.

Conclusion

This was a big milestone in the project! Yes, we still didn’t fully addressed some issues with the garbage collector, but at least on v0.6 all the examples and tests run successfully, and we are not leaking any memory (at least according to AdressSanitizer, which I have used extensively when trying to debug this code). The Fibonacci function in the examples folder is actually not fully TCO, but can be used to test if the GC its doing its job. Here is a different tail optimized function for testing purposes, just a long loop, which hopefully should run to completion:

(fun recur (n)
    (if (<= n 1)
        'ok
        (recur (- n 1))))

(recur 2000000)

I can’t begin to tell you how many hours I wasted trying to discover memory leaks due to some uninitialized variable. Moving forward we should still be cautious about memory, specially when it comes to protecting root nodes. For our next trick, we will probably be doing some refactoring, code cleanup and maybe implementing a data structure or two. See you around!