badd10de.dev

BDL: Part 3 (Evaluation and environments)


Introduction

With the code from the previous issue of this series we now have access to a system that generates AST trees from a series of tokens for each of the expressions in our language. We have a recursive algorithm for displaying the tree, but how do we go about “evaluating” it? Well, primitive types such as fixnums, booleans, nil or strings just evaluate to themselves. A symbol evaluates by performing a lookup to try to find the object associated with it. We will discuss this more in the next section.

A pair list evaluates to a procedure call, where the first element is the symbol name corresponding to the procedure name, and the rest are consider the parameters of that procedure. This could be done with a recursive call to eval but we want procedure calls to be tail-call optimized so we need to account for this, otherwise recursive functions will explode the stack.

Here is the basis of the eval function, note that the reported runtime error doesn’t have access to a line/col count, making debugging more complicated that it has to be. We will address that in the future, but for now let’s focus on the evaluation:

Object *
eval(Object *root) {
    switch (root->type) {
        case OBJ_TYPE_FIXNUM:
        case OBJ_TYPE_BOOL:
        case OBJ_TYPE_NIL:
        case OBJ_TYPE_STRING: {
            return root;
        } break;
        case OBJ_TYPE_SYMBOL: {
            // TODO: Implement...
        } break;
        case OBJ_TYPE_PAIR: {
            // TODO: Implement...
        } break;
        default: {
            break;
        } break;
    }

    error_push((Error){
        .type = ERR_TYPE_RUNTIME,
        .value = ERR_UNKNOWN_OBJ_TYPE,
        .line = 0,
        .col = 0,
    });
    return obj_err;
}

We adjust the process_source function so that we can start testing our evaluator. As the ominous FIXME comment mentions, we may be leaking memory, but we will have to worry about that in the future.

    while (has_next_token(&visitor) && peek_token(&visitor).type != TOKEN_EOF) {
        Object *root = parse_tree(&visitor);
        if (root == obj_err || errors_n != 0) {
            free_objects(root);
            break;
        }

        // FIXME: Not freeing result or intermediate objects, can leak memory.
        Object *result = eval(root);
        printf("AST: ");
        display(root);
        printf("\n");
        printf("EVAL: ");
        display(result);
        printf("\n");
        free_objects(root);
    }

You can create a small test.bdl program to see if things are working properly.

1
"test"
true
false
()
a
'test
(+ 1 2 (+ 3 4))

Just run it with ./build/bdl test.bdl and you should be getting the following result so far:

AST: 1
EVAL: 1
AST: "test"
EVAL: "test"
AST: true
EVAL: true
AST: false
EVAL: false
AST: ()
EVAL: ()
AST: :a
EVAL: #{error}
test.bdl: error: can't eval unknown object type

When it encounters a symbol, it returns an error, as expected. I used the same procedure explained earlier for error reporting so from now on I’ll skip the explanation of how to add new error types. Let’s start exploring how to deal with symbols.

Symbols, scoping and environments

As previously mentioned, a symbol will evaluate as a lookup for an associated object. But where do we look things? Well first we need to talk about scopes. Scheme is a lexically scoped language and the context of a symbol depends on the scope we are in. For example:

(define a 10)
(define b 60)

(let ((a 20))
    (display b)
    (display a)
    (set! a 42)
    (display a)
    (newline))

(display a)
(newline)

In a correct implementation this should print:

60
20
42
10

Note how the variable in the global scope is not affected by the let binding or the set! procedure. We can have an arbitrary number of nested scopes, with symbols “shadowing” each other so we need a way of looking up a symbol in the current scope, and if we don’t find it go to the previous one and so on.

We can implement this in C via chained environments. We will use a simple approach, where every environment is a dynamic array of Object to Object pairs. To look up a symbol we linearly traverse the entire array to find a matching symbol and if we don’t find it in this scope we repeat the process in the parent environment until we reach the root. Normally we could use a HashTable data structure for this kind of thing, since in theory its a O(1) lookup instead of O(n). However, for small arrays a hash table might actually be slower, since we may incur more cache misses. In any case, a dynamic array is easier to implement and understand for now, we can always revisit this later as an optimization. Here is how the Environment structure looks like:

typedef struct EnvEntry {
    Object *symbol;
    Object *value;
} EnvEntry;

typedef struct Environment {
    struct Environment *parent;
    EnvEntry *buf;
    size_t size;
    size_t cap;
} Environment;

We will consider an Environment to be the top level if parent == NULL. Now we just need to create a couple of helper functions to create and expand and environment:

static Environment *global_env;

#define ENV_BUF_CAP 8

Environment *
env_create(Environment *parent) {
    Environment *env = malloc(sizeof(Environment));
    env->parent = parent;
    env->buf = NULL;
    env->size = 0;
    env->cap = ENV_BUF_CAP;
    return env;
}

void
env_add_symbol(Environment *env, Object *symbol, Object *value) {
    if (symbol->type != OBJ_TYPE_SYMBOL) {
        error_push((Error){
            .type = ERR_TYPE_RUNTIME,
            .value = ERR_NOT_A_SYMBOL,
            .line = 0,
            .col = 0,
        });
        return;
    }
    if (env->buf == NULL) {
        env->size = 0;
        env->cap = ENV_BUF_CAP;
        env->buf = malloc(env->cap * sizeof(EnvEntry));
    } else if (env->size == env->cap) {
        env->cap *= 2;
        env->buf = realloc(env->buf, env->cap * sizeof(EnvEntry));
    }
    env->buf[env->size++] = (EnvEntry){symbol, value};
}

We create a global environment that we will be initializing on the init function. We add the :a symbol to the global environment for testing purposes.

    // Global environment.
    global_env = env_create(NULL);
    env_add_symbol(global_env, MAKE_SYM("a"), make_fixnum(4)); // NOTE: for testing only

Here we are using the MAKE_SYM macro, which is a simple helper defined as follows:

#define MAKE_SYM(STR) make_symbol((StringView){(STR), sizeof(STR) - 1})

Our lookup function is very simple:

Object *
env_lookup(Environment *env, Object *symbol) {
    while (env != NULL) {
        for (size_t i = 0; i < env->size; i++) {
            EnvEntry entry = env->buf[i];
            if (obj_eq(symbol, entry.symbol)) {
                return entry.value;
            }
        }
        env = env->parent;
    }
    return obj_err;
}

It depends on the following equality function. For simplicity we are not yet handling the OBJ_TYPE_PAIR:

bool
obj_eq(Object *a, Object* b) {
    if (a->type != b->type) {
        return false;
    }
    switch (a->type) {
        case OBJ_TYPE_FIXNUM: {
            return a->fixnum == b->fixnum;
        } break;
        case OBJ_TYPE_STRING: {
            if (a->string_n != b->string_n) {
                return false;
            }
            for (size_t i = 0; i < a->string_n; i++) {
                if (a->string[i] != b->string[i]) {
                    return false;
                }
            }
        } break;
        case OBJ_TYPE_SYMBOL: {
            if (a->symbol_n != b->symbol_n) {
                return false;
            }
            for (size_t i = 0; i < a->symbol_n; i++) {
                if (a->symbol[i] != b->symbol[i]) {
                    return false;
                }
            }
        } break;
        case OBJ_TYPE_PAIR: {
            // TODO: needs evaluation of parameters...
        } break;
        default: {
            return a == b;
        } break;
    }
    return true;
}

Almost there! We have to modify the eval function to accept an environment and add the symbol lookup.

Object *
eval(Environment* env, Object *root) {
        ...

        case OBJ_TYPE_SYMBOL: {
            Object *val = env_lookup(env, root);
            if (val == obj_err) {
                error_push((Error){
                    .type = ERR_TYPE_RUNTIME,
                    .value = ERR_SYMBOL_NOT_FOUND,
                    .line = 0,
                    .col = 0,
                });
                return obj_err;
            }
            return val;
        } break;

        ...

If we test our previous script with this version, we should get something like this:

AST: 1
EVAL: 1
AST: "test"
EVAL: "test"
AST: true
EVAL: true
AST: false
EVAL: false
AST: ()
EVAL: ()
AST: :a
EVAL: 4
AST: (:quote :test)
EVAL: #{error}
test.bdl: error: symbol not found

As you can see the evaluator is able to resolve the :a symbol into the hardcoded value 4.

Procedure evaluation

Now that we have a way of resolving symbol lookups, we can finally implement the execution of native primitive procedures. If the object we are currently trying to evaluate is a list, we try to resolve the symbol for the first element. If the resulting value is a primitive procedure, we call it with the tail of the list as parameters.

    case OBJ_TYPE_PAIR: {
        if (root->car->type == OBJ_TYPE_SYMBOL) {
            Object *val = env_lookup(env, root->car);
            if (val == obj_err) {
                error_push((Error){
                    .type = ERR_TYPE_RUNTIME,
                    .value = ERR_SYMBOL_NOT_FOUND,
                    .line = 0,
                    .col = 0,
                });
                return obj_err;
            }
            if (val->type == OBJ_TYPE_PROCEDURE) {
                return val->proc(env, root->cdr);
            }
            error_push((Error){
                .type = ERR_TYPE_RUNTIME,
                .value = ERR_OBJ_NOT_CALLABLE,
                .line = 0,
                .col = 0,
            });
            return obj_err;
        }
    } break;

Note that native procedures don’t know about an environment, so we need to modify our Object type to consider this.

typedef struct Object {
    ObjectType type;
    union {
        ...

        // OBJ_TYPE_PROCEDURE
        struct Object *(*proc)(struct Environment *env, struct Object *args);
    };
} Object;

We are finally ready to add our first primitive procedure. The simplest thing we can do is to resolve the quote by returning (but not evaluating) their arguments.

Object *
proc_quote(Environment *env, Object *obj) {
    (void)env;
    return obj->car;
}

Let’s add this primitive to the global environment and a value for the test symbol to ensure we are not evaluating it:

    // Global environment.
    global_env = env_create(NULL);
    env_add_symbol(global_env, MAKE_SYM("a"), make_fixnum(4)); // NOTE: for testing only
    env_add_symbol(global_env, MAKE_SYM("quote"), make_procedure(proc_quote));
    env_add_symbol(global_env, MAKE_SYM("test"), make_fixnum(6));

Running our test script we now obtain the following output:

...
AST: (:quote :test)
EVAL: :test
AST: (:+ 1 2 (:+ 3 4))
EVAL: #{error}
test.bdl: error: symbol not found

Hurray! It’s working! Now we can start adding a lot of primitive functions, such as arithmetic operations. Let’s test this by adding the proc_sum procedure and binding it to the symbol + on the global scope as before:

Object *
proc_sum(Environment *env, Object *obj) {
    // First argument.
    if (obj == obj_nil) {
        error_push((Error){
            .type = ERR_TYPE_RUNTIME,
            .value = ERR_NOT_ENOUGH_ARGS,
        });
        return obj_err;
    }
    Object *car = eval(env, obj->car);
    if (car == obj_err) {
        return obj_err;
    }
    if (car->type != OBJ_TYPE_FIXNUM) {
        error_push((Error){
            .type = ERR_TYPE_RUNTIME,
            .value = ERR_WRONG_ARG_TYPE,
        });
        return obj_err;
    }

    // Traverse the list.
    obj = obj->cdr;
    ssize_t tot = car->fixnum;
    while (obj->type == OBJ_TYPE_PAIR) {
        Object *car = eval(env, obj->car);
        if (car == obj_err) {
            return obj_err;
        }
        if (car->type != OBJ_TYPE_FIXNUM) {
            error_push((Error){
                .type = ERR_TYPE_RUNTIME,
                .value = ERR_WRONG_ARG_TYPE,
            });
            return obj_err;
        }
        tot += car->fixnum;
        obj = obj->cdr;
    }
    return make_fixnum(tot);
}

...
    env_add_symbol(global_env, MAKE_SYM("+"), make_procedure(proc_sum));
...

Exciting! If everything went well our little test script should be working properly until the end resulting in the final output of:

AST: 1
EVAL: 1
AST: "test"
EVAL: "test"
AST: true
EVAL: true
AST: false
EVAL: false
AST: ()
EVAL: ()
AST: :a
EVAL: 4
AST: (:quote :test)
EVAL: :test
AST: (:+ 1 2 (:+ 3 4))
EVAL: 10

Conclusion

We now have the power to evaluate expressions and of running primitive procedures. If you are following along so far, you can try to implement other primitives as an exercise. In the interest of brevity, I added a number of primitive procedures in the src/bootstrap/primitives.c file, which you can find on the v0.4 tag version of the source code. At the end of this process you should be able to run make tests successfully. This brings us to parity with the main branch of the interpreter, so we are threading in new ground for me.

We are potentially leaking memory, but at this point I’m ok with this, since we will be implementing a garbage collector in future issues. We are still not done with environments, however, since we still need to allow the user to define variables, functions and be able to change their values with set!. Very soon we will have access to lambda functions and our language will be Turing complete. We will talk more about this and more in the next installment of this series. See you soon!