Types in Plum

Why the compiler models types as strings

by Marcel Garus · 2025-5-12
available at www.marcelgarus.dev/plum-types

Plum is a programming language with structural typing: Types don't have an identity. Values of these two types can be used interchangably:

Foo xInt yInt
Bar xInt yInt

Essentially, named types are just aliases for the structure of the type. This is not new or groundbreaking, but it's the first time I implemented a compiler with a structural type system. One aspect made this a lot more challenging than I initially thought: Recursion.

Consider this type:

LinkedList t =
  empty
    more: (itemrest: (LinkedList t))

It's an enum with two variants – empty and more. The more variant contains a struct with the item and the rest, which is another linked list.

You can create linked list instances like this:

more:
    item1
      rest:
        more:
            item2
              restempty

This LinkedList Int contains the items 1 and 2.

Types as Graphs

So, how does the compiler internally represent types? Because names shouldn't affect type checking in any way, so you could model types like this:

enum Type {
  int,
  struct_Map[StringType],
  enum_Map[StringType],
}

Note: This is a simplification. Plum has more primitives (bytes), special types (never) and composite types (arrays, boxes, lambdas).

Here, recursive types would result in recursive data structures in the compiler:

A graph modeling a linked list using an enum pointing to a struct pointing to the original enum.

Note: In Plum, enum variants use an empty struct as the default payload type.

However, there are some serious downsides to this approach: Representing types as potentially recursive data structures in the compiler require a tremendous level of care when working with them. Simple actions such as debug-printing or hashing types can lead to infite traversals and hanging the compiler if you're not careful. To prevent that, you have to compare type identities (the type objects' addresses), which feels clunky.

Types as Trees

To work around these problems, I decided to model types as trees rather than recursive graphs. I added a recursive variant in the enum that tells us how many levels up in the type tree to continue:

enum Type {
  int,
  struct_Map[StringType],
  enum_Map[StringType],
  recursiveInt,
}

Using this, the LinkedList Int type is represented like this in the compiler:

A tree modeling a linked list using an enum pointing to a struct modeling to a recursive marker.

The recursive type tells us that we should start two layers further up in the tree – at the original enum.

Formatting, hashing, and generally traversing types now no longer requires us to be careful about running into infinite recursions. Simple enough right? Well. Take a look at this function, which determines the length of a linked list:

length list: (LinkedList t) -> Int =
  list
  empty -> 0
    morenode ->
      length (node.rest) .+ 1

The % switches on the LinkedList t enum. In the more case, we unpack the enum variant's payload, making it available as the node variable. What is the type of a single node?

If we would naively extract the payload's type from the internal Map[StringType], that would leave us with this type:

A type that is not self-contained: A struct that points to a recursive marker, which tells us to look two levels up in the type tree.

Oh no! This type is no longer self-contained: The recursive type references a type two levels up in the type tree, but that part of the tree has been discarded when figuring out the payload type!

What we want to happen instead is for the type to "wrap around" when navigating into it:

A struct pointing to an enum pointing to a recursive marker.

I had a version of the compiler that worked like this. However, it was quite finnicky to use:

If you forget one of these steps, invalid types might sneak into parts of the compiler.

Types as Strings

At some point, I wondered if things were easier if I represented types as strings. Turns out, it works surprisingly well. Currently, the compiler represents the linked list type as this string:

(empty: (&more: (item: (trest: (^2)))

Note that the string has a very strict structure to it – every type has parentheses around it and even the (&) type for the empty variant is explicit.

You might think working with strings would be really difficult. However, many algorithms actually got easier.

Type Algorithms

Let's take a brief look at Plum's type algorithms!

First, multiple type strings may represent the same type. Consider this code that appends 42 to the front of a LinkedList Int:

a_list = ...
new_node item42 resta_list
new_list morenew_node

The new_node struct contains an Int and a LinkedList Int. Naively determining the type string gives us this:

(item: (Intrest: (empty: (&more: (item: (Intrest: (^2))))

However, here's a more minimal version of the same type:

(item: (Intrest: (empty: (&more: (^2)))

These two type strings describe the same type: A value that fits the structure of one also fits the other one.

Equivalency Checking

To check whether two type strings represent the same type, we look for incompatibilities between these types – such as different kinds of types (like structs and enums) or structs with different field names. Plum performs a breadth-first search for incompatibilities, using pairs of cursors into the two type strings as the nodes to search. It starts at the beginning of the strings:

(item: (Intrest: (empty: (&more: (item: (Intrest: (^2))))
^

(item: (Intrest: (empty: (&more: (^2)))
^

Here, both cursors point to a struct. The struct fields have the same names. Next, the search compares the individual field types. First, the item field:

(item: (Intrest: (empty: (&more: (item: (Intrest: (^2))))
         ^

(item: (Intrest: (empty: (&more: (^2)))
         ^

Both are of the primitive type Int, so they are compatible. Next, the rest:

(item: (Intrest: (empty: (&more: (item: (Intrest: (^2))))
                     ^

(item: (Intrest: (empty: (&more: (^2)))
                     ^

Here, both cursors point to an enum and the enum variants have the same names. So, we compare the types of the individual variants. The empty variant is an empty struct in both cases, but the more variant is interesting:

(item: (Intrest: (empty: (&more: (item: (Intrest: (^2))))
                                         ^

(item: (Intrest: (empty: (&more: (^2)))
                                         ^

One of the types is a recursive type! So, the cursor moves to the type that the recursive type refers to:

(item: (Intrest: (empty: (&more: (item: (Intrest: (^2))))
                                         ^

(item: (Intrest: (empty: (&more: (^2)))
^

Now, both cursors point to structs again. These have the same field names, so we compare their children, etc.

I won't bore you with the rest of the process, but because the search doesn't revisit pairs of cursors that we already visited and there are only a limited number of combinations of cursors positions, the search terminates. If it didn't find any incompatibilities, both type strings refer to the same conceptual type.

Canonicalization

Now that we can check two type strings for equivalency, we can use that to find a minimal representation for types – a canonical string. We do that by "bubbling up" recursive types and then checking types for equivalency. Take this type:

(item: (Intrest: (empty: (&more: (item: (Intrest: (^2))))

Bubbling up the (^2) replaces its surrounding type:

(item: (Intrest: (empty: (&more: (^2)))

This is equivalent to the original type, so it's a valid simplification! Let's try that again:

(item: (Intrest: (^2))

Oh. This type is no longer self-contained, so we discard it. The previous type is minimal:

(item: (Intrest: (empty: (&more: (^2)))

This process sounds simple, but how would you implement this in code? The compiler contains a function bubble_up_recursive_type(typeTypeiInt): Type that bubbles up the i th recursive type. When types were modeled as trees, this function was one of the most complex pieces of code in the compiler:

Imagine my surprise that using strings instead of trees makes this transformation way simpler:

Rather than a complicated recursive function tracking several pieces of state, we have a straightforward function with three 5-line loops. Amazing!

My Takeaway

Refactoring types to strings was a great decision:

Overall, this was the most unintuitive refactoring I've ever done. If someone tells you they removed type-safe trees in favor of strings, you are usually right to question their approach. In this case however, the fact that strings turned out to be a better abstraction really blew my mind.