Martinaise: A Simple, Imperative Language

by Marcel Garus · 2024-01-11 · 7 minute read · Martinaise · programming language design · code · available at mgar.us/martinaise

You might know that I already created some programming languages (Mehl, Candy, Dungbeetle). So, why yet another language?

I've written both low-level and high-level languages, but all of them are dynamically typed. Well – our initial version of Candy was typed, but we switched to a completely new, more general feature (needs). Having such a grand vision is exciting, but it bothered me that I didn't get to implement some of the compiler techniques I researched, such as a type solver.

To scratch that itch, I created Martinaise, a new low-level, statically-typed, imperative language with function overloading. Martinaise is a recreational hobby project by me, for me. It doesn't aim to change the world. Its only goal is to be useful to solve simple problems such as from Advent of Code.

As a consequence, I intentionally left out useful features that are boring to implement: The compiler only reports the first error it finds. You can only have one file. There are no modules or namespaces. There's no formatter.

Here's a small example of Martinaise code:

| This is a comment.
| Here are some animals.
struct Cat { age: U64, name: Str }
struct Dog { name: Str }

| Obviously, animals with longer names are bigger.
fun size(cat: Cat): U64 { cat.name.len() }
fun size(dog: Dog): U64 { dog.name.len() }

enum Box[T] { alive: T, dead: T, empty }

fun size[T](box: Box[T]): U64 {
  switch box
  case alive(animal) animal.size()
  case dead(animal) animal.size()
  case empty 0
}

fun main() {
  var fluffle = Cat { name = "Fluffle", age = 2 }
  var size = fluffle.size() | equivalent to size(fluffle)

  var box = Box.alive(fluffle)
  println(
    if box is empty
    then "The box is empty!"
    else "The box is {box.size()} big."
  )
}

The most interesting part for me was function overloading in combination with monomorphization. Similar to how templates in C++ work, the size[T](Box[T]) function is not directly type-checked. Only when it's used with concrete types such as Cat or Dog is it compiled for those types. This means, the compiler never has to resolve animal.size() where animal is any T – it compiles it two times and finds the matching size(Cat) and size(Dog) functions. This way, you can quickly write generic code without defining interfaces. In the final executable, there's code for a size[Cat](Box[Cat]) and a size[Dog](Box[Dog]).

Organic Language Evolution

Martinaise didn't start out with a coherent design – I just created it spontaneously and gradually morphed it into what it is today. Some of the resulting features surprised me. For example, here's the series of (in my opinion reasonable) decisions leading to the then keyword:

Act 1

I don't require parentheses around if conditions. This is similar to how Rust's if works.

if condition { foo } else { bar }

Act 2

Use curly braces for grouping. Because Martinaise doesn't have semicolons, using round parentheses for grouping expressions sometimes creates problems. Take this code for example:

var a = foo
(&bar).do_stuff()

The parser parses this as a call, equivalent to this:

var a = foo(&bar).do_stuff()

So, I was faced with two options:

I went with option 2. This might be surprising, but I already planned to allow using curly braces anywhere an expression is expected for starting a new scope, returning the last expression inside. Parentheses are simply another, more limited way to achieve the same effect.

var a = {
  bar()
  baz()
}

var a = foo
{&bar}.do_stuff()

Act 3

Given that you can now start scopes using curly braces, it makes sense to not require curly braces for ifs. Something like this seems reasonable:

if is_great return "Hi"

Sadly, this becomes unreadable for more complicated conditions, especially ones that go over multiple lines.

if x.is_less_than(0).or(x.is_at_least(grid.width()))
  .or(y.is_less_than(0)).or(y.is_at_least(grid.height()))
  panic("outside of bounds")

So, I decided to introduce a then keyword. It is highlighted as a keyword and clearly separates the condition from the then case. An added bonus: Because the keyword is the same length as else, you get nice parallel structures in your code.

if x.is_less_than(0).or(x.is_at_least(grid.width()))
  .or(y.is_less_than(0)).or(y.is_at_least(grid.height()))
then panic("outside of bounds")
fun digit_to_char(digit: U8): Char {
  if digit.is_greater_than(9)
  then #a.add(digit.subtract(10))
  else #0.add(digit)
}

Early Syntax Desugaring

Because the Martinaise compiler doesn't aspire to be of production-quality, it's a lot simpler. For example, many syntax constructs don't exist in the abstract syntax tree. During parsing, they immediately get desugared into more complex nodes: ifs get compiled into switches, for loops become normal loops, string interpolation creates a Vec and calls write on it with all the parts.

This makes it pretty effortless to add new constructs, allowing me to iterate quickly. For example, I decided to add an orelse keyword which you can call on optional values to provide an alternative if they are empty. Here's how you can use it:

var maybe_number: Maybe[U64] = ...
var a = maybe_number orelse 4

The orelse gets parsed into this:

var a =
  switch maybe_number.to_orelse()
  case primary(a) a
  case secondary 4

If you're wondering about the to_orelse: The standard library contains the following struct as well as to_orelse functions for Bool, Maybe, and Result. By creating a to_orelse function, you can also use the orelse keyword with custom types.

struct Orelse[P, S] { primary: P, secondary: S }

Low-Level Primitives

When possible, I try to define types in the standard library instead of special-casing them in the compiler. For example, here's the definition of Bool:

enum Bool { true, false }
var true = Bool.true
var false = Bool.false

Martinaise is an unsafe language – you can get the memory addresses of data directly. In most of the compiler pipeline, reference types such as &Cat are treated just like structs. In fact, the parser implicitly adds the following struct defintion:

struct &[T] { *: T }

That's right – &Cat is just special-cased formatting for &[Cat]. When you do cat_ref.*, the compiler type-checks a normal field access.

Another cool detail: Casting between values is implemented in Martinaise itself.

fun cast[A, B](a: A): B {
  | at least a small sanity check
  assert(
    size_of_type[A]().equals(size_of_type[B]()),
    "cast between types of different sizes",
  )
  a.&.to_address().to_reference[B]().*
}

Here, to_address[T](ref: &T): U64 and to_reference[T](address: U64): &T are builtin-functions that convert between references and addresses.

Impressions

Writing the compiler for a language with only low-level builtins is fun. You are forced to build all abstractions such as typed memory allocations, Vecs, and Maps in the language itself. It made me understand and appreciate these abstractions better.

Currently, I'm in the process of writing a Martinaise compiler in Martinaise itself. The goal: A 4000 line file that can compile itself. There's beauty in that.