Mehl: A Syntax Experiment

by Marcel Garus · 2022-05-21 · 5 minute read · programming language design · available at mgar.us/mehl

Roughly speaking, there are two ways to describe data transformations:

  • top-down: you first start with a high-level overview of the dataflow

  • bottom-up: you describe what exactly you do with data and build up abstractions as you go along

Most programming languages enable both styles of representing data transformations. On a small scale, those styles usually happen in the form of function calls or method calls, respectively. For example, here's a prototypical program that sums a list and then calculates the sinus of the result:

sin(list.sum)

Some function calls are written in a top-down f(x) fashion, others in a bottom-up x.f style. A few languages, such as Nim, even support a Uniform Function Call Syntax, so that you can use both styles equivalently. Other languages, such as Lisp, enforce one style over the other:

(sin (sum list))

Interestingly, almost no language enforces a bottom-up style. A notable exception is shell scripting, where it's common to use the pipe operator | to pipe data from one program into the next:

ls | grep foo

This resembles how I intuitively think about source code with lots of data manipulation. For me, the description "sum the list, then take the sinus of that" feels less complicated than "take the sinus of the sum of the list." Especially for longer function chains, the bottom-up approach allows you to mentally simulate the data flowing through the program as you read the code, while the top-down approach results in a mental stack overflow.

So, what would a programming language look like that enforces a bottom-up style?

To find that out, a few months ago, I decided to create a tiny programming language called Mehl. Here's what the code from above looks like in Mehl:

list sum sin

Mehl only has some built-in types of values:

42              # integer
"Hello!"        # string
:foo            # symbol
(2, 3)          # tuple
{:a, :b}        # map
["Hey!" print]  # code

To be honest, maybe I went a little too far with the bottom-up syntax. For example, to declare a variable, you first create the value and then assign it to a name using =>:

4 => foo

Functions

Functions in most programming languages can take multiple inputs but have to produce exactly one output (and even if the function doesn't produce anything, it has a unit output like void). In Mehl, every function consumes exactly one input and produces exactly one output. Functions that don't need an argument just ignore what's given, and functions that have nothing to produce can instead create the empty symbol :.

Function with one input and one output represented as a box.

Defining a function is similar to defining a variable – you just use -> instead of =>:

["Hi" print] -> greet
greet

As a consequence of the one-argument policy, you don't need to specify arguments to code blocks. Instead, functions can directly start working with the input:

[print] -> myPrint
"Hi" myPrint

You can also access the output of the previous expression using a dot (.):

[(., .) *] -> square
3 square print

Running square (or any other function) on some value is equivalent to inserting its source code at that place. Here's how the execution of 3 square print proceeds:

3 square print
3 (., .) * print
(3, 3) * print
9 print
#> prints 9
:

I do admit that this simplicity also comes with downsides. In particular, passing multiple arguments to a function is quite cumbersome. You effectively have to make those functions take a tuple containing the arguments:

(1, 2, 3)
  (., [square])
  map

Whitespace

One aspect I was positively surprised by is that you don't need to worry about indentation or semicolons. Mehl doesn't even need to cleverly try to distinguish statements – it just executes all code in sequence. Take this code:

"Hello, world!" print
(2, 3) * print

For the execution, we don't care about whitespace:

"Hello, world!" print (2, 3) * print
#> prints "Hello, world"
: (2, 3) * print
(2, 3) * print
6 print
#> prints 6
:

Primitive literals like (2, 3) just "overwrite" the existing value. You can treat them just like functions that ignore the input and produce a new value.

My Takeaway

Assembling everything from bottom-up building blocks sounds like a good idea at first, but there are definitely cases where it is not the most intuitive approach. For example, reading a long function without knowing its name until the end is challenging – most of the time, you don't even know the goal of the code that you're reading.

Another example is very declarative programming, where it makes sense to first describe the rough structure and only then specifics. For example, a UI programming framework in Mehl would have to look something like this, inverting the nested parts of the UI:

{
  :body, (
    ("Hello, world!", :bold) text center (., 8) padding
    ("Count", ["Pressed" print]) button
  ) column,
} app

My main lesson from this experiment is that some things are better described in a top-down fashion.

Still, most day-to-day data manipulations are straightforward to read. While Mehl is very bare-bones and I only implemented a few math operations, printing, and reading from user input, there is also potential for code that can be read naturally:

300 milli seconds wait
[read eval print] loop

Thanks for reading!

If you liked this article, feel free to share it using this shortlink:

By the way, I wrote other articles about programming language design. Here's an article I recommend:

A Critical Look at Static Typing

2022-01-23 · 6 minute read · programming language design

Nowadays, there are soooo many programming languages. One critical differentiation point among them is how their type systems work. You can sort programming languages into two camps:

  • Statically-typed languages determine what sort of values can be stored in variables just by looking at the program.

  • Dynamically-typed languages can't statically infer what sort of values can be stored in variables. They have to run the program to find out.

If given a choice, I prefer statically-typed languages. Because they know the types of variables, they can offer clever suggestions. Additionally, the compiler catches many errors while you write the code, so you get immediate feedback.

On the other hand, most type systems limit what you can express in a language. In comparison, dynamic languages feel more flexible.