Meta Strings

by Marcel Garus · 2022-02-22 · 2 minute read · programming language design · available at mgar.us/meta-strings

Most programming languages have the concept of a string, a representation of text in the program. In some languages, you can even integrate code expressions in strings:

"Hello, {name}!\n"

Special characters like { or \ indicate that the following characters are not part of the string literal itself, but should instead be interpreted in a different way – they are escape sequences.

This approach comes with some problems: Now, to actually use { or \ in the string, you have to escape those as well. This is okay, but makes strings with lots of special characters ugly. For example, to the regex that matches words ending with \ can be written like \w\\+, but inside a string it becomes even more bloated:

"\\w\\\\+"

Every level of abstraction doubles the number of escape characters: Regexes have escape characters, and strings escape those as well. If we wanted to write a program that outputs a regex in a JSON object, it would be even more ridiculous: "\\\\w\\\\\\\\+"

Some development environments like IntelliJ actually offer a meta-view of a string, where you can edit the raw string and you simultaneously edit the escaped version in your program code.

Rust offers an alternative to these escape sequences: Raw strings! If you open a string not with ", but with r#" instead, you can only end the string using "#. Even better: How many hash signs (also known as "octothorpes") you put at the beginning, you have to use the same number at the end of the string to close it. Here's this behavior expressed in a state machine:

state machine for Rust raw strings

This means you can copy-paste any content into your program and use it directly in a string without escaping individual characters. Instead, you just slap a bunch of hash signs at the beginning and end, and you're ready to use those bytes you pasted as a string:

let my_program = r##"
fn main() {
    let name = r#"world"#;
    println!("\"Hello, {}!\", said the program.", name);
}"##;

This example also highlights that you can use raw strings inside of other raw strings as long as the outer one is more meta than the inner one.

We think about implementing something similar in Candy: If you start you string with ''", it can only be ended with "''. You can increase the number of single quotes as needed.

Unlike Rust's raw strings, we also want to also allow string interpolation, as long as the interpolated code is wrapped in the same number of { and } as there are leading characters:

state machine for Candy strings

"A normal string with {interpolation}, but no double quotes."
'"A meta string can have double quotes. Here's a single one: ""'
'""Hello, {{name}}!" is what I said to {{name}}."'
''"Here's what an empty string looks like: "" This is meta meta!"''

Thanks for reading!

If you liked this article, feel free to share it using this shortlink:

By the way, I wrote other articles about programming language design. Here's an article I recommend:

Mehl: A Syntax Experiment

2022-05-21 · 5 minute read · programming language design

Roughly speaking, there are two ways to describe data transformations:

  • top-down: you first start with a high-level overview of the dataflow

  • bottom-up: you describe what exactly you do with data and build up abstractions as you go along

Most programming languages enable both styles of representing data transformations. On a small scale, those styles usually happen in the form of function calls or method calls, respectively. For example, here's a prototypical program that sums a list and then calculates the sinus of the result:

sin(list.sum)

Some function calls are written in a top-down f(x) fashion, others in a bottom-up x.f style. A few languages, such as Nim, even support a Uniform Function Call Syntax, so that you can use both styles equivalently. Other languages, such as Lisp, enforce one style over the other:

(sin (sum list))

Interestingly, almost no language enforces a bottom-up style. A notable exception is shell scripting, where it's common to use the pipe operator | to pipe data from one program into the next:

ls | grep foo

This resembles how I intuitively think about source code with lots of data manipulation. For me, the description "sum the list, then take the sinus of that" feels less complicated than "take the sinus of the sum of the list." Especially for longer function chains, the bottom-up approach allows you to mentally simulate the data flowing through the program as you read the code, while the top-down approach results in a mental stack overflow.

So, what would a programming language look like that enforces a bottom-up style?