Compact Structs
Memory layouting is fucking hard
Many programming languages have product types, often called structs. For example, this is a struct in my language Martinaise:
struct Foo { a : Byte , b : Int , c : Byte }
In Martinaise, s take up 8 bytes (64 bits) and s take up 1 byte (surprising, I know). So, for an instance of , these are the pieces of data that we somehow need to store in memory:
The most straightforward way is to store the fields next to each other:
However, on modern computers, memory accesses should be aligned: When moving values between registers and memory, the absolute memory address should be a multiple of the value's size. While x86 CPUs are forgiving (unaligned memory access is allowed, but may be slower than aligned one), ARM CPUs are stricter and some instructions cause "alignment faults" if you load data from an unaligned address. So, in order to be able to efficiently load an 8-byte from memory into a CPU register to do calculations, it should be stored at an address that is a multiple of 8.
When I first learned about alignment, it felt weird, quirky, and unintuitive. The absolute address matters? Whyy? But apparently, this makes things easier for the hardware and it's here to stay. Compilers usually deal with that by making types have a size and an alignment, where an alignment of n means that the value can only be placed at addresses that are a multiple of n. To achieve correct alignment of struct fields, compilers introduce padding (unused space). Here's the memory layout that a C compiler would use for our :
If we place such a at an address that is a multiple of 8, the field will also be at an address that is a multiple of 8, so we can load the directly into a register. Note that the C compiler doesn't just add padding after the field, but also after , so that if you have an array of s, all of them are aligned to 8 bytes.
Other languages enable more compact layouts. For example, the Rust language gives no guarantees about the order of fields in memory (unless you use a special annotation, ). This allows the Rust compiler to reorder fields, and that's exactly what it does:
Compact layouts are generally more efficient because they use less memory, lead to fewer cache misses, etc. However, like C, Rust requires the size of a type to be a multiple of its alignment, documented in .
In Martinaise, I decided to go a different route: The size of a value doesn't have to be a multiple of its alignment. The struct has a size of just 10 bytes and an alignment of 8:
Slices
What happens when we want to put multiple s next to each other in memory? In C and Rust, that automatically works. In Martinaise, the type from the standard library has to be careful to ensure proper alignment: For each item, it reserves space according to the "stride size", which is just the size rounded up to the alignment.
Here's the relevant code from the standard library:
struct Slice [ T ] { data : Address , len : Int }
| Used internally by get, set, get_ref, etc.
fun get_ref_unchecked [ T ]( slice : Slice [ T ], index : Int ): & T {
{ slice . data + { index * stride_size_of [ T ]()}}. to_reference [ T ]()
}
fun stride_size_of [ T ](): Int {
size_of [ T ](). round_up_to_multiple_of ( alignment_of [ T ]())
}
| Magically implemented by the compiler.
fun size_of [ T ](): Int { ... }
fun alignment_of [ T ](): Int { ... }
Struct Layouting
If a type's size is always a multiple of its alignment, layouting a struct is pretty easy: By sorting the fields according to decreasing alignment, there is no padding between them. This is optimal, but only because we view fields as opaque memory blobs, when they might actually contain padding at the end.
If your sizes are independent of the alignment, things are more complicated and it's difficult to find an optimal strategy. For every sorting criteria I've come up with, there's a combination of fields that cause the layout to contain unnecessary padding:
Sorting fields by decreasing alignment is not optimal:
Sorting fields by increasing alignment is not optimal:
Sorting fields by decreasing size is not optimal:
Sorting fields by increasing size is not optimal:
Even though I haven't proven it, I suspect this problem is NP-complete – it feels a bit similar to bin-packing. Perhaps it helps that the alignments are always powers of two?
I took the hacky route: First, I noticed that lots of structs only contain s and pointers and other structs that only contain s and pointers – those cause the maximum alignment to be 8 and because their size is a multiple of 8, I can safely move them to front of the struct. For the (often few) remaining fields, I just bruteforce 1000 permutations and choose the best one.
In another language of mine, Plum, I place fields from regularly-shaped ones to irregularly-shaped ones – first, fields where the size is a multiple of 8, then fields where the size is a multiple of 4, then 2, then 1. I also don't append a field at the end of the struct if it fits in a padding hole before. This doesn't always find an optimal placement, for example here:
But the struct layouts do feel very packed together in practice.
Compact Memory Layouts
Compared to the C/Rust approach, my more nuanced take on memory layouts adds some complexity to the language and the compiler. However, this complexity is localized to the implementation of slices and struct layouting.
The upside is that all the structs that aren't in arrays/slices (my gut feeling says this is the majority) get more efficient layouts. To get a feel for that, here is an editable code area, with the memory layouts shown live below:
Big Picture
There's other stuff you can do to make your programs use less memory. Admittedly, some of them probably have a bigger impact than being really smart about your struct layouts.
Niches tell the compiler that some bit patterns never occur. Depending on the language, the compiler can ensure that null pointers or certain NaN patterns in floating point numbers never occur and make those bit patterns mean something else. This is how an in Rust can be 8 bytes.
Global type arenas combined with column-wise storage can make languages really efficient. For example, if you have a type in your program, the compiler could create two global lists containing all x and y positions, respectively. A value would actually just be an index into those lists. Tight loops that operate only on some fields of objects then have really dense memory access patterns, which is good for memory caches. If you can make assumptions about the maximum number of objects of a type, you don't even need the full 8 bytes for an index, but instead use something smaller, which makes all other data structures more efficient, etc. Modern game engines use this kind of data-driven design a lot.
Anyways. Was it worth it? At least for Martinaise and Plum, definitely. While I didn't cover it in this article, both languages represent enums as the payload followed by a tag, so many types have odd shapes – for example, a takes up 9 bytes. Also, following this rabbit hole was fun. I can sleep well knowing that my languages sometimes choose better struct layouts than the Rust compiler.