Solving Kakuros

Writing a fast Kakuro solver in Rust

by Marcel Garus · 2023-8-14
available at www.marcelgarus.dev/kakuro

Have you ever wondered what would happen if Crosswords and Sudokus had a baby? I know I haven't, but on a vacation in Lisbon, a Sudoku book in our flat introduced me to the concept of Kakuros:

Photo of a page of Kakuros of a Sudoku book on a couch in a flat in Lisbon

Like Sudokus, you have to fill all cells with the digits 1 through 9 and the same digit can't appear twice in the same word. Like crosswords, the layout is all over the place and you have clues. In this puzzle, however, those clues are not a description of a word, but the sum of the digits.

To get an intuition about how to approach a Kakuro, here's a tiny one:

2 4 13

The clue 2 indicates that the sum of the digits below has to be 2. Because there's only one cell, we can put a 2 in there.

2 4 13 2

There are three ways to write 4 as the sum of two digits: 1+3, 2+2, and 3+1. Because the same digit can't appear twice in the same word, that leaves 1 and 3 as well as 3 and 1 for the column. The 1 can't be at the top though: If it were, the cell to its right would have to contain a 10 to reach the horizontal sum of 13, but only the digits 1 through 9 are allowed. Thus, the 1 goes to the bottom and the 3 to the top.
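To make that reasoning concrete, here's a quick sketch (my own illustration, not part of the solver) that enumerates the ordered pairs of distinct digits summing to a target:

```rust
// Enumerate all ordered pairs (a, b) of digits 1..=9 with a + b == target
// and a != b – the same digit can't appear twice in a word.
fn pairs_summing_to(target: u8) -> Vec<(u8, u8)> {
    (1..=9)
        .flat_map(|a| (1..=9).map(move |b| (a, b)))
        .filter(|(a, b)| a != b && a + b == target)
        .collect()
}

fn main() {
    // 2+2 is excluded, leaving only 1 and 3 (in either order) for the column.
    assert_eq!(pairs_summing_to(4), vec![(1, 3), (3, 1)]);
}
```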

Finally, we can calculate that the last digit needs to be an 8. And that's it:

2 4 13 2 3 8 1

Tadaa! You just solved your first Kakuro.

Well… to be honest, I'm not even sure this is a valid Kakuro. All Kakuros I've seen have a clue for every row and column. For example, here's the first Kakuro from the book, which took us two days to solve:

Speaking of spending way too much time on something: Getting experience in profiling and optimizing code was on my todo list for some time, so I took this opportunity to write a Kakuro solver in Rust!

Modelling Kakuros in Code

First, I developed a file format to store Kakuros:

# The small Kakuro from above.
  \    2\    4\     \
  \13 _____ _____ _____
  \     \   _____   \

Underscores represent empty cells, and backslashes indicate walls with optional vertical and horizontal sums on either side of the backslash. For example, the 4\ indicates that the sum of the cells below it should be 4. Files get parsed into a two-dimensional vector of cells:

struct Board {
    cells: Vec<Vec<Cell>>, // Outer is vertical, inner horizontal.
}
enum Cell {
    Empty, // Needs to be filled with a digit.
    Wall {
        vertical_sum: Option<Value>,
        horizontal_sum: Option<Value>,
    },
}
type Value = u8;
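The parsing code itself isn't shown in this article. As a rough sketch (the parse_token helper and the derives are my own assumptions, not the repo's actual parser), a single whitespace-separated token of the format could be decoded into a Cell like this:

```rust
type Value = u8;

#[derive(Debug, PartialEq)]
enum Cell {
    Empty,
    Wall {
        vertical_sum: Option<Value>,
        horizontal_sum: Option<Value>,
    },
}

// Decode one token: all-underscore tokens are empty cells; everything else
// is a wall, with optional sums before and after the backslash.
fn parse_token(token: &str) -> Cell {
    if token.chars().all(|c| c == '_') {
        Cell::Empty
    } else {
        let (vertical, horizontal) = token.split_once('\\').unwrap();
        Cell::Wall {
            vertical_sum: vertical.parse().ok(),
            horizontal_sum: horizontal.parse().ok(),
        }
    }
}

fn main() {
    assert_eq!(parse_token("_____"), Cell::Empty);
    assert_eq!(
        parse_token("4\\"),
        Cell::Wall { vertical_sum: Some(4), horizontal_sum: None }
    );
    assert_eq!(
        parse_token("\\13"),
        Cell::Wall { vertical_sum: None, horizontal_sum: Some(13) }
    );
}
```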

This data structure is pretty close to how Kakuros are visually shown. While that's great for some use cases (such as generating Kakuros or converting them into the images in this article), it contains redundant information that gets in the way when trying to solve the Kakuro. For example, horizontal and vertical sums are treated differently, and it's difficult to get all the cells to which a sum applies.

That's why I abstracted Kakuros into a more generic form: Instead of treating them as a grid, the program handles them as a singular line of cells that are governed by constraints. For the purposes of solving the Kakuro, showing someone the small example from above is equivalent to giving them the following:

2 4 13

The Rust version of this is pretty straightforward:

struct Input {
    num_cells: usize,
    constraints: Vec<Constraint>,
}
struct Constraint {
    cells: Vec<usize>, // The indices of the cells to which this applies.
    sum: Value,
}

Solutions to Kakuros are then represented as a vector with a value for every cell – a Vec<Value>. To check if a given solution is valid, we need to check that all numbers are in the range from 1 through 9 and that all constraints are satisfied:

impl Input {
    pub fn is_solution(&self, solution: &Vec<Value>) -> bool {
        solution.len() == self.num_cells
            && solution.iter().all(|number| (1..=9).contains(number))
            && self
                .constraints
                .iter()
                .all(|constraint| constraint.is_solution(solution))
    }
}
impl Constraint {
    pub fn is_solution(&self, solution: &Vec<Value>) -> bool {
        let digits = self.cells.iter().map(|i| solution[*i]).collect_vec();
        let unique_digits = digits.iter().copied().collect::<HashSet<_>>();

        if unique_digits.len() < digits.len() {
            false // A digit appears twice.
        } else {
            digits.iter().sum::<Value>() == self.sum
        }
    }
}
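To make this concrete, here's the four-cell Kakuro from above encoded as an Input and checked against its solution. This is a self-contained sketch: the types and methods are reproduced from above, slightly adapted (slices instead of &Vec, no itertools) so it compiles on its own:

```rust
use std::collections::HashSet;

type Value = u8;

struct Input {
    num_cells: usize,
    constraints: Vec<Constraint>,
}
struct Constraint {
    cells: Vec<usize>,
    sum: Value,
}

impl Input {
    fn is_solution(&self, solution: &[Value]) -> bool {
        solution.len() == self.num_cells
            && solution.iter().all(|number| (1..=9).contains(number))
            && self.constraints.iter().all(|c| c.is_solution(solution))
    }
}
impl Constraint {
    fn is_solution(&self, solution: &[Value]) -> bool {
        let digits: Vec<Value> = self.cells.iter().map(|i| solution[*i]).collect();
        let unique: HashSet<Value> = digits.iter().copied().collect();
        unique.len() == digits.len() && digits.iter().sum::<Value>() == self.sum
    }
}

fn main() {
    // Cells 0..=2 form the 13-row, cell 0 alone is the 2-column,
    // and cells 1 and 3 form the 4-column.
    let input = Input {
        num_cells: 4,
        constraints: vec![
            Constraint { cells: vec![0], sum: 2 },
            Constraint { cells: vec![1, 3], sum: 4 },
            Constraint { cells: vec![0, 1, 2], sum: 13 },
        ],
    };
    assert!(input.is_solution(&[2, 3, 8, 1]));
    assert!(!input.is_solution(&[2, 2, 9, 2])); // The 2 repeats in the 4-column.
}
```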

Functions for solving a Kakuro then have this signature:

fn solve(input: &Input) -> Vec<Vec<Value>> {
    // Here goes the code for solving the Kakuro.
}

If you're wondering why it returns a Vec<Vec<Value>> instead of just a Vec<Value>: There may be Kakuros without a solution or ambiguous Kakuros with multiple solutions. The solver should handle these cases correctly too.

I'll develop several versions of the solver that are optimized in various ways. All versions will use the same function signature so we can compare their performance. With that in place, let's write the first solver!

A Naive Approach

To get started, we'll create a very simple solver: It just tries all combinations and checks if they are valid solutions. It starts by filling all cells with 1s, and then counts them up, treating them like a single number: For a four-cell Kakuro, it first tries 1111, then 1112, 1113, etc. until it reaches 9999.

pub fn solve(input: &Input) -> Vec<Vec<Value>> {
    let mut attempt = vec![1; input.num_cells];
    let mut solutions = vec![];

    'search: loop {
        if input.is_solution(&attempt) {
            solutions.push(attempt.clone());
        }

        // Increase attempt by one, interpreted as a single number.
        let mut i = attempt.len() - 1;
        loop {
            attempt[i] += 1;
            if attempt[i] == 10 {
                attempt[i] = 1;
                if i == 0 {
                    break 'search;
                }
                i -= 1;
            } else {
                break;
            }
        }
    }

    solutions
}

Nice! These barely thirty lines of code suffice to solve small Kakuros. The four-cell Kakuro I introduced above is solved quickly. Sadly, for bigger Kakuros, the runtime increases exponentially with the number of cells – each additional cell means that nine times more combinations have to be constructed and validated.
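To put numbers on that exponential growth (a small illustration of the 9^n search space, not part of the solver):

```rust
// The naive search tries 9^n candidate fillings for an n-cell Kakuro.
fn candidates(cells: u32) -> u64 {
    9u64.pow(cells)
}

fn main() {
    assert_eq!(candidates(4), 6_561); // The four-cell Kakuro: trivial.
    assert_eq!(candidates(10), 3_486_784_401); // Ten cells: already billions.
}
```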

For instance, even after running on my computer for an hour, the solver hasn't solved this Kakuro from the Kakuro Wikipedia page, which is titled "An easy Kakuro puzzle":

We have our first goal: Solve the Wikipedia Kakuro!

Filling the Cells One by One

If you reflect on how we humans solve Kakuros, it's obvious that we don't fill out all cells and then check if everything works out. Instead, we gradually fill the cells one by one, carefully paying attention at each step that the Kakuro remains valid. Let's do something similar in code!

Now, our candidate is no longer a Vec<Value>. Instead, each value is optional, resembling a cell that's either blank or filled with a digit: Vec<Option<Value>>. This way, we can represent partially filled-out Kakuros.

Because we changed how a Kakuro is represented, we also have to adjust how we check a candidate's validity. Before, we checked if all constraints are met. Now, we only do that for constraints where each cell is filled out.

While we're at it, let's do another optimization: Because we know that we only fill in numbers between 1 and 9, we can remove the check for that.

impl Input {
    fn is_possible_solution(&self, attempt: &[Option<Value>]) -> bool {
        self.constraints
            .iter()
            .all(|constraint| constraint.is_possible_solution(attempt))
    }
}
impl Constraint {
    fn is_possible_solution(&self, attempt: &[Option<Value>]) -> bool {
        let cells = self.cells.iter().map(|i| attempt[*i]).collect_vec();
        let digits = cells.into_iter().filter_map(|it| it).collect_vec();
        let unique_digits = digits.iter().copied().collect::<HashSet<_>>();

        if unique_digits.len() < digits.len() {
            false // A digit appears twice.
        } else if digits.len() < self.cells.len() {
            true // Ignore partially filled-out constraints.
        } else {
            digits.iter().sum::<Value>() == self.sum
        }
    }
}

We'll stick with the brute force approach of trying out lots of combinations, but we'll fill the cells one by one and abort as soon as the Kakuro becomes invalid. We can use recursion to do that:

pub fn solve(input: &Input) -> Output {
    let mut attempt: Vec<Option<Value>> = vec![None; input.num_cells];
    let mut solutions = vec![];
    solve_rec(input, &mut attempt, &mut solutions);
    solutions
}
fn solve_rec(input: &Input, attempt: &mut Vec<Option<Value>>, solutions: &mut Vec<Solution>) {
    if !input.is_possible_solution(attempt) {
        return;
    }

    let first_empty_cell_index = attempt.iter().position(|it| it.is_none());
    if let Some(index) = first_empty_cell_index {
        for i in 1..=9 {
            attempt[index] = Some(i);
            solve_rec(input, attempt, solutions);
        }
        attempt[index] = None;
    } else {
        // There is no remaining empty cell. This is a solution.
        solutions.push(attempt.iter().map(|cell| cell.unwrap()).collect());
    }
}

We could just as easily have given each recursive step its own copy of the game instead of making it fill the first empty cell with digits and then clean up after itself in the end, but that would require us to copy lots and lots of data. In the current implementation, all recursive calls act on the same memory region, rapidly modifying it and only copying it when a new solution is found.

Compared to the naive solver, this is fast! It even solves the Wikipedia Kakuro in about 5 seconds. To get a more comprehensive comparison between the solvers, I also attempted to solve some Kakuros from kakuros.com, their sizes ranging from 15×15 to 30×30. It has no chance against most of them though, and the Kakuro from the book also remains unsolved. Here are the results:

Most of my programming experience is with high-level languages and high-level applications, where making your intent clear is the utmost concern and where performance is usually not a problem. If you're coming from a systems programming background, you probably already cringed when I introduced uniqueness checking before:

let unique_digits = digits.iter().copied().collect::<HashSet<_>>();

if unique_digits.len() < digits.len() {
    return false; // A digit appears twice.
}

...

let sum: Value = digits.iter().sum();
let unused_digits: HashSet<Value> = HashSet::from_iter(1..=9)
    .difference(&unique_digits)
    .map(|digit| *digit)
    .collect();
...
...

While this is nice to read and easy to understand (if you're familiar with sets), it's also a performance sink.

Complexity-wise, HashSets are a good choice, but they pay for that with a huge constant factor: Every item is hashed into an 8-byte number and then assigned to a bucket in an array that needs to be maintained. When adding an item, several edge cases need to be caught, like hash collisions or growing the bucket array. This might make sense if you store many items, but for nine digits at most, a HashSet is a huge beast you're unleashing on your code. It's like using a library indexing system to store your shopping list – sure, it works, but unless you need to buy hundreds of ingredients for a complicated recipe, writing it down by hand on a single sheet of paper is faster. Especially if you only have nine items.

So let's get rid of the HashSets (yes, plural) and replace them with custom code that tracks which digits we've seen in a plain array of booleans:

let mut seen = [false; 9];
for digit in &digits {
    if seen[(digit - 1) as usize] {
        return false; // A digit appears twice.
    } else {
        seen[(digit - 1) as usize] = true;
    }
}

let sum: Value = digits.iter().sum();
let unused_digits = (1..=9u8).filter(|digit| !seen[(digit - 1) as usize]).collect_vec();
...
...

And fair enough, replacing hash sets with this makes the solver five times faster:

               four-cell  wikipedia  15×15      20×20      30×30      book
naive          793.01 μs  > 1 h      > 1 h      > 1 h      > 1 h      > 1 h
gradual        863.55 μs  831.44 ms  101.93 s   > 1 h      > 1 h      > 1 h
sum reachable  43.12 μs   31.95 ms   113.88 ms  1.24 s     72.74 s    108.57 s
prioritize     97.86 μs   324.32 ms  390.11 ms  310.38 s   > 1 h      481.85 s
no set         8.29 μs    5.41 ms    21.25 ms   216.69 ms  14.83 s    21.49 s

Only Check Changes

Retrospectively, this is an obvious one. Whenever we fill in a value, we don't need to validate the entire Kakuro – only the constraints that contain the cell.

In the beginning, we save which constraints contain each cell. Then, when filling out a cell, we only need to check the affected constraints:

pub fn solve(input: &Input) -> Output {
    let mut attempt = vec![None; input.num_cells];
    let mut solutions = vec![];
    let mut affected_constraints = vec![vec![]; input.num_cells];
    for (i, constraint) in input.constraints.iter().enumerate() {
        for cell in &constraint.cells {
            affected_constraints[*cell].push(i);
        }
    }
    solve_rec(input, &affected_constraints, &mut attempt, &mut solutions);
    solutions
}
fn solve_rec(
    input: &Input,
    affected_constraints: &[Vec<usize>],
    attempt: &mut Game,
    solutions: &mut Vec<Solution>,
) {
    let first_empty_cell_index = attempt.iter().position(|it| it.is_none());
    if let Some(index) = first_empty_cell_index {
        'candidates: for i in 1..=9 {
            attempt[index] = Some(i);
            for constraint_index in &affected_constraints[index] {
                let constraint = &input.constraints[*constraint_index];
                if !constraint.is_satisfied_by(attempt) {
                    continue 'candidates;
                }
            }
            solve_rec(input, affected_constraints, attempt, solutions);
        }
        attempt[index] = None;
    } else {
        solutions.push(attempt.iter().map(|cell| cell.unwrap()).collect());
    }
}

Nice! This yields an additional 14× speedup:

               four-cell  wikipedia  15×15      20×20      30×30      book
naive          793.01 μs  > 1 h      > 1 h      > 1 h      > 1 h      > 1 h
gradual        863.55 μs  831.44 ms  101.93 s   > 1 h      > 1 h      > 1 h
sum reachable  43.12 μs   31.95 ms   113.88 ms  1.24 s     72.74 s    108.57 s
prioritize     97.86 μs   324.32 ms  390.11 ms  310.38 s   > 1 h      481.85 s
no set         8.29 μs    5.41 ms    21.25 ms   216.69 ms  14.83 s    21.49 s
only changes   4.84 μs    740.98 μs  2.05 ms    9.68 ms    569.47 ms  1.48 s

Track the Index of the First Empty Cell

In each recursion step, we search for the first empty cell:

let first_empty_cell_index = attempt.iter().position(|it| it.is_none());

Because we fill out the cells in order anyway, we can pass the index of the first empty cell to the recursive function directly – no calculation needed:

fn solve_rec(
    input: &Input,
    affected_constraints: &[Vec<usize>],
    first_empty: usize,
    attempt: &mut Vec<Option<Value>>,
    solutions: &mut Vec<Solution>,
) {
    if first_empty < attempt.len() {
        'candidates: for i in 1..=9 {
            attempt[first_empty] = Some(i);
            for constraint_index in &affected_constraints[first_empty] {
                let constraint = &input.constraints[*constraint_index];
                if !constraint.is_satisfied_by(attempt) {
                    continue 'candidates;
                }
            }
            solve_rec(input, affected_constraints, first_empty + 1, attempt, solutions);
        }
        attempt[first_empty] = None;
    } else {
        solutions.push(attempt.iter().map(|cell| cell.unwrap()).collect());
    }
}

This has an insignificant effect on the runtime though:

               four-cell  wikipedia  15×15      20×20      30×30      book
naive          793.01 μs  > 1 h      > 1 h      > 1 h      > 1 h      > 1 h
gradual        863.55 μs  831.44 ms  101.93 s   > 1 h      > 1 h      > 1 h
sum reachable  43.12 μs   31.95 ms   113.88 ms  1.24 s     72.74 s    108.57 s
prioritize     97.86 μs   324.32 ms  390.11 ms  310.38 s   > 1 h      481.85 s
no set         8.29 μs    5.41 ms    21.25 ms   216.69 ms  14.83 s    21.49 s
only changes   4.84 μs    740.98 μs  2.05 ms    9.68 ms    569.47 ms  1.48 s
pass index     5.12 μs    734.19 μs  2.06 ms    9.87 ms    554.91 ms  1.46 s

The Evil Heap

Let's look at the Flamegraph again. In the optimized version of the program, lots of events are classified in the [unknown] bucket, so we have no stack trace for them. What we can see is that the runtime is dominated by the allocation and freeing of memory:

This is not surprising. Memory on the heap is managed in a complex way – the allocator has to look for free areas of memory to hand out and has to merge regions of memory when they are freed. Intrinsically, there's a lot of ceremony associated with using heap memory.

Stack memory on the other hand is cheap to use. There's no allocator – you can just use memory in your function's stack frame without telling anyone. It's also tightly packed together, making it very cache-friendly.
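As a small illustration of the difference (mine, not the solver's code): when we know there are at most nine digits, a fixed-size array plus a length can stand in for a Vec. The buffer lives entirely in the function's stack frame, and no allocator is involved:

```rust
// Gather the filled-in digits from a row of optional cells into a
// stack-allocated buffer. Returns the buffer and how many slots are used.
fn collect_digits(cells: &[Option<u8>]) -> ([u8; 9], usize) {
    let mut digits = [0u8; 9]; // Lives on the stack – no allocation.
    let mut len = 0;
    for digit in cells.iter().filter_map(|it| *it) {
        digits[len] = digit;
        len += 1;
    }
    (digits, len)
}

fn main() {
    let (digits, len) = collect_digits(&[Some(3), None, Some(1), None]);
    assert_eq!(len, 2);
    assert_eq!(digits[0], 3);
    assert_eq!(digits[1], 1);
}
```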

So, where do we use heap memory? And is there a way for us to reduce our usage of heap memory?

The following code seems to be the culprit:

let is_sum_reachable = unused_digits
    .into_iter()
    .combinations(self.cells.len() - digits.len())
    .map(|additional_digits| additional_digits.into_iter().sum::<Value>())
    .any(|possible_sum| possible_sum == self.sum);

According to the documentation of combinations from the itertools crate, this function allocates a new Vec for each combination:

Return an iterator adaptor that iterates over the k-length combinations of the elements from an iterator. Iterator element type is Vec<Self::Item>. The iterator produces a new Vec per iteration, and clones the iterator elements.

The solution? We can instead try to hand-roll an implementation that checks if the sum of the constraint is reachable. This is also an opportunity to re-use partial sums instead of re-adding all items for each combination of values. A quick disclaimer: The code for this is somewhat ugly, using macros to make it slightly more bearable:

macro_rules! check_additional {
    ($nesting:expr, $digits_so_far:expr, $target_digits:expr, $sum_so_far:expr, $target_sum:expr,
    $seen:expr, $block:block) => {
        if $sum_so_far <= $target_sum {
            if $digits_so_far == $target_digits {
                if $sum_so_far == $target_sum {
                    return true;
                }
            } else {
                for a in ($nesting + 1)..=9 {
                    if $seen[a - 1] {
                        continue;
                    }
                    $digits_so_far += 1;
                    $sum_so_far += a;
                    $seen[a - 1] = true;
                    $block;
                    $digits_so_far -= 1;
                    $sum_so_far -= a;
                    $seen[a - 1] = false;
                }
            }
        }
    };
}

impl Constraint {
    fn is_satisfied_by(&self, attempt: &Vec<Option<Value>>) -> bool {
        let mut seen = [false; 9];
        let mut sum = 0usize;

        for digit in self.cells.iter().filter_map(|b| attempt[*b]) {
            if seen[(digit - 1) as usize] {
                return false; // A digit appears twice.
            }
            seen[(digit - 1) as usize] = true;
            sum += digit as usize;
        }

        let mut num_digits = seen.iter().filter(|it| **it).count();
        let target_digits = self.cells.len();
        let target_sum = self.sum as usize;

        check_additional!(0, num_digits, target_digits, sum, target_sum, seen, {
          check_additional!(1, num_digits, target_digits, sum, target_sum, seen, {
            check_additional!(2, num_digits, target_digits, sum, target_sum, seen, {
              check_additional!(3, num_digits, target_digits, sum, target_sum, seen, {
                check_additional!(4, num_digits, target_digits, sum, target_sum, seen, {
                  check_additional!(5, num_digits, target_digits, sum, target_sum, seen, {
                    check_additional!(6, num_digits, target_digits, sum, target_sum, seen, {
                      check_additional!(7, num_digits, target_digits, sum, target_sum, seen, {
                        // Constraints are only checked if at least one cell is
                        // filled out. 8 cells later, everything is filled out.
                      });
                    });
                  });
                });
              });
            });
          });
        });
        false
    }
}

The uglification of code is worth it though. We get some solid performance improvements:

               four-cell  wikipedia  15×15      20×20      30×30      book
naive          793.01 μs  > 1 h      > 1 h      > 1 h      > 1 h      > 1 h
gradual        863.55 μs  831.44 ms  101.93 s   > 1 h      > 1 h      > 1 h
sum reachable  43.12 μs   31.95 ms   113.88 ms  1.24 s     72.74 s    108.57 s
prioritize     97.86 μs   324.32 ms  390.11 ms  310.38 s   > 1 h      481.85 s
no set         8.29 μs    5.41 ms    21.25 ms   216.69 ms  14.83 s    21.49 s
only changes   4.84 μs    740.98 μs  2.05 ms    9.68 ms    569.47 ms  1.48 s
pass index     5.12 μs    734.19 μs  2.06 ms    9.87 ms    554.91 ms  1.46 s
no alloc       743.40 ns  84.23 μs   263.48 μs  1.12 ms    76.78 ms   343.88 ms

Conclusion

We can solve Kakuros automatically. The Kakuro from the book got solved in well under one second. I could try to optimize the solver even further. To be honest, though, I feel like I scratched this itch sufficiently. Spending some time on a small project with clear metrics is fun, but life goes on.

You're welcome to mess with my code, which is on GitHub. If you come up with more optimizations or better strategies, feel free to contact me.

Edit: There are some more solution attempts in the repo, but their performance is somewhat unreliable. The fastest one solves the Kakuro from the book in 89 ms, but runs out of memory on the 30×30 Kakuro.