Updated to Python 3 in Jan 2020
One easy way to encode Sudoku with binary variables is the presence (1) or absence (0) of a digit in a given square. I.e. we produce a 9x9x9 matrix of binary variables, and then produce an expression of all valid states in in Conjunctive Normal Form (CNF).
We’re using pycosat which encodes CNF as a 2-nested list of positive or negative variable (not) numbers.
For example, to encode that the cell at position (0, 0)
must contain at least one digit from 1 to 9 we’d write:
[
[1, 2, 3, 4, 5, 6, 7, 8, 9], # read: 1 or 2 or 3 or ... or 9
]
The next cell has its own set of variables representing the digits 1 to 9, and so on, all the way to 9^3 == 729.
[
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18], # read (1 or 2 or ... ) and (10 or 11 or 12 or ...)
...
[721, 722, 723, 724, 725, 726, 727, 728, 729],
]
We already encoded that a call must contain at least one digit. Now we encode that a cell must contain at most one digit. We do this by enumerating all 2-combinations of digits, i.e. (1, 2), (1, 3), (1, 4), …, (8, 9) and then saying that one can be set, but not both at the same time. This is expressed as (not A or not B) == not (A and B)
:
[(-1, -2), (-1, -3), (-1, -4), (-1, -5), (-1, -6), (-1, -7),
(-1, -8), (-1, -9), (-2, -3), (-2, -4), (-2, -5), (-2, -6),
...
(-6, -7), (-6, -8), (-6, -9), (-7, -8), (-7, -9), (-8, -9),
]
The rest is just variations of the same theme:
- Each 9x9 block must contain 9 different digits.
- Each row and each column must contain 9 different digits.
Solving
According to the Telegraph newspaper the worlds hardest Sudoku puzzle is this:
8 . . . . . . . .
. . 3 6 . . . . .
. 7 . . 9 . 2 . .
. 5 . . . 7 . . .
. . . . 4 5 7 . .
. . . 1 . . . 3 .
. . 1 . . . . 6 8
. . 8 5 . . . 1 .
. 9 . . . . 4 . .
which we can solve quite easily:
$ time python sudoku.py
8 1 2 7 5 3 6 4 9
9 4 3 6 8 2 1 7 5
6 7 5 4 9 1 2 8 3
1 5 4 2 3 7 8 9 6
3 6 9 8 4 5 7 2 1
2 8 7 1 6 9 5 3 4
5 2 1 9 7 4 3 6 8
4 3 8 5 2 6 9 1 7
7 9 6 3 1 8 4 5 2
real 0m0.136s
user 0m0.117s
sys 0m0.019s
I suspect the majority of time is spent allocating memory for the highly inefficient list/object representation that pycosat uses (list of list of Python ints).
Another interesting exploration from here is to exclude a solution, and then run the solver again. E.g. for the solution shown above we can encode that cell at index (0, 0)
must not be 8, etc. That way we can generate all
possible solutions until we hit UNSAT.
If you start with an empty board you’ll enumerate all valid Sudoku boards, totalling 6.67e21 configurations. That is probably a bit too much for my laptop.