Dr. Dobb's | Sudoku & Graph Theory

Sudoku & Graph Theory

Understanding graph theory is central to building your own Sudoku solver.

February 01, 2006
URL:http://www.drdobbs.com/cpp/sudoku-graph-theory/184406436

Sudoku is a logic puzzle in which there are 81 cells (vertices) filled with numbers between 1 and 9. In each row, the numbers 1,2,3,..,9 must appear without repetition. Likewise, the numbers 1,2,3,..,9 must appear without repetition in the columns. In addition to the row and column constraints, the numbers 1,2,3,..,9 must appear in the nine nonoverlapping 3×3 subsquares without repetition. So in short, the puzzle board is separated into nine blocks, with nine cells in each block (see Figure 1).

Figure 1: Sudoku puzzle.

There are several possible rules you can use to successfully fill in missing numbers. In this article, we examine two rules — Chain Exclusion and Pile Exclusion — for solving Sudoku puzzles. These rules are at the heart of a Windows-based Sudoku solver that we built using Visual C++. (Executables and the complete source code for this solver are available here). The goal of this logical Sudoku solver is to prove that only one possible number can be assigned to each vertex, and to find that number for each vertex in which the number is not defined. Illogical Sudoku puzzles can also be solved, but require guesses (Implementation, OK button).

We refer to possible numbers that should be assigned to a row, column, or one of the nine 3×3 subsquares as a "Permutation Bipartite Graph" or nodes. A node consists of a vector of n>1,n=2,3,4... vertices and all possible numbers that can be assigned to these vertices, such that there exists at least one possible match between the vertices of the vector and the numbers 1,2,...n.

For example, the following are nodes:

({1,2,3,5},{2,3},{2,3,4},{3,4},{4,5}, n=5
({1,2,3,7},{3,6},{3,4},{1,4},{5,6,7},{4,6},{2,7},{8,9},{8,9}, n=9

A possible match for the first vector is easy:

1 -> {1,2,3,5}
2 -> {2,3}
3 -> {2,3,4}
4 -> {3,4}
5 -> {4,5}

A possible match for the second vector is more tricky:

2 -> {1,2,3,7}
3 -> {3,6}
4 -> {3,4}
1 -> {1,4}
5 -> {5,6,7}
6 -> {4,6}
7 -> {2,7}
8 -> {8,9}
9 -> {8,9}

A number can be only assigned to a vertex that contains the possibility of assigning that number. For instance, only the following possibilities are accepted:

7 -> {2,7} or 2 -> {2,7}.

Pile Exclusion and Chain Exclusion provide the basis of logical elimination rules.

To understand Pile Exclusion, consider the following nodes:

({1,2,3,5},{3,6},{3,4},{5,6},{1,7,8,9},{4,6},{5,7,8,9},{4,6},{6,7,8,9},{1,4}, n=9

The numbers 7,8,9 appear only in three vertices:

{1,7,8,9},{5,7,8,9},{6,7,8,9}

Because there is at least one possible match in the Permutation Bipartite Graph, one vertex will be matched to 7, one to 8, and one to 9. Thus, you can erase the other numbers from these three vertices to get the following three augmented vertices:

{1,7,8,9} -> {7,8,9} 
{5,7,8,9} -> {7,8,9} 
{6,7,8,9} -> {7,8,9}

and the entire Permutation Bipartite Graph becomes:

({1,2,3,5},{3,6},{3,4},{5,6},{7,8,9},{7,8,9},{4,6},{7,8,9},{1,4}), n=9

As for Chain Exclusion, consider these nodes:

({1,2,3,7},{3,6},{3,4},{1,4},{5,6,7},{4,6},{2,7},{8,9},{8,9}, n=9

In the second, third, and sixth positions in the vertices vector, you have:

{3,6},{3,4},{4,6}

Only the numbers 3,4,6 can be assigned to these vertices. From this, you infer that 3,4,6 are not a matching option in any of the remaining vertices. Thus, you can erase these numbers from all the other vertices, resulting in a new, more simple graph:

({1,2,7},{3,6},{3,4},{1},{5,7},{4,6},{2,7},{8,9},{8,9}, n=9

You can do the same thing with {1}, so that the resulting graph is:

({2,7},{3,6},{3,4},{1},{5,7},{4,6},{2,7},{8,9},{8,9}, n=9

Algorithms

We now present an algorithm that finds all such chains in polynomial time. The first stage is to find the best bipartite matching using the Ford Fulkerson maximum flow algorithm or any other bipartite matching algorithm (see Introduction to Algorithms, Second Edition, by Thomas H. Cormen et al., MIT Press, 2001).

Let W be the set of n cells in a row, column, or subsquare.

That means that there are n cells or vertices in W when the entire Sudoku puzzle has n×n cells or vertices. Let D be the set of numbers that are matched to W. In an n×n Sudoku, D is the set of numbers 1,2,3,...,n.

Let S_k be a subset of k cells out of the n cells in which the possible numbers are exactly k<n numbers T_k. We say that S_kis a "chain" if it doesn't contain a smaller set Q_r such that r<k to which only r numbers can be matched. Moreover, in our model, if the number v in D can be matched to the cell u in W, then we say that our Permutation Bipartite Graph G contains the edge ∈ =(u,v) between the vertex u from W and v from D. In addition, let M denote a maximal match between W and D. A match is a set of edges connecting cells or vertices in W to numbers or vertices in D. Each cell in W is allowed to have only one edge in M that connects it to a number in D.

Each number in D is allowed to have only one edge in M that connects it to a cell or vertex in W. Let E denote the entire set of possible vertices between W and D.

Because the vertices of S_k are only connected to vertices of T_k, the condition |M|=|W| implies that each element s∈S_k is matched to exactly one element t ∈ T_k and vice versa. Start scanning a vertex u₁ ∈ W. Suppose u₁∈S_k. We check all the vertices v_i1 in D such that the edges (u₁,v_i1) exist in E. Because S_k is connected only to T_k,v_i1 are all in T_k. We mark u₁ such that we will not consider u₁ again. We remove u₁ from the list of vertices the algorithm will visit. Now we look for all the edges (u_i1,v_i1) in M that are connected to one of these vertices v_i1. Obviously, u_i1 are in S_k for all indices i₁ because all the vertices in T_k are matched by M to vertices in S_k. We continue the process recursively. Now, for each u_i1 we look for all v_i2 in D that are connected to at least one of the u_i1 vertices by edges. We mark u_i1such that these vertices will not be considered again. Again, the v_i2 must be in T_k and we continue and look for all the edges (u_i2,v_i1)∈M. Obviously, u_i2 are in S_k.

Because u₁ and the vertices u_i1were removed from the list of vertices that the algorithm visits, there are fewer vertices that can be visited by the algorithm. The process is repeated until all the vertices in W it can visit are reached.

Because all the vertices it can reach are in S_k, what is left to prove is that there is no vertex u in S_k that is not visited by the algorithm. Suppose there is such a vertex u. Then, obviously, the vertex v in T_kthat is matched to u could not be visited either; otherwise, u could be visited. But then, v is also not connected to any one of the vertices in S_k that were visited. So we can define Q_k-1=S_k-{u}, which is connected to |S_k|-1 vertices in D. That is a contradiction to the requirement that S_k is minimal.

A simpler algorithm is the Pile algorithm. Let G be a Permutation Bipartite Graph. We would like to find if there exist vertices v_i1,...,v_ik in D such that they are all connected to the same k vertices in W, u_j1,...,u_jk. Such vertices are called "Pile" or "Set." The algorithm is trivial. Start traversing the vertices v_k in D serially in a loop. Activate a second loop within the first loop and count all the other vertices in D that are connected to the same k vertices u_j1,...,u_jk in W that v_k is connected to. If there are k such vertices v_i1,...,v_ik in D, then you are done. Although this algorithm can be improved, it is efficient and its runtime is |W|²/2.

After finding a chain, the algorithm can erase all the edges that are connected to the vertices in T_kthat are not connected to S_k. This operation is called "Chain Exclusion." That is, if ∈=(u,v) and v∈T_k and u∈S_k, then ∈ is removed from the Permutation Bipartite Graph G.

After finding a Pile, remove all the edges that connect u_j1,...,u_jk to D-{v_i1,...,v_ik}.

Implementation

We've used the Chain Exclusion and Pile Exclusion algorithms described here to build a Windows-based Sudoku solver in Visual Studio C++ 6.00. Executables and the complete source code are available electronically here. You will need to rename the file Sudoku.ex1 to Sudoku.exe. Alternatively, you can compile the entire project Sudoku.dsw or Sudoku.dsp with Visual Studio 6.00. We've included a simple console application, Logic.ex1, that demonstrates Chain Exclusion. This file should be renamed to Logic.exe. In Sudoku.exe, the Logic button calls the class Square3.cpp, and Square3.cpp calls Bipartite.cpp. The Logic button fills as many cells as possible with numbers, by using logic only. You can generate puzzles that usually have more than one solution by clicking on the Test button when the Unique option doesn't have a check mark. If the Unique option is checked, the Sudoku puzzle is solved by logic only. The Very Hard option automatically checks the Unique option and takes 5 seconds. You can also enter your own puzzle and press either the Logic button or the OK button. If the puzzle can't be solved because of contradictions, OK won't fill it and Logic will fill as many cells as possible. When the contradiction is encountered, it inserts a double number in row, column, or square. That can be verified by the Check button.

In addition, Sudoku.exe also solves illogical puzzles via the OK button that calls recursion that in each iteration fills in numbers in locations in which there are less degrees of freedom. For more information on Sudoku puzzles, see the Daily Sudoku) or the Times Online).

Eytan, Raviv, and Eitan are software engineers in Israel. They can be contacted at [email protected], [email protected], and [email protected], respectively.