Augmented reality Sudoku using MATLAB

First thing's first - I am not a software engineer and never had any formal programming training. I do like creating new things and I find computer vision fascinating. I believe that it is simpler than it might initially appear.


All of this started as an attempt to create a normal Sudoku solving function, but after I did that, I thought it would be cooler to combine Sudoku solving with computer vision. 


To make it a bit easier, I divided this operation into nine parts:

  1.       Grabbing the image.
  2.        Transferring the original image to B/W.
  3.        Finding the shape with the biggest bounding box. this is our (deformed) sudoku board.
  4.        Finding the corner coordinates of that shape.
  5.        Using the corners, warp that shape to a perfect square.
  6.        Find the digit in each cell (if there’s indeed a digit in it).
  7.        Solve the sudoku.
  8.        Use the existing digits’ images to superimpose the solution (only the missing digits) on the perfect square.
  9.        De-warp the square back to the shape it was and superimpose it on the image.

 

1.    Grabbing the image.

Either the image input or, if you’re handling a video, a frame from it.




2.    Transferring the original image to B/W.

I first turn it into grayscale, then turn it into B/W, and lastly, complement the image (turn black to white and white to black) to make it readable by the function ‘regionprops’ (next step).




3.    Finding the shape with the biggest bounding box. this is our (deformed) sudoku board.

Outsourcing this to a function called ‘regionprops’, which finds all of the shapes in the B/W image, their area, bounding boxes, and the pixel coordinated for all of their pixels. We go over all of the shapes and pick the one that has the biggest bounding box.

 

Here’s how the shape looks, superimposed in red on the image:


4.     Finding the corner coordinates of that shape.

I work under the assumptions that:

a)       The sudoku square is not very deformed (e.g., it was not shot with a fisheye lens)

b)      The sudoku square is not tilted to over ~40 degrees to either side.

With these assumptions we can say that the corners can be found by summing the x and y coordinates of each pixel that's part of the shape we found, like so:

Top right corner: Maximum sum.
Bottom left corner: Minimum sum.

We then invert the Y axis (making the Y coordinate equal the total height-Y. With slight changes we could’ve done the same thing to the X axis, it doesn’t really matter) and we check again, now we’ll find the following:

Bottom right corner: Maximum sum.
Top left corner: Minimum sum.

 

Here’s how it would look like with the corners in place:



5.    Using the corners, warp that shape to a perfect square.

Using the corner coordinates we found and the functions ‘tform’, ‘imref2d’, and ‘imwarp’, we warp the shape of the sudoku board to a perfect square.

I’ve found a great explanation on how to do it here: https://stackoverflow.com/questions/32447767/how-to-warp-an-image-into-a-trapezoidal-shape-in-matlab

 

Here’s how the B/W perfect square looks like after the warp:



 

6.       Find the digit in each cell (if there’s indeed a digit in it).

Now that we have a perfect square, and we assume that the Sudoku is a normal 9X9 board, we use the "ocr" function in each cell and check if it contains a digit and if so, which one.

This allows us to create an array with all of the digits for us to solve in the next step (if we can’t see a digit, we put a ‘0’ in the array).

We also use that opportunity to take an image of each digit from one to nine, so we’ll be able to add it later to the solution. This assumes that the unsolved Sudoku contains all of the digits, which is not always the case. We can also have default digit images for us to use, in case digits are missing from the unsolved Sudoku.

 

We can see that the array that was created matches the image:


  

7.     Solve the sudoku.

There are many ways and tutorials on how this can be done, here’s a high-level overview of an easy solution:

a)       Go over the board to see if a cell has only one digit option, if I find one, I add it and continue, after I went over all of the cells, I repeat the process, finding new single option cells using the cells I was able to solve until now.

b)      If I went over all of the currently empty cells and didn’t find any new cells in which I can determine the digit (single option cells), I go to the first empty cell and randomly choose one of its remaining options, then go back to option a. This can result in one of three outcomes:

1)      The Sudoku is fully solved – Great!

2)      If I went over all of the empty cells and didn’t find any new cells in which we can determine the digit (scenario b) – I go one level deeper in the recursion (go to another empty cell and randomly choose one of its remaining options).

3)      The sudoku becomes unsolvable (e.g., I get to a point where the same row must contain the same digit twice) – I go one level out of the recursion, removing the random digit I chose as the option for that cell.

This will eventually get you to a solution if there is one (usually very fast).

 


8.       Use the existing digits’ images to superimpose the solution (only the missing digits) on the perfect square.

 

We now create a square image that has all of the digits we added to fully solve the Sudoku (not including the digits we had to begin with, as they are already part of the image):



9.      De-warp the square back to the shape it was and superimpose it on the image.

Now we de-warp that perfect square and superimpose it on the original image.

We started with:



And finished with:



Once we can do this to one image, we can do it to a video as well:

1)     Repeat steps 1-9 for the first frame.

2)     Repeat steps 1-4 and 9 for all following frames. Steps 5-8 aren’t needed since you already know the solution for this Sudoku, you just need to know how to superimpose your solution on the new frames.

 

a)     Since it’s a video, you can also assume that the frame you’re working on is somewhat similar to the frame before it, which can allow you to look for the Sudoku in a smaller area and save you time.

 

Here’s an example for an input (right) and an output (left) video:




b)    Trying to superimpose a video can also bring edge scenarios you’ll need to deal with, such as a Sudoku board that becomes too small or too big to detect. In that case, a more elaborate function can be created, where we decide that the Sudoku board is too small and we stop the superimposing, then we detect the board again on a later frame and then calculate and superimpose the solution again.

 

For example (this is not perfect, if you look closely, you can see that it missed a couple of digits when we zoomed back in on the Sudoku board):



 

c)      This code is not perfect and there’s a lot that can be improved in it, for example:

1)      Going beyond a 45-degree tilt:




2)      Not including the corners of the Sudoku board in the frame:





While it's not perfect, there's a lot of other cool things we can do with it by changing it slightly, for example - it's fast enough to superimpose a solved Sudoku in real-time, using a webcam (or a phone that emulates a webcam).