# A regular crossword

A friend shared a link to an unusual puzzle (click for a full-page PDF):

Each row has a regular expression indicating the letters to fill the row. Each cell is at the intersection of three rows, so there are a number of constraints to satisfy at each point.

Overlapping constraints are a good basis for logic puzzles. Sudoku, Ken-Ken, Nonograms, and plenty of other puzzle forms follow the same recipe: determine the contents of a cell, based on multiple simultaneous constraints.

But the regexes here add an extra dimension. Each of the regexes here has a different form, resulting in different levels of information. Two rows have .*, or no information at all. Another has [CR]*, so we know the entire row is C’s and R’s. Each row has a different regex, so the interaction is varied across the grid.

I wrote to the author, Dan Gulotta, about how he constructed it, and he told me,

I constructed the letter grid by building it up a few letters at a time. I started out with a blank grid and all of the regular expressions set to ‘.*’. At each step, I would find a place where I wanted to add a few letters to the grid and then see if I could replace some regular expressions with more constrained ones in order to force those letters to be there. In this way, I was able to ensure that the solution was unique.

A few times during my solving of the puzzle, I used the classic piece of puzzle meta-information as part of the deduction: there is a unique solution. A friend of mine said he solved it without using that fact, but I don’t see a reason to avoid it.

By the way, I didn’t realize this when I solved it, but there’s another level to the puzzle, which is to identify the phrase in it. It was part of the 2013 MIT Mystery Hunt.

» 16 reactions

There is provably more than one unique solution to the puzzle:
First row, second to last slot, has the following constraints:
Across: could be anything, including one of the 2 mandatory H's (tho there could be more than 2 on that row).
Down: A, M or C.
Up: due to a partial solve putting an M somewhere else on that line, not an M (the regex dictates only one M on that line). If this were not the case, M would also be a possibility.
Thus, that cell could be at least either of A or C, but there are no outstanding constraints that would decide between the two, thus there are at least 2 solutions.
...then again, now I look at it again, the down regex indicates that if it were C then there would not be a CM sequence elsewhere down that line, this a constraint that would dictate which of A or C it was, given more of the solution... Poopy.
"Another has [CR]*, so we know the entire row is C's and R's."

That's incorrect, [CR]* means 'matches zero or more instances of either C or R'. Any kind of string could fit this one.
@erez, perhaps I (or the author) should have made clear: each regex matches the entire row. The only way [CR]* can match the entire row is for the row to be composed entirely of C's and R's.
I am treating all the regexes as if I'd done a
`:s/(.*)/^(\$1)\$/`
on them.
200% perfect idea! BTW an automatic solver of the puzzle is a nice task for CS stundents, as well as a generator of new puzzles of the sort.
The interesting thing in terms of creating a solver is programatically identifying the sub-constraints from each RE - it's the set of those that is solvable. E.g. things like if that cell is an X then the two following on this line are both C's, or the set of characters this cell could be is just C or R. That could be the guts of the task tho - then let prolog do the rest (c:

It's a lot easier to write a checker than a solver, and not feasible to brute force even a single line normally, even given the only implicitly stated constriction of each cell containing only an uppercase letter.
I have a checker at https://gist.github.com/agriffis/4746429

An entirely brute force solver is easy to write, but would take a very long time to run. I think a true solver can't lean on an regex engine to execute the given patterns against a search space, because the overlapping constraints can't execute against the same space. Rather you have to explode them to establish individual cell constraints then chain discovered information to the crossed patterns.
One thing that could help is the PCRE partial match support but that only really helps for prefixes - doesn't do much to help where the interesting constraints are at the end of the line.
Michael Brazier 7:57 PM on 12 Feb 2013
Having just finished solving the puzzle by hand, I can say that not only isn't it necessary to assume a unique solution, there is no point at which you need to guess a character for a cell. At every stage of solving, there is at least one cell which has only one possible character remaining.
I'm happily surprised to find my earlier work, regex-genex ( http://hackage.haskell.org/package/regex-genex ), has been adapted by LeventErkok++ into a 100-LOC solver for the crossword:

https://gist.github.com/LeventErkok/4942496

It uses the Z3 theorem prover to find the unique solution.
Amazing work, Audrey. In fact, Mr. Erkok's Haskell solution is significantly less than 100 LOC, seeing as 60 lines of the source code are taken to represent the puzzle data itself.
I solved it. I agree with michael. It was a very well written puzzle and now I'm totally hungry for more!
I solved it too. Took me between 2 and 3 hours of cumulated time (I spent it by short bursts at start), and the start is noticeably slower than the finish (due to the fact that you have much more regexps to explore).

Also, I thought at first that there would probalbly be several solutions, but solving it sequentially made those "alternate" solutions disappear by themselves, by showing new constraints (due to letters having been placed already, for instance).

All in all, a very nice puzzle, and I'm looking forward to more of that kind, if you ever lay your hands on those :)