Letterpress Word Finder

2013-08-27

In an attempt to start to blog more, here's a quick follow-up post on the previous Letterpress article.

Background

As a reminder, here's how I outlined steps in creating a Letterpress solver:

Take screenshot of game and import it into solver
Parse the board into a string of letters
Reduce a dictionary of valid words against those characters to find playable words
Optionally make recommendations of which word to play based on current board state and strategy. (i.e. don't be naive)

We built step one (sort-of) and step two in the previous article, so let's move on to step three.

Requirements

We want our script to fulfill the following requirements:

Accept the board letters via STDIN or commandline arguments.
Reduce the dictionary words against those letters.
Dump out matching words (without regard to board state/strategy).

Implementation

We'll take either an argument or read STDIN and downcase it.

letters = (ARGV[0] || STDIN.read).downcase

I don't have the official Letterpress dictionary (a quick googling will get you on the right track if you insist), but every good unix-y system has a dictionary file.

$ cat /usr/share/dict/words | wc -l
235886

OK, that's a lot of words. Let's pull them in and downcase them too.

words = File.read("/usr/share/dict/words").downcase.split("\n")

Now, the only really interesting part: a method to determine if a word can be constructed from letters. I've shamelessly borrowed a perfectly fast solution from Stackoverflow.

def is_subset?(word, letters)
  !word.chars.find{|char| word.count(char) > letters.count(char)}
end

And now we reduce our words by those that match our letters

matching_words = words.select do |word|
  is_subset?(word, letters)
end

And there's nothing left to do but dump them out.

puts matching_words.sort_by(&:length)

Here's the entire word generating script.

And an example of using it with the board parser from the previous post:

$ ruby -r ./board_parser -e "puts BoardParser.new('light.png').tiles.join" | ruby letter.rb | tail -n 10
hermodactyl
typhlectomy
cryohydrate
polydactyle
pterodactyl
crymotherapy
hydrolyzable
acetylthymol
overthwartly
protractedly

Excellent. Of course, not all words in your system's dictionary file may be playable, YMMV, etc.

Quick and Dirty OCR for Letterpress & Other Tile-based Games

2012-11-18

I've been playing enough Letterpress lately to realize that I'm not great at it. This is super frustrating for me when this is a game that you could easily teach a computer to play.

I'm not the first person to have that thought. There are plenty of cheating programs for Letterpress (just google or search in the app store).

I haven't investigated these solvers but in thinking about the problem, the basic approach would seem to be:

Take screenshot of game and import it into solver
Parse the board into a string of letters
Reduce a dictionary of valid words against those characters to find playable words
Optionally make recommendations of which word to play based on current board state and strategy.

I wondered how quickly I could throw something together to simply parse the game board into a string of letters. It turns out it is super easy. To get started I took a screenshot of a game in progress and downloaded it from my phone.

I'd heard about tesseract back when it was first announced and it seemed worth giving it a shot. I started with brew install tesseract and tried simply passing in the board image unmodified:

$ tesseract light.png /tmp/output
$ cat /tmp/output.txt
R
QM V
66:
KO

Not even close. The homebrew instructions recommend grayscaling the image first with ImageMagick, so what do we get after that?

$ convert light.png -type Grayscale /tmp/gray.tif
$ tesseract /tmp/gray.tif /tmp/output
$ cat /tmp/output.txt

QM
V
w
A‘ K
6'

Ugh, even worse.¹

But poking through tesseract's options reveal some promise via pagesegmode settings:

7 = Treat the image as a single text line.

and

10 = Treat the image as a single character.

7 turned out to be a bust, but after testing option 10 on a few individual tiles, things were starting to look up. So let's break the image up into the individual 25 tiles and recognize each one.

There may be more elegant ways to break the image into tiles, but I ended up using two ImageMagick commands:

# remove the non-tile content (i.e. the scores, etc. in the header)
convert light.png -gravity North -chop 0x320 /tmp/headless.png
# break the tile-content into 128x128px chunks
convert /tmp/headless.png -crop 128x128 /tmp/tile_%02d.png

Then I wrapped these two commands up in a ruby class for ease of use and wrote a simple test.

so, running the ruby class:

$ ruby -r ./board_parser -e "puts BoardParser.new('light.png').tiles.join"
KTVHROBDRBDLCYTPLEWAFZYMB

Bingo! A perfect match, ready to be compared to a dictionary of valid words.

It turns out tesseract is great at matching single tiles regardless of color scheme or captured state of the tile. The tests for the code run against screenshots from all available color schemes.

This approach is dead-simple and leans heavily on solid existing technologies. Because of this, the glue code itself doesn't have to be clever at all :)

Note that this quick hack is only designed to work against iPhone 4 resolution screenshots -you would have to (at least) change the header crop size for iPhone 5.

1: The recommendation to convert the image to grayscale is a good hint that tesseract probably relies a lot on consistent contrast. This coupled with the fact that we're dealing with random letters instead of words definitely stacks the cards against tesseract. Dealing with individual tiles/characters solves both the confusion of multiple contrasts and the confusion of dealing with random gibberish.

semantic art

code should say something

Letterpress Word Finder

Background

Requirements

Implementation

Quick and Dirty OCR for Letterpress & Other Tile-based Games