Let’s Write a Basic Vim Plugin

Over the holidays, I finally got around to reading Steve Losh’s wonderful Learn Vimscript the Hard Way. I liked it so much that I bought a copy after reading it for free online.

In an effort to put what I’ve learned into practice, I’m going to walk through creating a simple plugin to evaluate Postgres queries and see the results immediately in Vim. I hope you’ll follow along, and please feel free to substitute your database of choice.

Here’s our plugin in action:

Background

Postgres is great. The CLI, psql, thankfully has \e to let you edit your queries in $EDITOR. You’re probably fine just using \e (just remember to use the block comment style or your comments will be eaten by a grue).

But if I’m editing my queries in Vim anyway, why not also evaluate my queries from there? Sure, Vim is not an operating system (see :help design-not), but when we can see the results of our queries alongside our sql, we can create a tighter feedback loop. This integration opens up more possibilities for later (e.g. adding a mapping to describe the table under the cursor). It also gives us all of Vim for browsing the results (e.g. find, copy/paste, etc.).

To get us started, here’s a sample sql query to look at data from The Hall of Stats. It is an intentionally trivial example, but stick with me. Bring your own sql query to get data from one of your local databases if you’re following along.

-- List first 10 players mentioned in articles
SELECT articles_players.article_id,
  players.first_name,
  players.last_name
FROM articles_players
JOIN players ON players.id = articles_players.player_id
ORDER BY article_id ASC
LIMIT 10;

It is easy enough to run a file with psql and see the results.

$ psql hos_development -f test.sql
 article_id | first_name | last_name
------------+------------+------------
          1 | Bill       | Bergen
          1 | Babe       | Ruth
          2 | Ernie      | Banks
          2 | Yogi       | Berra
          2 | Bill       | Buckner
          2 | Frank      | Chance
          2 | Dave       | Concepcion
          2 | Andre      | Dawson
          2 | Julio      | Franco
          2 | Bob        | Johnson
(10 rows)

Getting started on the plugin

Our plugin will only have one file in it. In more complex plugins, you’ll want to leverage autoloading, but we’ll keep things simple here and keep all our code in one place.

In our plugin directory we have an ftplugin folder. In that folder, we'll create a file named sql.vim. This code will be evaluated automatically whenever a sql file is loaded.

$ mkdir -p sql_runner.vim/ftplugin
$ touch sql_runner.vim/ftplugin/sql.vim

Before we start coding away, we need to make Vim aware of our plugin. Add the following to your vimrc file (substituting the path to your plugin directory on disk):

set runtimepath+=/some/absolute/path/to/sql_runner.vim

Perfect. Now in sql_runner.vim/ftplugin/sql.vim we set up the default mapping for our plugin.

nnoremap <buffer> <localleader>r :call RunSQLFile()<cr>

Note: There are good reasons to not provide default mappings in your plugin or to at least allow users to opt-out of your default mappings, but, again, we’re keeping things simple.

We’re using <localleader> for the mapping (check :help localleader for insight). If you haven’t remapped <localleader>, it is still the default: \ (which makes this keybinding \r).
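If backslash doesn’t suit you, you can set your own before the filetype plugins load. For example, in your vimrc (the comma is just an example, use whatever you like):

let maplocalleader = ','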

Let’s start implementing RunSQLFile very naively at first:

function! RunSQLFile()
  execute '!psql hos_development -f ' . expand('%')
endfunction

Save that and open up a sql file. Now press <localleader>r.

Sure enough, this shows the output (until you hit enter to continue). Not a bad start, but we can already see a problem: we’ve hard-coded hos_development as the database name :( We’re also not passing a user or password to the psql command. That’s OK on my machine since my user already has permissions on that database, but it isn’t ideal to edit the plugin itself every time we want to change databases or credentials. Let’s go ahead and make this more flexible.

function! RunSQLFile()
  let l:cmd = g:sql_runner_cmd . ' -f ' . expand('%')
  execute '!' . l:cmd
endfunction

This allows us to specify the global variable g:sql_runner_cmd in our vimrc (or define/redefine it on the fly). I’m adding let g:sql_runner_cmd = 'psql hos_development' to my vimrc (and :sourceing it).

Because our code is in a file in ftplugin, you should be able to reload the plugin after saving changes by editing your sql file again (:e). Give the command another try. The output should be the same (except that you won’t see the command itself echoed back).

Now that our plugin is a little more flexible, what can we do about displaying the results in a split? Step one is to read the psql output into a variable.

We’ll replace the execute '!' ... call with

let l:results = system(l:cmd)
echo l:results

We’re still echo-ing the results out like before, but now we have them in-memory before we echo to the screen. Borrowing liberally from chapter 52 of Learn Vimscript the Hard Way, let’s dump our results into a new split.

function! RunSQLFile()
  let l:cmd = g:sql_runner_cmd . ' -f ' . expand('%')
  let l:results = systemlist(l:cmd)

  " Create a split with a meaningful name
  let l:name = '__SQL_Results__'
  execute 'vsplit ' . l:name

  " Insert the results.
  call append(0, l:results)
endfunction

We changed system to systemlist to simplify our append. This is pretty straightforward: We create a buffer with a name, it gets focus automatically, and we append our results to it.

Re-open the sql file and run our mapping. It works. Now run our mapping again. Oof. It opens another split. That’s a little tricky to fix (unless you’ve already done the extra credit in the Learn Vimscript the Hard Way chapter linked above) so we’ll deal with it in a bit. In the meantime, there are two easier issues to fix:

  1. re-running the command will append to the content from the previous run.
  2. the results buffer is a “normal buffer” so Vim will prompt you to save the results if you try to delete the buffer (bd) or close Vim. That’s not ideal for a throw-away scratch buffer.

We’ll make a few changes and add the following lines above the append code:

  " Clear out existing content
  normal! gg"_dG

  " Don't prompt to save the buffer
  set buftype=nofile

That’s two problems solved. Now what about the unwanted additional split every time we run our command? The extra credit section in Losh’s chapter gives us the hint to use bufwinnr.

If you provide bufwinnr a buffer name, it returns the number for the first window associated with the buffer or -1 if there’s no match. Close and re-open Vim and we’ll play with bufwinnr.

Before running our command, evaluate echo bufwinnr('__SQL_Results__') and you’ll see -1. Now use the mapping on a sql file and run echo bufwinnr('__SQL_Results__') again and you’ll see 1 (or a greater number if you have more splits open). If you load a different buffer in the result split window or close the results split, you’ll get -1 again. What does this tell us? If we get a value other than -1, we know that our result buffer is already visible and we should re-use it rather than opening a new split. Making a few changes, our function ends up looking like this:

function! RunSQLFile()
  let l:cmd = g:sql_runner_cmd . ' -f ' . expand('%')
  let l:results = systemlist(l:cmd)

  " Give our result buffer a meaningful name
  let l:name = '__SQL_Results__'

  if bufwinnr(l:name) == -1
    " Open a new split
    execute 'vsplit ' . l:name
  else
    " Focus the existing window
    execute bufwinnr(l:name) . 'wincmd w'
  endif

  " Clear out existing content
  normal! gg"_dG

  " Don't prompt to save the buffer
  set buftype=nofile

  " Insert the results.
  call append(0, l:results)
endfunction

Reload your sql file and try this a few times. It works!

Not surprisingly, using the function for a while reveals some room for improvement. As nice as it is to iterate on sql in one Vim split and see the results quickly in the other, it is a little annoying that the results buffer gets focus. I don’t want to have to jump back to the sql buffer from the results window each time. This is solved by adding execute 'wincmd p' (previous window) to the bottom of the function.

Finally, it is a bit of a pain to have to save my work before running the command each time. This is easily fixed by adding silent update to the top of the function. This will write the file if the content has changed.

For completeness, here’s the final version of the plugin.
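It’s simply everything above combined, with silent update at the top of the function and the jump back to the previous window at the bottom:

nnoremap <buffer> <localleader>r :call RunSQLFile()<cr>

function! RunSQLFile()
  " Write the file if it has changed
  silent update

  let l:cmd = g:sql_runner_cmd . ' -f ' . expand('%')
  let l:results = systemlist(l:cmd)

  " Give our result buffer a meaningful name
  let l:name = '__SQL_Results__'

  if bufwinnr(l:name) == -1
    " Open a new split
    execute 'vsplit ' . l:name
  else
    " Focus the existing window
    execute bufwinnr(l:name) . 'wincmd w'
  endif

  " Clear out existing content
  normal! gg"_dG

  " Don't prompt to save the buffer
  set buftype=nofile

  " Insert the results.
  call append(0, l:results)

  " Jump back to the previous (sql) window
  execute 'wincmd p'
endfunction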

Thanks for reading. Let me know if you have any requests for future posts on Vim plugins or any feedback on this post.

That’s probably enough for now, but read on if you want a quick detour and some suggested exercises.

A detour: do we need to persist the file?

Should evaluating the sql be tied to writing the file? Or is writing the file as part of evaluation an implementation detail? We could just pipe the contents of the current buffer to psql, but there’s actually a good reason to evaluate the persisted file.

Consider the following SQL:

-- some comment
select 1;

-- another comment;
select true,
  false,
  now(; -- note the syntax error here

If you evaluate this file in psql with -f, you get the following:

$ psql hos_development -f syntax_error.sql
 ?column?
----------
        1
(1 row)

psql:syntax_error.sql:7: ERROR:  syntax error at or near ";"
LINE 3:   now(;
              ^

Notice how it shows that the error occurred on line 7 (absolute to the file) and line 3 (relative to the problematic query). When you pipe the content in, you lose the absolute line number context.

$ cat syntax_error.sql | psql hos_development
 ?column?
----------
        1
(1 row)

ERROR:  syntax error at or near ";"
LINE 3:   now(;
              ^

If you evaluate a file, you’ll get absolute line numbers no matter how many queries deep the syntax error occurs. That’s more important to me than avoiding some unnecessary writes.

If you really wanted to evaluate the buffer contents without saving, here’s a few tips:

  • As shown above, psql can read from standard input. psql hos_development -f test.sql can be rewritten as cat test.sql | psql hos_development
  • Vim’s systemlist() function lets you pass a second argument that is sent to the command as stdin
  • You can get the content of the current buffer with getline(1, '$')

That should give you enough to go on.
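As a rough sketch (untested, and RunSQLBuffer is a name I just made up), those pieces combine like so:

function! RunSQLBuffer()
  " Pass the buffer contents to psql on stdin instead of using -f
  let l:results = systemlist(g:sql_runner_cmd, getline(1, '$'))
  echo join(l:results, "\n")
endfunction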

Suggested exercises

Add some code to allow users to skip the default mappings. You’ll probably want to use a global variable, as we did with g:sql_runner_cmd.
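One possible shape (a sketch; g:sql_runner_no_mappings is a name I’m making up):

" Only add the default mapping if the user hasn't opted out
if !get(g:, 'sql_runner_no_mappings', 0)
  nnoremap <buffer> <localleader>r :call RunSQLFile()<cr>
endif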

Add a new mapping and function to describe the table under the cursor. I might cover this one in a future post.


I18n Game Content: Multilingual Audiosprites and Captions Workflow

I’ve started making a list in my head of things I feel strongly about in software. I18n is near the top of that list. I wrote (and re-wrote) a post explaining why i18n is so darn important, but I couldn’t find a comfortable balance between all-out rant and something that felt hollow. In the meantime, I’m just going to say it plainly: “Please internationalize your software.”

Here’s an example of I18n in the wild.

I was working on a game…

I’ve wanted to make a video game since I was a kid sitting at my dad’s Apple IIc thumbing through the BASIC manual. I briefly toyed around with various attempts over the years but never got very serious. Last year I finally devoted some real time to learning both Unity and Phaser. I ended up shelving game-dev for a bit, but it was fun exploring new challenges.

While I was prototyping an adventure game in Phaser, I wanted to build a robust audio and text dialogue system that supported multiple language locales. I ended up finding some neat technologies and creating a comfortably streamlined workflow.

You can check out the resulting audio engine prototype, or read on for the process.

The requirements

  1. cross-browser compatible audio
  2. captions in multiple languages
  3. audio in multiple languages
  4. easy to create and update
  5. caption display timing synchronized with audio

(Note that the actual mechanics of dialogue bouncing between person A and person B won’t be covered here.)

Cross-browser compatible audio

Different browsers support different audio formats out of the box. If you want cross-browser compatible audio, you really want to serve your content in multiple formats. Don’t fret about bandwidth: clever frameworks (Phaser included) will only download the best format for the current browser.

In Phaser, you just pass an array of audio files.

game.load.audio('locked_door', [
    'assets/audio/locked_door.ac3',
    'assets/audio/locked_door.ogg'
]);

You want ogg for Firefox and then probably m4a and/or ac3. You might want to avoid mp3 for licensing reasons, but I’m not a lawyer (I’m also not an astronaut).

Captions in multiple languages

For our purposes, captions are really just text displayed on the screen. In nearly every adventure game, the character will encounter a locked door. Attempting to walk through that door should result in our character explaining why that can’t happen yet.

Even if we didn’t care about internationalization, it would make sense to refer to the caption content by a key rather than hard-coding the full text strings throughout our game. Beyond just keeping our code clean, externalizing the strings will allow us to have all our content in one place for easy editing.

Here’s a very simple caption file in JSON format:

{
  "found_key": "Oh, look: a key.",
  "locked_door": "Drats! The door is locked.",
  "entered_room": "Finally, we're indoors."
}

We’ll write a function to render the caption so that we only need to pass in the key:

function say(translationKey) {
  // get the text from our captions json
  var textToRender = game.cache.getJSON('speechCaptions')[translationKey];

  // draw our caption
  game.add.text(0, 20, textToRender, captionStyle);
}

say("locked_door");

And it renders something like this:

locked door caption

Localizing our captions is pretty straightforward. For each language we want to support, we copy an existing translation file and replace the JSON values (not the keys) with translated versions.

We’d do well to leverage convention over configuration. Keep all captions for a locale in a folder with the locale name.

/assets/audio/en/captions.json
/assets/audio/de/captions.json
...

Changing locales should change the locale folder being used. Your game is always loading “captions.json” and it just decides which copy to load based on the player’s locale.
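Sketching that in Phaser (where the locale value comes from is up to you):

// always load the same logical file; only the locale folder changes
var locale = 'de';
game.load.json('speechCaptions', 'assets/audio/' + locale + '/captions.json');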

Audio in multiple languages

This part doesn’t need to be overly clever: record the same lines for each language, and encode each recording in the same set of formats.

Consider the caption JSON from the previous section. It might make sense to have one JSON file per character. With some direction, a voice actor could read each line and you could save the line with a filename matching the key (e.g. the audio “Drats! The door is locked.” is saved as locked_door.wav).

We’ll store the encoded versions in locale-specific folders, as we did with our captions.json.

/assets/audio/en/locked_door.ac3
/assets/audio/en/locked_door.ogg
/assets/audio/de/locked_door.ac3
/assets/audio/de/locked_door.ogg
...

And then we can update our say function to also play the corresponding bit of audio.

function say(translationKey) {
  // get the text from our captions json
  var textToRender = game.cache.getJSON('speechCaptions')[translationKey];

  // draw our caption
  game.add.text(0, 20, textToRender, captionStyle);

  // speak our line
  game.speech.play(translationKey);
}

say("locked_door");

Easy to create and update

Have you ever played a game or watched a movie where the captions didn’t accurately reflect what was being said? This drives me crazy.

I’m guessing audio and captions fall out of sync because of late content changes or actors ad-libbing. Fortunately, we’ve got a system that is friendly to rewrites from either side. Prefer the ad-lib? Update the caption file. Change the caption? Re-record the corresponding line.

The content workflow here is straightforward. To reiterate:

  • Create a script as json with keys and text. Edit this until you’re happy. Tweak it as the game content progresses.
  • Translate that file into as many locales as you care about.
  • Losslessly record each line for each locale and save the line under the file name of the key.
  • Tweak captions and re-record as necessary.

That’s all well and good, but now you’ve got a ton of raw audio files you’ll need to encode over and over again. And having a user download hundreds of small audio files is hardly efficient.

We can do better. Enter the Audio Sprite. You may already be familiar with its visual counterpart, the sprite sheet, which combines multiple images into a single image. An audio sprite combines multiple bits of audio into one file, along with data marking when each clip starts and ends.

Using the audiosprite library, we can store all of our raw audio assets in a per-locale folder and run:

  audiosprite raw-audio/en/*.wav -o assets/audio/en/speech
info: File added OK file=/var/folders/yw/9wvsjry92ggb9959g805_yfsvj7lg6/T/audiosprite.16278579225763679, duration=1.6600907029478458
info: Silence gap added duration=1.3399092970521542
info: File added OK file=/var/folders/yw/9wvsjry92ggb9959g805_yfsvj7lg6/T/audiosprite.6657312458846718, duration=1.8187981859410431
info: Silence gap added duration=1.1812018140589569
info: File added OK file=/var/folders/yw/9wvsjry92ggb9959g805_yfsvj7lg6/T/audiosprite.3512551293242723, duration=2.171519274376417
info: Silence gap added duration=1.8284807256235829
info: Exported ogg OK file=assets/audio/en/speech.ogg
info: Exported m4a OK file=assets/audio/en/speech.m4a
info: Exported mp3 OK file=assets/audio/en/speech.mp3
info: Exported ac3 OK file=assets/audio/en/speech.ac3
info: Exported json OK file=assets/audio/en/speech.json
info: All done

Awesome. This generated a single file that joins together all of our content, and did so in multiple formats. If we peek in the generated JSON file, we see:

{
  "resources": [
    "assets/audio/en/speech.ogg",
    "assets/audio/en/speech.m4a",
    "assets/audio/en/speech.mp3",
    "assets/audio/en/speech.ac3"
  ],
  "spritemap": {
    "entered_room": {
      "start": 0,
      "end": 1.6600907029478458,
      "loop": false
    },
    "found_key": {
      "start": 3,
      "end": 4.818798185941043,
      "loop": false
    },
    "locked_door": {
      "start": 6,
      "end": 8.171519274376417,
      "loop": false
    }
  }
}

Phaser supports audiosprites quite well. We tweak our engine a bit to use sprites instead of individual files and we’re good to go.
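A sketch of that tweak (based on Phaser’s audiosprite support; exact calls can differ between Phaser versions):

// preload: the audio files plus the timing JSON generated above
game.load.audiosprite('speech', [
    'assets/audio/en/speech.ogg',
    'assets/audio/en/speech.ac3'
], 'assets/audio/en/speech.json');

// create: build the sprite once, then play clips by key
var speech = game.add.audioSprite('speech');
speech.play('locked_door');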

Caption display timing synchronized with audio

Now we turn to keeping the captions we’re displaying in sync with the audio being played. We have all the timing data we need in our audiosprite JSON.

We’ll update our say function to clean up the dialog text after the audio has ended:

function say(translationKey) {
  // get the text from our captions json
  var textToRender = game.cache.getJSON('speechCaptions')[translationKey];

  // draw our caption
  var caption = game.add.text(0, 20, textToRender, captionStyle);

  // speak our line
  var audio = game.speech.play(translationKey);

  // set a timeout to remove the caption when the audio finishes
  setTimeout(function(){
    caption.destroy();
  }, audio.durationMS);
}

say("locked_door");

Aside: Not everyone reads at the same speed. You’ll probably want to consider having some sort of slider that acts as a multiplier for the caption duration. Readers who prefer to read more slowly can happily use 1.5X or 2X caption duration. You might not want to have the slider go less than 1X lest the captions disappear while the speech audio is still ongoing, but perhaps some portion of your audience will turn off audio in favor of reading quickly. The duration of the audio still makes sense to me as a starting point for caption duration.

The prototype code

The prototype code covers all you need to get rolling with Phaser and Audiosprites. It also has basic support for preventing people talking over each other. Hopefully you’ll find it instructive or at least interesting.

That concludes this random example of I18n in the wild. Stay global, folks.

2014 Year-In-Review

2014 was a great but complicated year. Here’s a completely unnecessary and self-indulgent look back at some of it.


Untwisting a Hypertext Narrative - PEG to the Rescue!

In this post you’ll learn why I think Parsing Expression Grammars are awesome and see an example of how I built one to scratch an itch.

The Itch

After spending some time writing Choose Your Own Adventure-style books in markdown, I quickly realized there were some tools missing that could greatly improve the writing process. A few missing items were:

  1. Knowing if there are any unreachable sections that have been orphaned in the writing process.
  2. Being able to see all the branches within a book.
  3. Knowing each branch is coherent by having an easy way to read through them.

“Never fear,” I say to myself, “I can just write some code to parse the markdown files and pluck out the paths. This will be easy.”

As a quick reminder, the format for a single section looks something like this:

# Something isn't right here. {#intro}

You hear a phone ringing.

- [pick up phone](#phone)
- [do not answer](#ignore-phone)
- [set yourself on fire](#fire)

(Headers specify new sections starting and have some anchor. Links direct you to new sections.)

There are plenty of ways to slurp in a story file and parse it. You could write a naive line-by-line loop that breaks it into sections based on the presence of a header and then parse the links within sections with substring matching. You could write some complicated regular expression because we all know how much fun regular expressions can become. Or you could do something saner like write a parsing expression grammar (hereafter PEG).

Why a PEG?

Generally, a regex makes for a beautiful collection of cryptic ascii art that you’ll either comment-to-death or be confused by when you stumble across it weeks or months later. PEGs take a different approach and instead seek to define “a formal language in terms of a set of rules for recognizing strings in the language.” Because they’re a set of rules, you can slowly TDD your way up from parsing a single phrase to parsing an entire document (or at least the parts you care about).

(It is worth mentioning that because the format here is pretty trivial, either the naive line-by-line solution or a regex is fine. PEGs are without a doubt the right choice IMHO for complicated grammars.)

Show me some code

We’ll be using Parslet to write our PEG. Parslet provides a succinct syntax and exponentially better error messages than other competing ruby PEGs (parse_with_debug is my friend). My biggest complaint about Parslet is that the documentation was occasionally lacking, but it only slowed things down a bit – and there’s an IRC channel and mailing list.

Let’s start off simple, just parsing the links out of a single section of markdown. Being a TDD’er, we’ll write a few simple tests first (in MiniTest::Spec):

describe LinkParser do
  def parse(input)
    LinkParser.new.parse(input)
  end

  it "can match a single link" do
    parsed = parse("[some link name](#some-href)").first

    assert_equal "some-href",
      parsed[:id]
  end

  it "can match a single link surrounded by content" do
    parsed = parse("
      hey there [some link name](#some-href)
      some content
    ").first

    assert_equal "some-href",
      parsed[:id]
  end

  it "can match a multiple links surrounded by content" do
    parsed = parse("
      hey there [some link name](#some-href)
      some content with a link [another](#new-href) and [another still](#last) ok?
    ")

    assert_equal ["some-href", "new-href", "last"],
      parsed.map{|s| s[:id].to_s}
  end
end

And the working implementation of LinkParser:

class LinkParser < Parslet::Parser
  rule(:link_text) { str("[") >> (str(']').absent? >> any).repeat >> str(']') }
  rule(:link_href) {
      str('(#') >> (str(')').absent? >> any).repeat.as(:id) >> str(')')
  }
  rule(:link)      { link_text >> link_href }
  rule(:non_link)  { (link.absent? >> any).repeat }
  rule(:content)   { (non_link >> link >> non_link).repeat }

  root(:content)
end

“Foul,” you cry, “this is much more complicated than a regular expression!” And I reply “Yes, but it is also more intelligible long-term as you build upon it.” You don’t look completely satisfied, but you’ll continue reading.

It is worth noting that everything has a name:

  • link_text encompasses everything between the two brackets in the markdown link.
  • link_href is the content within the parens. Because we are specifically linking only to anchors, we also include the # and then we’ll name the id we’re linking to via as.
  • link is just link_text + link_href
  • non_link is anything that isn’t a link. It could be other markdown or plain text. It may or may not actually contain any characters at all.
  • content is the whole markdown content. We can see it is made up of some number of the following: non_link + link + non_link

We’ve specified that “content” is our root so the parser starts there.

The Scratch: Adding the 3 missing features

Now we have an easy way to extract links from sections within a story. We’ll be able to leverage this to map the branches and solve all three problems.

But in order to break the larger story into sections we’ll need to write a StoryParser which can parse an entire story file (for an example file, see the previous post). Again, this was TDD’ed, but we’ll cut to the chase:

class StoryParser < Parslet::Parser
  rule(:space) { match('\s').repeat }
  rule(:newline) { match('\n') }

  rule(:heading) { match('^#') >> space.maybe >> (match['\n{'].absent? >> any).repeat.as(:heading) >> id.maybe }
  rule(:id)      { str('{#') >> (str('}').absent? >> any).repeat.as(:id) >> str('}') }
  rule(:content) { ((id | heading).absent? >> any).repeat }
  rule(:section) { (heading >> space.maybe >> content.as(:content) >> space.maybe).as(:section) }

  rule(:title_block) { (str('%') >> (newline.absent? >> any).repeat >> newline).repeat }

  rule(:story) { space.maybe >> title_block.maybe >> space.maybe >> section.repeat }

  root(:story)
end

Now we can parse out each section’s heading text, id, and content into a tree that looks something like this:

[
  {:section=>{
    :heading=>"Something isn't right here. "@51,
    :id=>"intro"@81,
    :content=>"You hear a phone ringing.\n\n- [pick up phone](#phone)..."@89}
  },
  {:section=>{
    :heading=>"You pick up the phone... "@210,
    :id=>"phone"@237,
    :content=>"It is your grandmother. You die.\n\n- [start over](#intro)"@245}
  },
  ...
]

“That’s well and good,” you say, “but how do we turn that into something useful?”

Enter Parslet’s Transform class (and exit your remaining skepticism). Parslet::Transform takes a tree and lets you convert it into whatever you want. The following code takes a section tree from above, cleans up some whitespace, and then returns an instantiated Section class based on the input.

class SectionTransformer < Parslet::Transform
  rule(section: subtree(:hash)) {
    hash[:content] = hash[:content].to_s.strip
    hash[:heading] = hash[:heading].to_s.strip

    if hash[:id].to_s.empty?
      hash.delete(:id)
    else
      hash[:id] = hash[:id].to_s
    end

    Section.new(hash)
  }
end

Example of an instantiated Section:

p SectionTransformer.new.apply(tree[0])
# <Section:0x007fd6e5853298
#  @content="You hear a phone ringing.\n\n- [pick up phone](#phone)\n- [do not answer](#ignore-phone)\n- [set yourself on fire](#fire)",
#  @heading="Something isn't right here.",
#  @id="intro",
#  @links=["phone", "ignore-phone", "fire"]>

So now we have the building blocks for parsing a story into sections. Our Section class internally uses the LinkParser from above to determine where each section branches outward.
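The Section class itself isn’t shown in this post, but a minimal sketch might look like this:

class Section
  attr_reader :heading, :id, :content, :links

  def initialize(attrs)
    @heading = attrs[:heading]
    @id      = attrs[:id]
    @content = attrs[:content]

    # use LinkParser to pull out the ids this section links to
    @links = LinkParser.new.parse(@content).map { |link| link[:id].to_s }
  end
end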

Let’s finish this by encapsulating the entire story in a Story class:

class Story
  attr_reader :sections

  def initialize(file)
    @sections = parse_file(file)
  end

  def branches
    @_branches ||= BranchCruncher.new(@sections).traverse
  end

  def reachable
    branches.flatten.uniq
  end

  def unreachable
    @sections.map(&:id) - reachable
  end

  def split!(path)
    branches.each do |branch|
      File.open(path + branch.join('-') + '.md', 'w') do |f|
        branch.each do |id|
          section = sections.detect{|s| s.id == id}
          f.puts "# #{section.heading} {##{section.id}}\n"
          f.puts section.content
          f.puts "\n\n"
        end
      end
    end
  end

  private

  def parse_file(file)
    SectionTransformer.new.apply(StoryParser.new.parse(file.read))
  end
end

A few notes:

  • You instantiate the Story class with a File object pointing to your story.
  • It parses out the sections.
  • Then you can call methods to fill in the missing pieces of functionality we identified at the beginning of this post:
# Which sections are orphaned?
p story.unreachable
# => ['some-unreachable-page-id']

# What branches are there in the book?
p story.branches
# => [ ["intro", "investigate", "help"], ["intro", "investigate", "rescue", "wake-up"], ["intro", "investigate", "grounded"], ["intro", "grounded"] ]

# Let me read each narrative branch by splitting each branch into files
story.split!('/tmp/')
# creates files in /tmp/ folder named for each section in a branch
# e.g. intro-investigate-help.md
# You can read through each branch and ensure you've maintained a cohesive narrative.

If you made it this far, you deserve a cookie and my undying affection. I’m all out of cookies and any I had would be gluten-free anyway, so how about I just link you to the example code instead and we call it even?

Here’s the cyoa-parser on github. It includes a hilariously bad speed-story I wrote for my son when he insisted on a CYOA bedtime story 10 minutes before bed.

If you’d like to learn more about Parslet from someone who knows it better than me, check out Jason Garber’s Wicked Good Ruby talk.

Writing Hypertext Fiction in Markdown

Remember Choose Your Own Adventure books? I fondly remember finding new ways to get myself killed as I explored Aztec ruins or fought off aliens. Death or adventure waited just a few pages away and I was the one calling all the shots.

Introducing my son to Hypertext Fiction has rekindled my interest. I wondered how difficult it would be to throw something together to let me easily write CYOA-style books my kid could read on a kindle. I love markdown, so a toolchain built around it was definitely in order.

As it turns out, Pandoc fits the bill perfectly. You can write a story in markdown and easily export it to EPUB. From there you’re just a quick step through ebook-convert (via calibre’s commandline tools) to a well-formed .mobi file that reads beautifully on a kindle.

Here’s a quick example markdown story:

% You're probably going to die.
% Jeffrey Chupp

# Something isn't right here. {#intro}

You hear a phone ringing.

- [pick up phone](#phone)
- [do not answer](#ignore-phone)
- [set yourself on fire](#fire)

# You pick up the phone... {#phone}

It is your grandmother. You die.

- [start over](#intro)

# You ignore the phone... {#ignore-phone}

It was your grandmother. You die.

- [start over](#intro)

# You set yourself on fire... {#fire}

Strangely, you don't die. Guess you better start getting ready for school.

- [pick up backpack and head out](#backpack)
- [decide to skip school](#skip)

# You decide to skip school {#skip}

A wild herd of dinosaurs bust in and kill you. Guess you'll never get to tell your friends about how you're immune to flame... or that you met live dinosaurs :(

- [start over](#intro)

# Going to school {#backpack}

You're on your way to school when a meteor lands on you, killing you instantly.

- [start over](#intro)

From the top, we have percent signs before the title and author, which Pandoc uses for the title page.

Then each chapter/section begins with an h1 header which has an id specified. This id is what we’ll use in our links to let a reader choose where to go next.

If you don’t specify an id, Pandoc will dasherize your header text to generate one, but it is probably easier to be specific since you need to reference the id in your link choices anyway.

Save that as story.md and run the following to get your epub and mobi versions:

pandoc -o story.epub story.md && /usr/bin/ebook-convert story.epub story.mobi

BONUS: ebook-convert even complains if one of your links points to an invalid destination.

Here’s a preview as seen in Kindle Previewer

And here are the generated EPUB and .mobi files and the markdown source file.

Now, get writing!

A Proper API Proxy Written in Go

A little over a month ago, I blogged about an API proxy written in Go. That post contained a functioning but incredibly naive (not to mention unidiomatic) piece of Go code intended to let you proxy API requests while hiding your API keys. Here’s an updated version that makes better use of the Go standard library and works in layers, much like Ruby’s Rack middleware (for more on this topic, see the excellent article here). It also improves upon the original in that it works with all HTTP verbs.

When writing the first version, I tried using httputil.NewSingleHostReverseProxy since the name sounds like exactly what I was trying to do. There was an important piece missing by default, though, which made the library seem mysteriously broken. Being a newbie in a hurry, I went with the solution you can see in the previous post.

What was missing? httputil.NewSingleHostReverseProxy does not set the host of the request to the host of the destination server. If you’re proxying from foo.com to bar.com, requests will arrive at bar.com with the host of foo.com. Many webservers are configured not to serve pages if a request doesn’t appear to come from the same host.

Fortunately it isn’t too complicated to modify the chain to tweak the host.

func sameHost(handler http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
      r.Host = r.URL.Host
      handler.ServeHTTP(w, r)
  })
}

And the usage:

// initialize our reverse proxy
reverseProxy := httputil.NewSingleHostReverseProxy(serverUrl)
// wrap that proxy with our sameHost function
singleHosted := sameHost(reverseProxy)
http.ListenAndServe(":5000", singleHosted)

Perfect. We’re now setting the host of the request to the host of the destination URL.

Continuing with this approach, let’s combine our secret query params with the existing request query.

func queryCombiner(handler http.Handler, addon string) http.Handler {
  // first parse the provided string to pull out the keys and values
  values, err := url.ParseQuery(addon)
  if err != nil {
      log.Fatal("addon failed to parse")
  }

  // now we apply our addon params to the existing query
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
      query := r.URL.Query()

      for k := range values {
          query.Add(k, values.Get(k))
      }

      r.URL.RawQuery = query.Encode()
      handler.ServeHTTP(w, r)
  })
}

And usage is similar to above. We just continue to chain together our handlers.

combined := queryCombiner(singleHosted, "key=value&name=bob")

Finally, we’ll need to allow CORS on our server.

func addCORS(handler http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
      w.Header().Set("Access-Control-Allow-Origin", "*")
      w.Header().Set("Access-Control-Allow-Headers", "X-Requested-With")
      handler.ServeHTTP(w, r)
  })
}

And add that to our chain:

cors := addCORS(combined)
http.ListenAndServe(":5000", cors)
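For reference, here’s how the whole chain might be wired together in main (a sketch: it assumes the handlers above live in the same file, and the destination URL is a placeholder):

package main

import (
  "log"
  "net/http"
  "net/http/httputil"
  "net/url"
)

func main() {
  // the API we're proxying to (placeholder URL)
  serverUrl, err := url.Parse("https://api.example.com")
  if err != nil {
      log.Fatal("destination URL failed to parse")
  }

  // build the chain from the inside out
  reverseProxy := httputil.NewSingleHostReverseProxy(serverUrl)
  singleHosted := sameHost(reverseProxy)
  combined := queryCombiner(singleHosted, "key=value&name=bob")
  cors := addCORS(combined)

  log.Fatal(http.ListenAndServe(":5000", cors))
}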

The code is available on github and it runs quite well with the heroku go buildpack.

It has a couple of tests. I should add more, but I’m not totally happy with the current testing approach. Feedback is very welcome.


A Simple API Proxy Written in Go

UPDATE: see “A proper API proxy written in Go” for a better solution to this problem.

The problem:

Have you ever written a javascript app that needed to consume an API? What if the API requires you to pass your api key along in the query params? How do you hide your key?

This weekend I bumped into this issue once again. I was writing a simple app in angular to consume the last.fm api when it hit me.

This usually leaves me with two options:

  1. Decide my api key isn’t worth hiding and just embed it in the javascript.
  2. Make a call to the app server (I’m usually using Rails) that would then make the API call within the request lifecycle and return the json when the API call finishes.

Option 1 is also known as “giving up” – you don’t really want everyone to have your api key, do you? What happens when someone else starts using it to do nefarious things on your behalf or just decides to help you hit your rate limit faster?

Option 2 is safer, but now your poor app server pays the penalty of the API being slow. If the API call takes 3 seconds, your server process/thread is tied up for that time. Lame.

Imagine your rails app is built around an external API. Do you really want to spin up more and more instances to gain concurrency just to protect your key?

The solution: Move things out-of-band

For requests that could otherwise hit the api directly, your app server shouldn’t pay the penalties of keeping your key secure. So let’s move things out-of-band.

I’d been meaning to play with Go for some time but never had the right project. The implementation here was fairly simple but needed to be highly concurrent, so this felt like a good fit.

Borrowing from example Go http servers and http consumers, I came up with this:

package main

import (
  "fmt"
  "io/ioutil"
  "net/http"
  "os"
)

func errorOut(err error) {
  fmt.Printf("%s", err)
  os.Exit(1)
}

func handler(w http.ResponseWriter, r *http.Request) {
  w.Header().Set("Access-Control-Allow-Origin", "*")
  w.Header().Set("Access-Control-Allow-Headers", "X-Requested-With")

  if r.Method == "GET" {
      var newUrl string = os.Getenv("URL_ROOT") + r.URL.Path[1:] + "?" +
        r.URL.RawQuery + os.Getenv("URL_SUFFIX")

      fmt.Printf("fetching %s\n", newUrl)

      response, err := http.Get(newUrl)
      if err != nil {
          errorOut(err)
      } else {
          defer response.Body.Close()
          contents, err := ioutil.ReadAll(response.Body)
          if err != nil {
              errorOut(err)
          }
          fmt.Fprintf(w, "%s\n", contents)
      }
  }
}
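The post doesn’t show the wiring, but presumably a main along these lines starts the server (the port is my choice here):

func main() {
  http.HandleFunc("/", handler)
  err := http.ListenAndServe(":5000", nil)
  if err != nil {
      errorOut(err)
  }
}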

The server takes incoming requests and will translate the url by substituting the provided URL_ROOT and appending the URL_SUFFIX (the api key). It fetches that foreign url and then returns the results.

So with the example config:

URL_ROOT=http://ws.audioscrobbler.com/2.0/ URL_SUFFIX=&api_key=XXXXXXXXXXXXX

A request to the go server at http://example.com/?method=user.getrecenttracks&user=violencenow&format=json would return the contents of http://ws.audioscrobbler.com/2.0/?method=user.getrecenttracks&user=violencenow&format=json&api_key=XXXXXXXXXXXXX

This isn’t a solution for everything. Right now it only supports GET requests – this is probably all you’d ever want, lest someone start posting to your endpoint and doing things you don’t expect. These sorts of potentially destructive behaviors are perhaps better handled in-band where you can apply some sanity checks.

But if all you need to do is get content from an API without exposing your keys to the public, this might be a good solution for you.

Some numbers

This is very unscientific, but I set up a Go server on heroku (http://sleepy-server.herokuapp.com/) that takes a request, waits 1 second, and then returns plain text.

Here’s the benchmark for that, run with ab -c 300 -n 600 "http://sleepy-server.herokuapp.com/":

Concurrency Level:      300
Time taken for tests:   5.046 seconds
Complete requests:      600
Failed requests:        0
Write errors:           0
Total transferred:      83400 bytes
HTML transferred:       2400 bytes
Requests per second:    118.91 [#/sec] (mean)
Time per request:       2522.907 [ms] (mean)
Time per request:       8.410 [ms] (mean, across all concurrent requests)
Transfer rate:          16.14 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       28  322 534.7    107    2257
Processing:  1040 1229 223.1   1148    2640
Waiting:     1038 1228 223.0   1148    2640
Total:       1069 1552 587.1   1309    3867

Now, let’s use our api_proxy to fetch requests from that server and serve them up by setting URL_ROOT=http://sleepy-server.herokuapp.com.

And we’ll use the same benchmark command: ab -c 300 -n 600 "http://some-fake-server-name-here.herokuapp.com/"

Concurrency Level:      300
Time taken for tests:   5.285 seconds
Complete requests:      600
Failed requests:        0
Write errors:           0
Total transferred:      132000 bytes
HTML transferred:       3000 bytes
Requests per second:    113.54 [#/sec] (mean)
Time per request:       2642.282 [ms] (mean)
Time per request:       8.808 [ms] (mean, across all concurrent requests)
Transfer rate:          24.39 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       28  324 550.9     75    2260
Processing:  1049 1406 325.2   1333    3012
Waiting:     1049 1405 325.1   1331    3012
Total:       1085 1730 609.4   1644    3875

Scientific or not, that’s performance I can live with. And hopefully those API endpoints aren’t quite taking a full second per request.

Unicorn Pukes Serving Large Files

Earlier today I was getting this weird unicorn error on heroku when trying to serve a retina-sized image.

ERROR -- : app error: undefined method `each' for nil:NilClass (NoMethodError)
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_response.rb:60:in `http_response_write'
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_server.rb:563:in `process_client'
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_server.rb:633:in `worker_loop'
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_server.rb:500:in `spawn_missing_workers'
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_server.rb:142:in `start'
ERROR -- : [..]/unicorn-4.6.3/bin/unicorn_rails:209:in `<top (required)>'

Weird, right? But sure enough, whenever I tried to view some-image@2x.png, everything went terribly wrong.

Googling took too long to find an answer, so I’m sharing my solution here in hopes that it helps someone else (oh, hai, google bot).

The issue is actually a bug in the version of rack-cache required by actionpack in Rails 3.2.14. Attempting to serve files larger than 1MB causes this error.

It has been fixed, but I had to require the master branch for rack-cache to resolve the problem.

Gemfile
gem "rack-cache", github: "rtomayko/rack-cache"
gem "unicorn"

No more error.

Now, the real solution is to not serve large images through unicorn on heroku. But hooking up a CDN is another problem for another time.

Letterpress Word Finder

In an attempt to start to blog more, here’s a quick follow-up post on the previous Letterpress article.

Background

As a reminder, here’s how I outlined steps in creating a Letterpress solver:

  1. Take screenshot of game and import it into solver
  2. Parse the board into a string of letters
  3. Reduce a dictionary of valid words against those characters to find playable words
  4. Optionally make recommendations of which word to play based on current board state and strategy. (i.e. don’t be naive)

We built step one (sort-of) and step two in the previous article, so let’s move on to step three.

Requirements

We want our script to fulfill the following requirements:

  1. Accept the board letters via STDIN or commandline arguments.
  2. Reduce the dictionary words against those letters.
  3. Dump out matching words (without regard to board state/strategy).

Implementation

We’ll take either an argument or read STDIN and downcase it.

letters = (ARGV[0] || STDIN.read).downcase

I don’t have the official Letterpress dictionary (a quick googling will get you on the right track if you insist), but every good unix-y system has a dictionary file.

$ cat /usr/share/dict/words | wc -l
235886

OK, that’s a lot of words. Let’s pull them in and downcase them too.

words = File.read("/usr/share/dict/words").downcase.split("\n")

Now, the only really interesting part: a method to determine if a word can be constructed from letters. I’ve shamelessly borrowed a perfectly fast solution from Stackoverflow.

def is_subset?(word, letters)
  !word.chars.find{|char| word.count(char) > letters.count(char)}
end

And now we reduce our words to those that match our letters:

matching_words = words.select do |word|
  is_subset?(word, letters)
end

And there’s nothing left to do but dump them out.

puts matching_words.sort_by(&:length)

Here’s the entire word generating script.
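It’s nothing more than the snippets above combined:

letters = (ARGV[0] || STDIN.read).downcase

words = File.read("/usr/share/dict/words").downcase.split("\n")

def is_subset?(word, letters)
  !word.chars.find{|char| word.count(char) > letters.count(char)}
end

matching_words = words.select do |word|
  is_subset?(word, letters)
end

puts matching_words.sort_by(&:length)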

And an example of using it with the board parser from the previous post:

$ ruby -r ./board_parser -e "puts BoardParser.new('light.png').tiles.join" | ruby letter.rb | tail -n 10
hermodactyl
typhlectomy
cryohydrate
polydactyle
pterodactyl
crymotherapy
hydrolyzable
acetylthymol
overthwartly
protractedly

Excellent. Of course, not all words in your system’s dictionary file may be playable, YMMV, etc.

Quick and Dirty OCR for Letterpress & Other Tile-based Games

I’ve been playing enough Letterpress lately to realize that I’m not great at it. This is super frustrating because it’s a game you could easily teach a computer to play.

I’m not the first person to have that thought. There are plenty of cheating programs for Letterpress (just google or search in the app store).

I haven’t investigated these solvers but in thinking about the problem, the basic approach would seem to be:

  • Take screenshot of game and import it into solver
  • Parse the board into a string of letters
  • Reduce a dictionary of valid words against those characters to find playable words
  • Optionally make recommendations of which word to play based on current board state and strategy.

I wondered how quickly I could throw something together to simply parse the game board into a string of letters. It turns out it is super easy. To get started I took a screenshot of a game in progress and downloaded it from my phone.
