Untwisting a Hypertext Narrative - PEG to the Rescue!

In this post you'll learn why I think Parsing Expression Grammars are awesome and see an example of how I built one to scratch an itch.

The Itch

After spending some time writing Choose Your Own Adventure-style books in markdown, I quickly realized a few things were missing that could greatly improve the writing process:

  1. Knowing if there are any unreachable sections that have been orphaned in the writing process.
  2. Being able to see all the branches within a book.
  3. Being able to easily read through each branch to make sure it stays coherent.

"Never fear," I say to myself, "I can just write some code to parse the markdown files and pluck out the paths. This will be easy."

As a quick reminder, the format for a single section looks something like this:

# Something isn't right here. {#intro}

You hear a phone ringing.

- [pick up phone](#phone)
- [do not answer](#ignore-phone)
- [set yourself on fire](#fire)

(Headers mark the start of a new section and carry an anchor id. Links send the reader to other sections.)

There are plenty of ways to slurp in a story file and parse it. You could write a naive line-by-line loop that breaks it into sections based on the presence of a header and then parse the links within sections with substring matching. You could write some complicated regular expression because we all know how much fun regular expressions can become. Or you could do something saner like write a parsing expression grammar (hereafter PEG).

Why a PEG?

Generally, a regex makes for a beautiful collection of cryptic ASCII art that you'll either comment to death or be confused by when you stumble across it weeks or months later. PEGs take a different approach and instead seek to define "a formal language in terms of a set of rules for recognizing strings in the language." Because they're a set of rules, you can slowly TDD your way up from parsing a single phrase to parsing an entire document (or at least the parts you care about).

(It is worth mentioning that because the format here is pretty trivial, either the naive line-by-line solution or a regex is fine. PEGs are without a doubt the right choice IMHO for complicated grammars.)

Show me some code

We'll be using Parslet to write our PEG. Parslet provides a succinct syntax and far better error messages than competing Ruby PEG libraries (parse_with_debug is my friend). My biggest complaint about Parslet is that the documentation was occasionally lacking, but that only slowed things down a bit - and there's an IRC channel and mailing list to fall back on.

Let's start off simple, just parsing the links out of a single section of markdown. Being a TDD'er, we'll write a few simple tests first (in MiniTest::Spec):

describe LinkParser do
  def parse(input)
    LinkParser.new.parse(input)
  end

  it "can match a single link" do
    parsed = parse("[some link name](#some-href)").first

    assert_equal "some-href",
      parsed[:id]
  end

  it "can match a single link surrounded by content" do
    parsed = parse("
      hey there [some link name](#some-href)
      some content
    ").first

    assert_equal "some-href",
      parsed[:id]
  end

  it "can match a multiple links surrounded by content" do
    parsed = parse("
      hey there [some link name](#some-href)
      some content with a link [another](#new-href) and [another still](#last) ok?
    ")

    assert_equal ["some-href", "new-href", "last"],
      parsed.map{|s| s[:id].to_s}
  end
end

And the working implementation of LinkParser:

class LinkParser < Parslet::Parser
  rule(:link_text) { str("[") >> (str(']').absent? >> any).repeat >> str(']') }
  rule(:link_href) { str('(#') >> (str(')').absent? >> any).repeat.as(:id) >> str(')') }
  rule(:link)      { link_text >> link_href }
  rule(:non_link)  { (link.absent? >> any).repeat }
  rule(:content)   { (non_link >> link >> non_link).repeat }

  root(:content)
end

"Foul," you cry, "this is much more complicated than a regular expression!" And I reply "Yes, but it is also more intelligible long-term as you build upon it." You don't look completely satisfied, but you'll continue reading.

It is worth noting that everything has a name:

  • link_text encompasses everything between the two brackets in the markdown link.
  • link_href is the content within the parens. Because we are specifically linking only to anchors, we also include the # and then we'll name the id we're linking to via as.
  • link is just link_text + link_href
  • non_link is anything that isn't a link. It could be other markdown or plain text. It may or may not actually contain any characters at all.
  • content is the whole markdown content. We can see it is made up of some number of the following: non_link + link + non_link

We've specified that "content" is our root so the parser starts there.
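
As a quick sanity check, here's roughly what the parser gives back when pointed at a line of markdown (the input string is just a throwaway example of mine; the @ numbers are offsets into it):

tree = LinkParser.new.parse("call [your mom](#mom) or [run](#run)")
p tree
# => [{:id=>"mom"@17}, {:id=>"run"@32}]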

The Scratch: Adding the 3 missing features

Now we have an easy way to extract links from sections within a story. We'll be able to leverage this to map the branches and solve all three problems.

But in order to break the larger story into sections we'll need to write a StoryParser which can parse an entire story file (for an example file, see the previous post). Again, this was TDD'ed, but we'll cut to the chase:

class StoryParser < Parslet::Parser
  rule(:space) { match('\s').repeat }
  rule(:newline) { match('\n') }

  rule(:heading) { match('^#') >> space.maybe >> (match['\n{'].absent? >> any).repeat.as(:heading) >> id.maybe }
  rule(:id)      { str('{#') >> (str('}').absent? >> any).repeat.as(:id) >> str('}') }
  rule(:content) { ((id | heading).absent? >> any).repeat }
  rule(:section) { (heading >> space.maybe >> content.as(:content) >> space.maybe).as(:section) }

  rule(:title_block) { (str('%') >> (newline.absent? >> any).repeat >> newline).repeat }

  rule(:story) { space.maybe >> title_block.maybe >> space.maybe >> section.repeat }

  root(:story)
end

Now we can parse out each section's heading text, id, and content into a tree that looks something like this:

[
  {:section=>{
    :heading=>"Something isn't right here. "@51,
    :id=>"intro"@81,
    :content=>"You hear a phone ringing.\n\n- [pick up phone](#phone)..."@89}
  },
  {:section=>{
    :heading=>"You pick up the phone... "@210,
    :id=>"phone"@237,
    :content=>"It is your grandmother. You die.\n\n- [start over](#intro)"@245}
  },
  ...
]
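
(For reference, that tree is just the raw parser output. Assuming the example story from the previous post is saved as story.md, something like the line below produces it, and it's the tree used in the next snippet.)

tree = StoryParser.new.parse(File.read('story.md'))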

"That's well and good," you say, "but how do we turn that into something useful?"

Enter Parslet's Transform class (and exit your remaining skepticism). Parslet::Transform takes a tree and lets you convert it into whatever you want. The following code takes a section tree from above, cleans up some whitespace, and then returns an instantiated Section class based on the input.

class SectionTransformer < Parslet::Transform
  rule(section: subtree(:hash)) {
    hash[:content] = hash[:content].to_s.strip
    hash[:heading] = hash[:heading].to_s.strip

    if hash[:id].to_s.empty?
      hash.delete(:id)
    else
      hash[:id] = hash[:id].to_s
    end

    Section.new(hash)
  }
end

Example of an instantiated Section:

p SectionTransformer.new.apply(tree[0])
# <Section:0x007fd6e5853298
#  @content="You hear a phone ringing.\n\n- [pick up phone](#phone)\n- [do not answer](#ignore-phone)\n- [set yourself on fire](#fire)",
#  @heading="Something isn't right here.",
#  @id="intro",
#  @links=["phone", "ignore-phone", "fire"]>

So now we have the building blocks for parsing a story into sections. Our Section class internally uses the LinkParser from above to determine where each section branches outward.
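
The Section class itself isn't shown in this post (it lives in the example repo linked at the end), but a rough sketch of the idea might look like the following. The attribute names mirror the transformer output above; the exact wiring of LinkParser, and treating a parse failure as "no links", are my assumptions:

class Section
  attr_reader :heading, :id, :content, :links

  def initialize(attrs)
    @heading = attrs[:heading]
    @id      = attrs[:id]
    @content = attrs[:content]
    @links   = extract_links(@content)
  end

  private

  # Pull the anchor ids out of the markdown body using the LinkParser from
  # earlier. A section with no links at all won't match that grammar, so a
  # parse failure is treated as "this section is a dead end".
  def extract_links(markdown)
    LinkParser.new.parse(markdown).map { |link| link[:id].to_s }
  rescue Parslet::ParseFailed
    []
  end
end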

Let's finish this by encapsulating the entire story in a Story class:

class Story
  attr_reader :sections

  def initialize(file)
    @sections = parse_file(file)
  end

  def branches
    @_branches ||= BranchCruncher.new(@sections).traverse
  end

  def reachable
    branches.flatten.uniq
  end

  def unreachable
    @sections.map(&:id) - reachable
  end

  def split!(path)
    branches.each do |branch|
      File.open(path + branch.join('-') + '.md', 'w') do |f|
        branch.each do |id|
          section = sections.detect{|s| s.id == id}
          f.puts "# #{section.heading} {##{section.id}}\n"
          f.puts section.content
          f.puts "\n\n"
        end
      end
    end
  end

  private

  def parse_file(file)
    SectionTransformer.new.apply(StoryParser.new.parse(file.read))
  end
end
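
BranchCruncher is also only in the repo, so here's a rough sketch of the idea rather than the real implementation: a depth-first walk that starts at the first section, follows each link, and ends a branch as soon as it loops back to a section already on the path (otherwise every "start over" link would recurse forever).

# A sketch, not the real BranchCruncher: walk the sections depth-first,
# collecting one array of section ids per branch.
class BranchCruncher
  def initialize(sections)
    @sections = sections
  end

  def traverse
    walk(@sections.first.id, [])
  end

  private

  def walk(id, path)
    return [path] if path.include?(id) # looped back around; end the branch

    section = @sections.detect { |s| s.id == id }
    return [path + [id]] if section.nil? || section.links.empty?

    section.links.flat_map { |link| walk(link, path + [id]) }
  end
end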

A few notes:

  • You instantiate the Story class with a File object pointing to your story.
  • It parses out the sections.
  • Then you can call methods to fill in the missing pieces of functionality we identified at the beginning of this post.

# assuming the story file from the previous post is saved as story.md
story = Story.new(File.open('story.md'))

# Which sections are orphaned?
p story.unreachable
# => ['some-unreachable-page-id']

# What branches are there in the book?
p story.branches
# => [ ["intro", "investigate", "help"], ["intro", "investigate", "rescue", "wake-up"], ["intro", "investigate", "grounded"], ["intro", "grounded"] ]

# Let me read each narrative branch by splitting each branch into files
story.split!('/tmp/')
# creates files in /tmp/ folder named for each section in a branch
# e.g. intro-investigate-help.md
# You can read through each branch and ensure you've maintained a cohesive narrative.

If you made it this far, you deserve a cookie and my undying affection. I'm all out of cookies and any I had would be gluten-free anyway, so how about I just link you to the example code instead and we call it even?

Here's the cyoa-parser on github. It includes a hilariously bad speed-story I wrote for my son when he insisted on a CYOA bedtime story 10 minutes before bed.

If you'd like to learn more about Parslet from someone who knows it better than me, check out Jason Garber's Wicked Good Ruby talk.


Writing Hypertext Fiction in Markdown

2014-01-11

Remember Choose Your Own Adventure books? I fondly remember finding new ways to get myself killed as I explored Aztec ruins or fought off aliens. Death or adventure waited just a few pages away and I was the one calling all the shots.

Introducing my son to Hypertext Fiction has rekindled my interest. I wondered how difficult it would be to throw something together to let me easily write CYOA-style books my kid could read on a kindle. I love markdown, so a toolchain built around it was definitely in order.

As it turns out, Pandoc fits the bill perfectly. You can write a story in markdown and easily export it to EPUB. From there you're just a quick step through ebook-convert (via calibre's commandline tools) to a well-formed .mobi file that reads beautifully on a kindle.

Here's a quick example markdown story:

% You're probably going to die.
% Jeffrey Chupp

# Something isn't right here. {#intro}

You hear a phone ringing.

- [pick up phone](#phone)
- [do not answer](#ignore-phone)
- [set yourself on fire](#fire)

# You pick up the phone... {#phone}

It is your grandmother. You die.

- [start over](#intro)

# You ignore the phone... {#ignore-phone}

It was your grandmother. You die.

- [start over](#intro)

# You set yourself on fire... {#fire}

Strangely, you don't die. Guess you better start getting ready for school.

- [pick up backpack and head out](#backpack)
- [decide to skip school](#skip)

# You decide to skip school {#skip}

A wild herd of dinosaurs bust in and kill you. Guess you'll never get to tell your friends about how you're immune to flame... or that you met live dinosaurs :(

- [start over](#intro)

# Going to school {#backpack}

You're on your way to school when a meteor lands on you, killing you instantly.

- [start over](#intro)

From the top, we have percent signs before the title and author, which Pandoc uses for the title page.

Then each chapter/section begins with an h1 header which has an id specified. This id is what we'll use in our links to let a reader choose where to go next.

If you don't specify an id, Pandoc will generate one by dasherizing your header text (e.g. "# Going to school" becomes going-to-school), but it is probably easier to be explicit since you need to reference the id in your link choices anyway.

Save that as story.md and run the following to get your epub and mobi versions:

pandoc -o story.epub story.md && /usr/bin/ebook-convert story.epub story.mobi

BONUS: ebook-convert even complains if one of your links points to an invalid destination.

Here's a preview of the generated .mobi file, as seen in Kindle Previewer.

And here are the generated EPUB and .mobi files and the markdown source file.

Now, get writing!


A proper API proxy written in Go

2013-11-11

A little over a month ago, I blogged about an API proxy written in Go. That post contained a functioning but incredibly naive (not to mention unidiomatic) piece of Go code intended to let you proxy API requests while hiding your API keys. Here's an updated version that makes better use of the Go standard library and works in layers, like Ruby middleware (for more on this topic, see the excellent article here). It also improves upon the original in that it works with all HTTP verbs.

When writing the first version, I tried using httputil.NewSingleHostReverseProxy since the name sounds like exactly what I was trying to do. There was an important piece missing by default, though, which made the library seem mysteriously broken. Being a newbie in a hurry, I went with the solution you can see in the previous post.

What was missing? httputil.NewSingleHostReverseProxy does not set the host of the request to the host of the destination server. If you're proxying from foo.com to bar.com, requests will arrive at bar.com with a Host header of foo.com. Many web servers are configured not to serve pages when a request's host doesn't match their own.

Fortunately it isn't too complicated to modify the chain to tweak the host.

// sameHost wraps a handler and rewrites each request's Host to match its
// destination URL before handing the request to the wrapped handler.
func sameHost(handler http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        r.Host = r.URL.Host
        handler.ServeHTTP(w, r)
    })
}

And the usage:

// initialize our reverse proxy (serverUrl is the destination's parsed *url.URL)
reverseProxy := httputil.NewSingleHostReverseProxy(serverUrl)
// wrap that proxy with our sameHost function
singleHosted := sameHost(reverseProxy)
http.ListenAndServe(":5000", singleHosted)

Perfect. We're now setting the host of the request to the host of the destination URL.

Continuing with this approach, let's combine our secret query params with the existing request query.

func queryCombiner(handler http.Handler, addon string) http.Handler {
    // first parse the provided string to pull out the keys and values
    values, err := url.ParseQuery(addon)
    if err != nil {
        log.Fatal("addon failed to parse")
    }

    // now we apply our addon params to the existing query
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        query := r.URL.Query()

        for k := range values {
            query.Add(k, values.Get(k))
        }

        r.URL.RawQuery = query.Encode()
        handler.ServeHTTP(w, r)
    })
}

And usage is similar to above. We just continue to chain together our handlers.

combined := queryCombiner(singleHosted, "key=value&name=bob")

Finally, we'll need to allow CORS on our server.

func addCORS(handler http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Access-Control-Allow-Origin", "*")
        w.Header().Set("Access-Control-Allow-Headers", "X-Requested-With")
        handler.ServeHTTP(w, r)
    })
}

And add that to our chain:

cors := addCORS(combined)
http.ListenAndServe(":5000", cors)

The code is available on github and it runs quite well with the heroku go buildpack.

It has a couple of tests. I should add some more, but I'm not totally happy with the current testing approach. Feedback is very welcome.