A proper API proxy written in Go

2013-11-11

A little over a month ago, I blogged about an API proxy written in Go. That post contained a functioning but incredibly naive (not to mention unidiomatic) piece of Go code intended to let you proxy API requests while hiding your API keys. Here's an updated version that makes better use of the Go standard library and works in layers, like Ruby's middleware (for more on this topic, see the excellent article here). It also improves on the original in that it works with all HTTP verbs.

When writing the first version, I tried using httputil.NewSingleHostReverseProxy since the name sounds like exactly what I needed. An important piece was missing by default, though, which made the library seem mysteriously broken. Being a newbie in a hurry, I went with the solution you can see in the previous post.

What was missing? httputil.NewSingleHostReverseProxy does not set the Host of the proxied request to the host of the destination server. If you're proxying from foo.com to bar.com, requests will arrive at bar.com with a Host of foo.com. Many web servers are configured not to serve pages if a request doesn't appear to come from the right host.

Fortunately it isn't too complicated to modify the chain to tweak the host.

func sameHost(handler http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // rewrite the request's Host so the destination server sees its own hostname
        r.Host = r.URL.Host
        handler.ServeHTTP(w, r)
    })
}

And the usage:

// initialize our reverse proxy
reverseProxy := httputil.NewSingleHostReverseProxy(serverUrl)
// wrap that proxy with our sameHost function
singleHosted := sameHost(reverseProxy)
http.ListenAndServe(":5000", singleHosted)

Perfect. We're now setting the host of the request to the host of the destination URL.
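
If you're wondering where serverUrl comes from, it's just the parsed destination URL. A minimal sketch (the host below is a placeholder, not a real endpoint):

// parse the destination we're proxying to
serverUrl, err := url.Parse("http://api.example.com/")
if err != nil {
    log.Fatal("destination URL failed to parse")
}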

Continuing with this approach, let's combine our secret query params with the existing request query.

func queryCombiner(handler http.Handler, addon string) http.Handler {
    // first parse the provided string to pull out the keys and values
    values, err := url.ParseQuery(addon)
    if err != nil {
        log.Fatal("addon failed to parse")
    }

    // now we apply our addon params to the existing query
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        query := r.URL.Query()

        for k := range values {
            query.Add(k, values.Get(k))
        }

        r.URL.RawQuery = query.Encode()
        handler.ServeHTTP(w, r)
    })
}

And usage is similar to above. We just continue to chain together our handlers.

combined := queryCombiner(singleHosted, "key=value&name=bob")

Finally, we'll need to allow CORS on our server.

func addCORS(handler http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Access-Control-Allow-Origin", "*")
        w.Header().Set("Access-Control-Allow-Headers", "X-Requested-With")
        handler.ServeHTTP(w, r)
    })
}

And add that to our chain:

cors := addCORS(combined)
http.ListenAndServe(":5000", cors)
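
Putting it all together, the whole proxy fits in a short main. Here's a sketch of how the pieces wire up (the destination URL and the key=value&name=bob params are placeholders for your real config):

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

// sameHost, queryCombiner, and addCORS are the handlers defined above

func main() {
    // parse the destination we're proxying to
    serverUrl, err := url.Parse("http://api.example.com/")
    if err != nil {
        log.Fatal("destination URL failed to parse")
    }

    // build the chain from the inside out:
    // reverse proxy -> host rewrite -> secret params -> CORS
    reverseProxy := httputil.NewSingleHostReverseProxy(serverUrl)
    singleHosted := sameHost(reverseProxy)
    combined := queryCombiner(singleHosted, "key=value&name=bob")
    cors := addCORS(combined)

    log.Fatal(http.ListenAndServe(":5000", cors))
}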

The code is available on GitHub and it runs quite well with the Heroku Go buildpack.

It has a couple of tests. I should add some more, but I'm not totally happy with the current testing approach. Feedback is very welcome.


A simple API proxy written in Go

2013-09-23

UPDATE: see "A proper API proxy written in Go" for a better solution to this problem.

The problem:

Have you ever written a JavaScript app that needed to consume an API? What if the API requires you to pass your API key along in the query params? How do you hide your key?

This weekend I bumped into this issue once again. I was writing a simple app in Angular to consume the last.fm API when it hit me.

This usually leaves me with two options:

  1. Decide my API key isn't worth hiding and just embed it in the JavaScript.
  2. Make a call to the app server (I'm usually using Rails) that would then make the API call within the request lifecycle and return the JSON when the API call finishes.

Option 1 is also known as "giving up" - you don't really want everyone to have your API key, do you? What happens when someone else starts using it to do nefarious things on your behalf or just decides to help you hit your rate limit faster?

Option 2 is safer, but now your poor app server pays the penalty of the API being slow. If the API call takes 3 seconds, your server process/thread is tied up for that time. Lame.

Imagine your Rails app is built around an external API. Do you really want to spin up more and more instances to gain concurrency just to protect your key?

The solution: Move things out-of-band

For requests that could otherwise hit the API directly, your app server shouldn't pay the penalty of keeping your key secure. So let's move things out-of-band.

I'd been meaning to play with Go for some time but never had the right project. The implementation here was fairly simple but needed to be highly concurrent, so this felt like a good fit.

Borrowing from example Go HTTP servers and HTTP clients, I came up with this:

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
)

func errorOut(err error) {
    fmt.Printf("%s", err)
    os.Exit(1)
}

func handler(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Access-Control-Allow-Origin", "*")
    w.Header().Set("Access-Control-Allow-Headers", "X-Requested-With")

    if r.Method == "GET" {
        newUrl := os.Getenv("URL_ROOT") + r.URL.Path[1:] + "?" +
            r.URL.RawQuery + os.Getenv("URL_SUFFIX")

        fmt.Printf("fetching %s\n", newUrl)

        response, err := http.Get(newUrl)
        if err != nil {
            errorOut(err)
        } else {
            defer response.Body.Close()
            contents, err := ioutil.ReadAll(response.Body)
            if err != nil {
                errorOut(err)
            }
            fmt.Fprintf(w, "%s\n", contents)
        }
    }
}
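
That snippet leaves out the wiring. A minimal main to go with it looks something like this (Heroku supplies the port via the PORT environment variable; the fallback to 5000 is just for local runs):

func main() {
    // Heroku provides the port via the PORT environment variable;
    // fall back to 5000 when running locally
    port := os.Getenv("PORT")
    if port == "" {
        port = "5000"
    }

    http.HandleFunc("/", handler)
    if err := http.ListenAndServe(":"+port, nil); err != nil {
        errorOut(err)
    }
}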

The server takes incoming requests and translates the URL by substituting the provided URL_ROOT and appending the URL_SUFFIX (the API key). It fetches that foreign URL and then returns the results.

So with the example config:

URL_ROOT=http://ws.audioscrobbler.com/2.0/ URL_SUFFIX=&api_key=XXXXXXXXXXXXX

A request to the go server at http://example.com/?method=user.getrecenttracks&user=violencenow&format=json would return the contents of http://ws.audioscrobbler.com/2.0/?method=user.getrecenttracks&user=violencenow&format=json&api_key=XXXXXXXXXXXXX

This isn't a solution for everything. Right now it only supports GET requests - this is probably all you'd ever want, lest someone start posting to your endpoint and doing things you don't expect. These sorts of potentially destructive behaviors are perhaps better handled in-band where you can apply some sanity checks.

But if all you need to do is get content from an API without exposing your keys to the public, this might be a good solution for you.

Some numbers

This is very unscientific, but I set up a Go server on Heroku (http://sleepy-server.herokuapp.com/) that takes a request, waits 1 second, and then returns plain text.
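
The sleepy server is about as simple as it sounds; a rough sketch (the handler name and response text here are incidental):

package main

import (
    "fmt"
    "net/http"
    "os"
    "time"
)

func sleepy(w http.ResponseWriter, r *http.Request) {
    // simulate a slow API: wait a second, then respond with plain text
    time.Sleep(1 * time.Second)
    fmt.Fprintln(w, "zzz")
}

func main() {
    http.HandleFunc("/", sleepy)
    http.ListenAndServe(":"+os.Getenv("PORT"), nil)
}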

The benchmark for that, run with ab -c 300 -n 600 "http://sleepy-server.herokuapp.com/":

Concurrency Level:      300
Time taken for tests:   5.046 seconds
Complete requests:      600
Failed requests:        0
Write errors:           0
Total transferred:      83400 bytes
HTML transferred:       2400 bytes
Requests per second:    118.91 [#/sec] (mean)
Time per request:       2522.907 [ms] (mean)
Time per request:       8.410 [ms] (mean, across all concurrent requests)
Transfer rate:          16.14 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       28  322 534.7    107    2257
Processing:  1040 1229 223.1   1148    2640
Waiting:     1038 1228 223.0   1148    2640
Total:       1069 1552 587.1   1309    3867

Now, let's point our api_proxy at that server by setting URL_ROOT=http://sleepy-server.herokuapp.com and serve up the responses through it.

And we'll use the same benchmark command: ab -c 300 -n 600 "http://some-fake-server-name-here.herokuapp.com/"

Concurrency Level:      300
Time taken for tests:   5.285 seconds
Complete requests:      600
Failed requests:        0
Write errors:           0
Total transferred:      132000 bytes
HTML transferred:       3000 bytes
Requests per second:    113.54 [#/sec] (mean)
Time per request:       2642.282 [ms] (mean)
Time per request:       8.808 [ms] (mean, across all concurrent requests)
Transfer rate:          24.39 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       28  324 550.9     75    2260
Processing:  1049 1406 325.2   1333    3012
Waiting:     1049 1405 325.1   1331    3012
Total:       1085 1730 609.4   1644    3875

Scientific or not, that's performance I can live with. And hopefully those API endpoints aren't quite taking a full second per request.


Unicorn Pukes Serving Large Files

2013-08-29

Earlier today I was getting this weird Unicorn error on Heroku when trying to serve a retina-sized image.

ERROR -- : app error: undefined method `each' for nil:NilClass (NoMethodError)
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_response.rb:60:in `http_response_write'
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_server.rb:563:in `process_client'
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_server.rb:633:in `worker_loop'
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_server.rb:500:in `spawn_missing_workers'
ERROR -- : [..]/unicorn-4.6.3/lib/unicorn/http_server.rb:142:in `start'
ERROR -- : [..]/unicorn-4.6.3/bin/unicorn_rails:209:in `<top (required)>'

Weird, right? But sure enough, whenever I tried to view some-image@2x.png, everything went terribly wrong.

Googling took too long to find an answer, so I'm sharing my solution here in hopes that it helps someone else (oh, hai, google bot).

The issue is actually a bug in the version of rack-cache required by actionpack in Rails 3.2.14. Attempting to serve files larger than 1MB causes this error.

It has been fixed, but I had to require the master branch for rack-cache to resolve the problem.

# Gemfile
gem "rack-cache", github: "rtomayko/rack-cache"
gem "unicorn"

No more error.

Now, the real solution is to not serve large images through unicorn on heroku. But hooking up a CDN is another problem for another time.
