TDD a CLI Caching Script - Part Five - Async Updates / Stale-While-Revalidate

2020-02-01

This is part five in a series about writing a general-purpose script to cache CLI output. In this post, we'll write about async updates and stale-while-revalidate.

Async Updates

There's a neat HTTP Cache-Control directive, stale-while-revalidate, that allows a server to return content that it knows is stale while updating the content in the background. Instead of blocking while the content is updated, the server can return the stale response immediately and start doing the work to provide a fresh response next time. This can be helpful in situations where we're willing to sacrifice freshness for responsiveness.

We won't try to mimic all of the HTTP stale-while-revalidate concepts, but we can provide a simple --stale-while-revalidate SECONDS option to our script to allow serving content past the TTL while we refresh in the background.

Here are the rules for our --stale-while-revalidate option:

If there's cached content
- And we are inside the TTL
  - We serve the still-fresh content and do not trigger a background update.
- And the TTL has expired but we're still in the stale-while-revalidate duration
  - We immediately return the stale content.
  - We trigger a background update.
- And the TTL has expired and we're outside the stale-while-revalidate duration
  - We fall back to the synchronous behavior.
If there's no cached content
- We keep to the original synchronous behavior.
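
The rules above can be sketched as a small decision function. This is an illustration only: swr_state and its state names are hypothetical and not part of the cache script, and it assumes the age of the cached content is already known in whole seconds.

```shell
# swr_state AGE TTL SWR
# Prints which rule applies. AGE is the age of the cached content in whole
# seconds; pass an empty AGE to mean "there is no cached content".
# (Hypothetical helper for illustration; not part of the cache script.)
swr_state() {
    if [ -z "$1" ]; then
        echo "miss"         # no cached content: run synchronously
    elif [ "$1" -lt "$2" ]; then
        echo "fresh"        # inside the TTL: serve cache, no background update
    elif [ "$1" -lt $(($2 + $3)) ]; then
        echo "revalidate"   # stale but inside SWR: serve cache, refresh in background
    else
        echo "expired"      # outside TTL + SWR: run synchronously
    fi
}

swr_state 0 1 1    # fresh
swr_state 1 1 1    # revalidate
swr_state 2 1 1    # expired
swr_state "" 1 1   # miss
```

With a TTL of 1 and an SWR of 1, content that is one second old is already stale but still servable, which is the window the tests below exercise.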

Let's work through the test cases.

@test "--stale-while-revalidate does not trigger a background update if we're in the TTL" {
  run ./cache --stale-while-revalidate 1 --ttl 1 $TEST_KEY echo 1
  [ "$status" -eq 0 ]
  [ "$output" = "1" ]
  [ -f "$CACHE_DIR$TEST_KEY" ]

  run ./cache --stale-while-revalidate 1 --ttl 1 $TEST_KEY echo 2
  [ "$status" -eq 0 ]
  [ "$output" = "1" ]

  # The value has _not_ been updated in the background
  [ "$(cat "$CACHE_DIR$TEST_KEY")" = "1" ]
}

This fails because we're not parsing the option yet. We'll add this content to our cache script's options case statement:

        --stale-while-revalidate)
            stale_while_revalidate="$2"
            shift # drop the key
            shift # drop the value
            ;;
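
For context, here is a simplified sketch of how such a case might sit inside an option-parsing loop. It is not the script's actual parser: parse_options and remaining_args are hypothetical names, the loop stops at the first non-option (the real script also accepts options after the cache key), and defaulting stale_while_revalidate to 0 is an assumption.

```shell
# Simplified, hypothetical option parser for illustration only.
parse_options() {
    ttl=0
    stale_while_revalidate=0   # assumed default: never serve stale content
    while [ $# -gt 0 ]; do
        case "$1" in
            --ttl)
                ttl="$2"
                shift # drop the key
                shift # drop the value
                ;;
            --stale-while-revalidate)
                stale_while_revalidate="$2"
                shift # drop the key
                shift # drop the value
                ;;
            *)
                break # first non-option: the cache key and command follow
                ;;
        esac
    done
    remaining_args=$*
}

parse_options --stale-while-revalidate 5 --ttl 10 my-key echo hi
echo "ttl=$ttl swr=$stale_while_revalidate rest=$remaining_args"
```

A default of 0 would keep the later $((ttl + stale_while_revalidate)) arithmetic equal to the plain TTL when the option is not passed.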

Now that test passes since it is testing existing behavior. We move on to our next test:

@test "--stale-while-revalidate triggers a background update if we're outside the TTL but inside the SWR seconds" {
  run ./cache --stale-while-revalidate 1 --ttl 1 $TEST_KEY echo 1
  [ "$status" -eq 0 ]
  [ "$output" = "1" ]
  [ -f "$CACHE_DIR$TEST_KEY" ]

  sleep 1

  run ./cache --stale-while-revalidate 1 --ttl 1 $TEST_KEY echo 2
  [ "$status" -eq 0 ]
  [ "$output" = "1" ]

  # The value _has_ been updated in the background
  [ "$(cat "$CACHE_DIR$TEST_KEY")" = "2" ]

  # and now the updated value is used
  run ./cache --stale-while-revalidate 1 --ttl 1 $TEST_KEY echo 3
  [ "$status" -eq 0 ]
  [ "$output" = "2" ]
}

On the second run, we're expecting to be outside our TTL but still inside our stale-while-revalidate (SWR for short) seconds. So we should return the cached output from the first run on the second run, but also trigger a background update.

This test fails on the second [ "$output" = "1" ] assertion because we are not yet considering the SWR.

Considering the SWR isn't too tricky. We'll modify our initial script to add

    if [ $remaining_time -lt $((ttl + stale_while_revalidate)) ]; then
        return 0
    fi

after our existing check

    if [ $remaining_time -lt "$ttl" ]; then
        return 0
    fi

This change makes the previous assertion pass. Now we're failing on the [ "$(cat "$CACHE_DIR$TEST_KEY")" = "2" ] assertion because we're not triggering the background update for our cached content.

This part required some real thought to solve. We would like to call our script with the original script arguments in the background, something like:

${BASH_SOURCE[0]} $original_args &

${BASH_SOURCE[0]} is the current script. We'll add original_args=$* to the top of the file (after set -e) to preserve them before all the option-parsing shift-ing. The trailing & backgrounds the process.

That's close... We need to make sure we're not opting back into the same --stale-while-revalidate option, or we'll end up calling ourselves in a loop until the SWR window expires. So we want the arguments except the SWR, right? That would work, but preserving/removing arguments could get tricky as time goes on. It might be easier to add a new option that opts out of the SWR behavior and always treats the content as stale.

${BASH_SOURCE[0]} --force-stale $original_args &

We'll add a case for --force-stale to our option parsing.

        --force-stale)
            force_stale=1
            shift # drop the key
            ;;

Next we replace

if fresh; then

with

if [ -z "$force_stale" ] && fresh; then

This means we'll never try to check for freshness if we're passing --force-stale.
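
The short-circuit can be seen in isolation with a stand-in for fresh. In this sketch, fresh always reports fresh content and counts its calls, and decide is a hypothetical wrapper around the guarded condition; neither is the script's real code.

```shell
# Stand-in for the script's fresh function: always reports "fresh" and
# records that it was called. (Hypothetical; the real one inspects the cache.)
fresh_calls=0
fresh() {
    fresh_calls=$((fresh_calls + 1))
    return 0
}

# decide sets $decision to "cached" or "rerun" using the guarded condition.
decide() {
    if [ -z "$force_stale" ] && fresh; then
        decision="cached"
    else
        decision="rerun"
    fi
}

force_stale=""
decide
normal_decision=$decision   # fresh is consulted, so the cache is used

force_stale=1
decide
forced_decision=$decision   # fresh is skipped, so the command is re-run

echo "$normal_decision $forced_decision fresh_calls=$fresh_calls"
```

Because && stops at the first failing test, fresh is only called once here, during the run without --force-stale.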

Now, at the bottom of our script, right before our exit $status, we add our code to conditionally update the cached content in the background:

if [ "$update_in_background" = "1" ]; then
    # We re-run the original command with the original args + --force-stale to
    # prevent the possibility of leveraging --stale-while-revalidate again.
    #
    # the & puts this in the background
    #
    # shellcheck disable=SC2086
    ${BASH_SOURCE[0]} --force-stale $original_args &
fi

(We disable ShellCheck's SC2086 warning here because we intentionally want word splitting for $original_args.)
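
One caveat worth illustrating, as a sketch rather than a change to the script: flattening the arguments into a string with $* and then relying on word splitting loses the original argument boundaries, so a quoted argument such as "hello world" would be re-split into two words in the background run. The snippet below uses a hypothetical count_args helper to show the difference. In bash, saving the arguments in an array (original_args=("$@")) and expanding "${original_args[@]}" would be one way to keep the boundaries intact.

```shell
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() { echo "$#"; }

# Simulate the script being called as: cache key echo "hello world"
set -- key echo "hello world"

original_args=$*   # flattened into the single string: key echo hello world

# shellcheck disable=SC2086
flattened_count=$(count_args $original_args)   # word splitting yields 4 words
preserved_count=$(count_args "$@")             # original boundaries yield 3

echo "flattened: $flattened_count, preserved: $preserved_count"
```

For commands like echo the difference is often invisible in the output, which is why the tests in this post are unaffected.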

And to set $update_in_background, we'll replace

    if [ $remaining_time -lt $((ttl + stale_while_revalidate)) ]; then
        return 0
    fi

with

    if [ $remaining_time -lt $((ttl + stale_while_revalidate)) ]; then
        update_in_background=1
        return 0
    fi

That all feels right, but the test fails on the same [ "$output" = "1" ] line. If you echo $output in the test, you'll see that the STDOUT from the backgrounded process is getting into our $output. No problem, we can send it to /dev/null and trust tee to copy it to the proper cache file.

    ${BASH_SOURCE[0]} --force-stale $original_args &

becomes

    ${BASH_SOURCE[0]} --force-stale $original_args > /dev/null &

And the tests pass!

 ✓ initial run is uncached
 ✓ works for quoted arguments
 ✓ preserves the status code of the original command
 ✓ subsequent runs are cached
 ✓ respects a TTL
 ✓ only caches 0 exit status by default
 ✓ allows specifying exit statuses to cache
 ✓ allows specifying * to allow caching all statuses
 ✓ returns the cached exit status
 ✓ documents options with --help
 ✓ stops parsing arguments after --
 ✓ parses options before and after the cache key
 ✓ stops parsing options after the command starts
 ✓ --stale-while-revalidate does not trigger a background update if we're in the TTL
 ✓ --stale-while-revalidate triggers a background update if we're outside the TTL but inside the SWR seconds

15 tests, 0 failures
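
The STDOUT leak fixed above can be reproduced without the cache script at all. In this sketch, leaky and quiet are hypothetical stand-ins: each prints a "stale" value and then starts a background job that prints a "fresh" one, and the command substitution plays the role of the test's output capture.

```shell
# Both functions print a "stale" value, then start a background job that
# prints a "fresh" value, mimicking the background cache refresh.
leaky() {
    echo "stale"
    echo "fresh" &               # background job inherits the captured stdout
}

quiet() {
    echo "stale"
    echo "fresh" > /dev/null &   # background job's stdout is discarded
}

leaky_output=$(leaky)
quiet_output=$(quiet)

echo "leaky captured:"
echo "$leaky_output"
echo "quiet captured:"
echo "$quiet_output"
```

The capture keeps reading until every process holding the pipe has closed it, so the unredirected background job's output ends up alongside the stale value.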

Testing for the remaining rules

We can codify the remaining rules outlined above in an already-passing test.

@test "--stale-while-revalidate falls back to synchronous behavior if we're outside the TTL and SWR seconds" {
  run ./cache --stale-while-revalidate 1 --ttl 1 $TEST_KEY echo 1
  [ "$status" -eq 0 ]
  [ "$output" = "1" ]

  wait_for_second_to_pass

  run ./cache --stale-while-revalidate 1 --ttl 0 $TEST_KEY echo 2
  [ "$status" -eq 0 ]
  [ "$output" = "2" ]
}

This test covers this rule:

If there's cached content
- And the TTL has expired and we're outside the stale-while-revalidate duration
  - We fall back to the synchronous behavior.

And also implicitly covers this scenario in the first run:

If there's no cached content
- We keep to the original synchronous behavior.
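
The wait_for_second_to_pass helper used in this test is not shown in the post. As a guess at what such a helper might look like, assuming ages are computed in whole seconds, the sketch below polls the clock until it reports a later second. The version in the actual test suite may well differ.

```shell
# Hypothetical sketch of a wait_for_second_to_pass helper; the version used
# by the real test suite is not shown in this post and may differ.
wait_for_second_to_pass() {
    start=$(date +%s)
    # Poll until the clock reports a later whole second than when we started.
    while [ "$(date +%s)" -le "$start" ]; do
        :
    done
}

before=$(date +%s)
wait_for_second_to_pass
after=$(date +%s)
echo "seconds elapsed on the clock: $((after - before))"
```

Compared with sleep 1, this returns as soon as the second boundary is crossed, so it waits at most one second rather than always a full one.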

SWR Forever

One last useful scenario that springs to mind is that we might allow stale content of any age while we revalidate. Right now we can pass a very large number as a proxy for infinity, but we could support * or true or similar.

9999999999 seconds is ~317 years. I feel an aversion to passing a bunch of 9's here. But is that a visceral aversion to essentially arbitrary numbers, or is there really an ergonomic need here?

I don't yet know how frequent this use case would be, so I'm reluctant to make changes to the code (minimal or otherwise) to support it. Passing a large number works and I can always add additional behavior later.

Closing

We update our --help response (after updating our test for --help first, naturally) and we're done.

You might want to read the full diff for this feature. It also includes a tweak to avoid sleep calls in the tests.

In the next (and probably final) entry in this series, we'll add two new options to make our script more useful: --purge and --check.
