Use Git history to suggest related tests

2019-06-20

You've started a new job (congrats!). For your first task, your PM wants you to change the default behavior of help-desk links to open in a new tab.

This is the sort of task that is either trivial or a trip down the rathole of fragile tests that depend on the original behavior.

As apps grow, two things often happen that make changes like this one slower for developers:

It becomes less obvious which tests might be impacted by a change.
The runtime of the test suite grows such that running the entire suite locally isn't palatable.

Many devs will run any seemingly relevant unit tests and any obvious integration tests, then let CI tell them what they missed. But when CI takes minutes or tens of minutes to run, the feedback loop stretches out and this seemingly simple tweak can derail your morning.

Fortunately, there's an easy way to find tests likely impacted by your change...

Git to the rescue (again)

If you're using atomic commits with Git, you have a rich history that groups files with their related tests.

There's no obvious relationship between a file named link-helper.js and your "Subscription Refund Integration Test", but if the two were changed in the same commit, that's a good hint that they might be related.

So if you make your change in link-helper.js, how can you use Git history to suggest related tests?

The naive version looks something like this:

#!/usr/bin/env bash

file=$1
pattern=$2

candidates=$(
    # find commits where the file was changed
    git log --format='%H' -- "$file" |
    # show file names from those commits
    xargs git show --pretty="" --name-only |
    # filter to only the provided pattern
    grep -e "$pattern" |
    # remove duplicates (uniq only drops adjacent repeats, so sort first)
    sort | uniq
)

# print one suggestion per line
echo "$candidates"

Save that as suggest-tests somewhere on your $PATH and make it executable with chmod +x.

Now you can invoke suggest-tests app/js/link-helper.js test/ and see every file with "test/" in its path that changed in a commit that also changed app/js/link-helper.js.
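If you'd like to see the idea work before pointing it at a real project, here's a minimal sketch that builds a throwaway repository and runs the naive pipeline's commands inline (de-duplicating with sort | uniq). Every file name, commit message, and the demo identity are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run the naive pipeline against a throwaway repository.
# Every file name and commit message here is hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# an unrelated first commit so the interesting ones aren't the root commit
echo 'readme' > README
git add README && git commit -qm 'Initial commit'

# the helper and the refund integration test change in the same commit
mkdir -p app/js test
echo 'v1' > app/js/link-helper.js
echo 'v1' > test/subscription_refund_test.js
git add . && git commit -qm 'Open help-desk links in a new tab'

# an unrelated test changes on its own
echo 'v1' > test/unrelated_test.js
git add . && git commit -qm 'Unrelated change'

# the naive pipeline, with the file and pattern inlined
suggestions=$(
    git log --format='%H' -- app/js/link-helper.js |
    xargs git show --pretty="" --name-only |
    grep -e 'test/' |
    sort | uniq
)
echo "$suggestions"
# prints: test/subscription_refund_test.js
```

Only the test that shared a commit with link-helper.js is suggested; test/unrelated_test.js never appears because it was committed on its own.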

A more robust solution

There are a few places where the naive solution isn't ideal:

It won't follow file renames.
It returns file paths that have since been deleted.
It would be nice if the uniq preserved history order (most recently edited to least recently edited).
It would also be nice if you could set a default pattern to avoid specifying it every time.
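The first two problems are easy to reproduce. This sketch (again a throwaway repo with hypothetical file names) renames a helper and deletes its old test in one commit, then compares git log with and without --follow and checks which suggested paths still exist on disk.

```shell
#!/usr/bin/env bash
# Sketch: why renames and deleted files trip up the naive version.
# File names are hypothetical.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo

echo 'init' > README && git add . && git commit -qm 'Initial commit'

# the helper and an old test change together...
echo 'helper' > helpers.js
echo 'test' > old_test.js
git add . && git commit -qm 'Add helper and test'

# ...then the helper is renamed and the test is deleted
git mv helpers.js link-helper.js
git rm -q old_test.js
git commit -qm 'Rename helper, drop old test'

# without --follow, history stops at the rename
plain=$(git log --format='%H' -- link-helper.js | wc -l | tr -d ' ')
# with --follow, the pre-rename commit is found as well
followed=$(git log --follow --format='%H' -- link-helper.js | wc -l | tr -d ' ')
echo "commits without --follow: $plain, with --follow: $followed"

# old_test.js is still in history but no longer on disk
report=$(
    git log --follow --format='%H' -- link-helper.js |
    xargs git show --pretty="" --name-only |
    grep _test | sort -u |
    while IFS= read -r path; do
        if [ -f "$path" ]; then echo "exists: $path"; else echo "deleted: $path"; fi
    done
)
echo "$report"
```

Without --follow only the rename commit is seen, so the commit that paired the helper with its test is lost; with it, the test is found but points at a file that is gone.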

After some thinking, googling, and false starts, here's the version I'm using today:

#!/usr/bin/env bash

function usage {
    script=$(basename "$0")

    echo "$script - use Git history to suggest tests that could be relevant to the provided file"
    echo
    echo "Usage: $script file test_pattern"
    echo
    echo "       Note: test_pattern is optional if \$DEFAULT_SUGGEST_TESTS_PATTERN is set"

    if [ -z "$DEFAULT_SUGGEST_TESTS_PATTERN" ]; then
        echo "       (\$DEFAULT_SUGGEST_TESTS_PATTERN is unset or empty)"
    else
        echo "       (\$DEFAULT_SUGGEST_TESTS_PATTERN is set to $DEFAULT_SUGGEST_TESTS_PATTERN)"
    fi

    echo

    echo "Example:"
    echo "       $ suggest-tests some_file_name.rb _test.rb"

    echo
    echo "You might want to pipe the results into your test runner with xargs:"
    echo "       $ suggest-tests some_file_name.rb _test.rb | xargs rake test"
    exit 1
}

if [ "$#" -gt 2 ] || [ "$#" -eq 0 ] || [ "$1" == "--help" ]; then
    usage
fi

file=$1
pattern=${2:-$DEFAULT_SUGGEST_TESTS_PATTERN}

if [ -z "$pattern" ]; then
    usage
fi

candidates=$(
    # find commits where the file was changed (following renames on the file)
    git log --follow --format='%H' -- "$file" |
    # show file names from those commits
    xargs git show --pretty="" --name-only |
    # get the test files from those file names
    grep -e "$pattern" |
    # uniqify the names but preserve history order
    awk '!x[$0]++'
)

# get the root in case we're called from elsewhere
git_root=$(git rev-parse --show-toplevel)

# only return candidates that still exist on disk
# (read line by line so paths containing spaces stay intact)
while IFS= read -r candidate; do
    if [ -n "$candidate" ] && [ -f "$git_root/$candidate" ]; then
        echo "$candidate"
    fi
done <<< "$candidates"

This addresses all four issues above and adds some helpful usage instructions. Also, how great is that awk trick?
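In case the awk trick is new to you: x is an associative array keyed by the whole line ($0), and x[$0]++ returns the count before incrementing it. The first time a line appears that count is 0, which ! turns into true, and a bare true pattern makes awk print the line; later copies see a non-zero count and are skipped. Here it is next to uniq and sort | uniq on a made-up, history-ordered list of paths:

```shell
#!/usr/bin/env bash
# Compare three ways of removing duplicate lines from history-ordered output.
# The paths are made up for illustration (most recent first).
set -euo pipefail

history='test/b_test.rb
test/a_test.rb
test/b_test.rb
test/c_test.rb
test/a_test.rb'

# uniq only collapses *adjacent* duplicates, so nothing is removed here
only_uniq=$(printf '%s\n' "$history" | uniq)

# sort | uniq removes every duplicate but loses the history order
sorted=$(printf '%s\n' "$history" | sort | uniq)

# awk keeps the first occurrence of each line, preserving order
awk_dedup=$(printf '%s\n' "$history" | awk '!x[$0]++')

printf 'uniq:\n%s\n\nsort | uniq:\n%s\n\nawk:\n%s\n' \
    "$only_uniq" "$sorted" "$awk_dedup"
```

Only the awk version gives b, a, c: every duplicate gone and the most recently changed test still first.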

Running the relevant tests

I live in Ruby + minitest world most of the time, so here's an example of how I run relevant tests: suggest-tests some_file_name.rb _test.rb | xargs rake test
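One thing to watch for when piping into a test runner: GNU xargs runs its command once even when it reads no input, so if suggest-tests finds nothing, xargs rake test would run rake test with no file arguments, which usually means the whole suite. GNU xargs' -r (--no-run-if-empty) flag skips the command in that case; the FreeBSD xargs man page says its version already skips empty input and accepts -r only for compatibility. This small sketch checks what your xargs does:

```shell
#!/usr/bin/env bash
# Check how this machine's xargs handles empty input.
set -euo pipefail

# without -r, GNU xargs still invokes echo once (printing "ran")
without_r=$(printf '' | xargs echo ran)
# with -r, the command is skipped when there is no input
with_r=$(printf '' | xargs -r echo ran)

echo "without -r: '${without_r}'"
echo "with -r:    '${with_r}'"

# applied to the workflow above (swap in whatever runner you use):
#   suggest-tests some_file_name.rb _test.rb | xargs -r rake test
```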

Vim integration with fzf

When editing a file, it can sometimes be useful to edit related test files. Here's an example Vim mapping to quickly jump to these files with fzf.

nnoremap <silent> <Leader>S :call fzf#run({
\   'source':  'suggest-tests ' . shellescape(bufname('%')),
\   'sink':    'e',
\   'options': '--multi --reverse',
\   'down':    15
\ })<CR>

That uses $DEFAULT_SUGGEST_TESTS_PATTERN (which I've set locally to '^test.*_test\.rb$'), but you could make a separate binding for each pattern you need.
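If you want the same default in your shell and in Vim, one option (an assumption about your setup) is to export the variable from a startup file such as ~/.bashrc, since a Vim launched from that shell inherits it. This sketch sets the pattern quoted above and checks a few hypothetical repo-relative paths against it with the same kind of grep the script uses:

```shell
#!/usr/bin/env bash
# Sketch: set a default pattern and see which paths it accepts.
# In practice the export line would live in your shell startup file.
set -euo pipefail

export DEFAULT_SUGGEST_TESTS_PATTERN='^test.*_test\.rb$'

# hypothetical paths as git prints them (relative to the repo root)
paths='app/models/user.rb
test/models/user_test.rb
test/integration/subscription_refund_test.rb
spec/models/user_spec.rb'

matches=$(printf '%s\n' "$paths" | grep -e "$DEFAULT_SUGGEST_TESTS_PATTERN")
echo "$matches"
```

Because git reports paths relative to the repository root, anchoring the pattern with ^test keeps it from matching source files that merely contain "test" somewhere in their path.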

Closing thoughts

This approach isn't perfect: you might break a test that shares no Git history with your changed file. CI will still catch anything you miss, though, and this script has saved me numerous CI feedback cycles over the past year. I hope it does the same for you.

semantic art

code should say something