Running a subset of tests with PyTest

Matthew Wilkes on 2020-07-03

When working with a large codebase, running the tests can be quite frustrating, especially if there are lots of functional tests. A test suite is only really effective as a tool if you can run it without losing your flow. If you ever find yourself saying “I'll make myself a cup of tea while the tests run”, then your tests are probably too slow.

That's not to say that you shouldn't make tea, of course, but that the natural time to take a break is when your tests have passed (so you're done) or they've failed and you need a moment to think about what to do next.

In Chapter 2 of the book, I describe using PyTest's mark syntax to declare tests as slow and then filtering by markers to run only the faster tests. There are many other filtering options that you can take advantage of to get the most use out of your tests. I'd very much recommend marking your tests appropriately, but that's not always enough.
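As a reminder of what that looks like, here is a minimal sketch; the tests below are invented for illustration rather than taken from apd.sensors, and it assumes a functional marker has been registered in your pytest configuration.

import pytest

@pytest.mark.functional
def test_api_returns_all_sensor_values():
    # Imagine this starts the HTTP API and queries it: useful, but slow.
    ...

def test_temperature_formatting():
    # Unmarked, so pipenv run pytest -m "not functional" still runs it.
    ...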

Filtering by test name

The -k flag allows you to filter your test run by test name. This can match the test function, the class, or the module name, and if your tests are well named it's one of the more powerful filtering tools available.
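As a sketch of how the matching works (these test names are invented for illustration, not taken from apd.sensors), -k does a substring match against the names that make up each test's ID, and the expression can combine terms with and, or, and not.

def test_retry_gives_up_after_three_attempts():
    # Collected by pipenv run pytest -k retry: "retry" appears in the name.
    ...

def test_retry_backoff_doubles_the_delay():
    # Also collected by -k retry.
    ...

def test_parse_sensor_output():
    # Not collected by -k retry; -k "retry or parse" would include it,
    # and -k "retry and not backoff" would drop the backoff test above.
    ...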

The current apd.sensors codebase has 74 tests, which take 16.91 seconds to run on my main laptop. That's quite a long time, and there's a noticeable slowdown around the functional tests. Running with -m "not functional" reduces that to 0.98 seconds, while -k retry is similarly quick at 0.73 seconds. Filtering by name is often easier to think about: you might not mind some functional tests running, as long as they're the ones relevant to the code you're working on.

One place this is especially useful is when testing API versions. These tests are self-contained and often rather slow, so excluding all the slow tests would just mean the API versions don't get tested. pipenv run pytest -k V21 runs 5 tests in 4.17 seconds; not as fast as I'd like, but certainly the same order of magnitude as having a sip of a drink or stretching my arms.
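To make that concrete, here is a hypothetical shape such tests might take; the class and method names are invented rather than apd.sensors's real test layout. Because -k matches against module, class, and function names alike, putting the version in the class name is enough for -k V21 to select every test in it.

class TestV21API:
    # Selected by pipenv run pytest -k V21: the class name contains "V21",
    # so every test method in the class is collected.
    def test_sensor_listing(self):
        ...

    def test_error_format(self):
        ...

class TestV20API:
    # Not selected: "V21" doesn't appear anywhere in these tests' IDs.
    def test_sensor_listing(self):
        ...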

Failing tests

While it would be wonderful if we could just run the tests that are going to fail, that's obviously a logical impossibility. Instead, what we can do is run the tests that failed last time. The --lf flag does this by looking at the results of the previous test run and re-running only the tests that did not pass. This is especially useful when you have introduced a failure into a large test suite and you're trying to fix the affected tests, because they may well not be grouped together. For example, if I were to break JSONSensor.to_json_compatible(...) so that it raises a ValueError, as shown in Listing 1, then running the full test suite would result in 9 failures in 7.3 seconds.

class JSONSensor(Sensor[T_value]):
    @classmethod
    def to_json_compatible(cls, value: T_value) -> t.Any:
        raise ValueError("HI")

    @classmethod
    def from_json_compatible(cls, json_version: t.Any) -> T_value:
        return t.cast(T_value, json_version)
Listing 1. A change to JSONSensor that causes errors.

Re-running the tests with pipenv run pytest --lf also results in 9 failures, but in only 2.4 seconds. The larger the codebase, the more useful the --lf flag is, as it allows filtering out more passing tests.
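Under the hood, this is pytest's cache plugin at work: each run records the IDs of the tests that failed, and --lf (long form --last-failed) re-selects them on the next run. You can see what's stored with pipenv run pytest --cache-show, or read it directly, as in this rough sketch; it assumes the default .pytest_cache location, which moves if you've configured cache_dir.

import json
import pathlib

# pytest records the previous run's failures as a JSON mapping of
# test IDs to True in the cache directory.
lastfailed_path = pathlib.Path(".pytest_cache/v/cache/lastfailed")
lastfailed = json.loads(lastfailed_path.read_text())

for test_id in sorted(lastfailed):
    print(test_id)  # e.g. "tests/test_http.py::test_sensor_values" (illustrative)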

As you introduce fixes to the code, the number of tests run on each successive attempt will shrink. If I were to replace the JSONSensor with the version in Listing 2, only 5 tests would fail and 4 would pass. The time taken to run these 9 tests spikes to 11.2 seconds, though, as passing tests almost always take longer to complete than failing ones. This is still faster than the 16.2 seconds it takes to run the full test suite.

class JSONSensor(Sensor[T_value]):
    @classmethod
    def to_json_compatible(cls, value: T_value) -> t.Any:
        return 1

    @classmethod
    def from_json_compatible(cls, json_version: t.Any) -> T_value:
        return t.cast(T_value, json_version)
Listing 2. Returning a static value rather than raising an exception.

Eventually, you'll have fixed enough code that all your previously failing tests pass. If you re-run with --lf after this point, the entire test suite will run again. It's important to make sure you've run the whole suite after a debugging session with --lf, as you may have inadvertently broken other tests, which you won't find out about until you do a full run. If more tests are failing, then the next time you use --lf it will include only those new failures.

Inconsistent failures

Sometimes you'll find a failure, then when you do a filtered test run you'll discover that it's no longer failing. This is invariably due to a test isolation issue, perhaps a missing teardown in one of your fixtures. In this case, there's very little you can do other than run the full test suite. However, there are techniques that help you find test isolation issues and correct them, which I'll cover in the next article.
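In the meantime, here's a hypothetical illustration of the missing-teardown case; neither fixture is taken from apd.sensors. The first version leaves shared state behind, so later tests behave differently depending on which tests ran before them, while the yield form gives the fixture a natural place to clean up.

import pytest

_registered_sensors = []  # shared state that the fixtures modify

@pytest.fixture
def leaky_sensor():
    sensor = object()
    _registered_sensors.append(sensor)
    return sensor  # nothing removes the sensor, so it leaks into later tests

@pytest.fixture
def isolated_sensor():
    sensor = object()
    _registered_sensors.append(sensor)
    yield sensor  # the test body runs at this point
    _registered_sensors.remove(sensor)  # teardown restores the shared state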