alecxe - 1 year ago
Python Question

Skipping falsifying examples in Hypothesis

The Story:

I'm currently in the process of unit-testing a function using Hypothesis and a custom generation strategy, trying to find a specific input that "breaks" my current solution. Here is what my test looks like:

from hypothesis import given
from solution import answer

# skipping mystrategy definition - not relevant

@given(mystrategy)
def test(l):
    assert answer(l) in {0, 1, 2}

Basically, I'm looking for possible inputs for which the answer function does not return 0, 1, or 2.

Here is my current workflow:

  • run the test

  • Hypothesis finds an input that produces an assertion failure:

    $ pytest
    =========================================== test session starts ============================================
    ------------------------------------------------ Hypothesis ------------------------------------------------
    Falsifying example: test(l=[[0], [1]])

  • debug the function with this particular input trying to understand if this input/output is a legitimate one and the function worked correctly

The Question:

How can I skip this falsifying generated example ([[0], [1]] in this case) and ask Hypothesis to generate a different one?

The Question can also be interpreted as: can I ask Hypothesis not to terminate when a falsifying example is found, and instead to keep generating more falsifying examples?

Answer Source

There's at present no way to get Hypothesis to keep trying after it finds a failure (it might happen at some point, but it's not really clear what the right behaviour for this should be and it hasn't been a priority), but you can get it to ignore specific classes of failure using the assume functionality.
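If you do want to see several failing inputs in one run, one manual workaround is to drive the property yourself over a set of candidate inputs and collect every failure instead of raising on the first one. This is a plain-Python sketch, not Hypothesis functionality; the answer function and the candidate inputs below are toy stand-ins:

```python
def answer(l):
    # toy stand-in for the real solution under test
    return len(l)

def collect_failures(prop, candidates):
    """Run prop on every candidate; return the inputs where it raised."""
    failures = []
    for c in candidates:
        try:
            prop(c)
        except AssertionError:
            failures.append(c)
    return failures

def prop(l):
    assert answer(l) in {0, 1, 2}

candidates = [[], [[0]], [[0], [1]], [[0], [1], [2], [3]]]
print(collect_failures(prop, candidates))  # only the last candidate fails
```

Within Hypothesis itself, though, the supported mechanism for steering away from particular inputs is assume.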

e.g. you could skip this example with:

from hypothesis import assume, given

@given(mystrategy)
def test(l):
    assume(l != [[0], [1]])
    assert answer(l) in {0, 1, 2}

Hypothesis will skip any example where you call assume with a False argument, and such examples do not count towards its budget of examples to run.
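That skip-without-spending-budget behaviour can be illustrated with a small stdlib-only sketch; the runner below is an illustration of the semantics, not Hypothesis's internals:

```python
class Rejected(Exception):
    """Raised to signal a rejected (assumed-away) example."""

def assume(condition):
    # mimic hypothesis.assume: bail out of the example if the
    # precondition does not hold
    if not condition:
        raise Rejected()

def run_property(prop, examples, budget):
    """Run prop on examples; rejected ones do not count toward budget."""
    ran = 0
    for ex in examples:
        if ran >= budget:
            break
        try:
            prop(ex)
            ran += 1
        except Rejected:
            continue  # skipped: does not consume budget
    return ran

def prop(l):
    assume(l != [[0], [1]])
    assert isinstance(l, list)

examples = [[[0]], [[0], [1]], [[1]], [[2]]]
print(run_property(prop, examples, budget=3))  # 3: the rejected one is free
```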

You'll probably find that this just results in trivial variations on the example, but you can pass more complex expressions to assume to ignore classes of examples.
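For instance, instead of excluding one literal value, you can exclude a whole shape of input with a predicate; the helper name below is made up for illustration:

```python
# Hypothetical predicate describing a class of inputs to ignore, rather
# than a single literal value. Inside the Hypothesis test you would then
# write:
#     assume(not is_known_false_alarm(l))
# instead of comparing against one example.

def is_known_false_alarm(l):
    # e.g. treat any non-empty input made only of single-element
    # sublists as an already-investigated false alarm
    return len(l) > 0 and all(len(sub) == 1 for sub in l)

print(is_known_false_alarm([[0], [1]]))  # True  -> would be assumed away
print(is_known_false_alarm([[0, 1]]))    # False -> still tested
```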

What's your actual use case here? The normal intended usage pattern here would be to fix the bug that causes the failure and let Hypothesis find new bugs that way. I know this isn't always practical, but I'm interested in why.