A 5 minute Intro to Hypothesis
Let’s say you have a Python module called example.py
where you have a function called add_numbers
that you want to test.
# example.py
def add_numbers(a, b):
return a + b
This function adds two numbers, so among other things you might want to check that the commutative property holds for all the inputs that the function receives. You create a file called test_example.py
and start writing a simple unit test to prove it.
A simple unit test
# test_example.py
import unittest
from your_python_module.example import add_numbers
class TestAddNumbers(unittest.TestCase):
def test_add_numers_is_commutative(self):
self.assertEqual(add_numbers(1.23, 4.56), add_numbers(4.56, 1.23))
if __name__ == '__main__':
unittest.main()
That’s cool, but you actually didn’t prove that the commutative property holds in general. You have just proved that for this specific case such property holds.
You realize that the combination of 1.23
and 4.56
is a very tiny subset of the entire input space of numbers that your function can receive, so you write more tests.
A common solution: write more test cases
# test_example.py
# more tests here...
def test_add_numers_is_commutative_another_case(self):
self.assertEqual(add_numbers(0.789, 321), add_numbers(321, 0.789))
# more tests here...
Not a huge gain. You have just proved that for these other specific cases that you wrote the commutative property holds. And obviously you don’t want to write a million test cases by hand.
Maybe you have heard about fuzzing, and you want to use it to create random test cases every time you run the test.
A better solution: fuzzing and ddt
You can create a pair of random floats, so every time you run the test you have a new test case. That’s a bit dangerous though, because if you have a failure you cannot reproduce it easily.
# test_example.py
import random
import unittest
class TestAddNumbers(unittest.TestCase):
def test_add_numers_is_commutative(self):
a = random.random() * 10.0
b = random.random() * 10.0
self.assertEqual(add_numbers(a, b), add_numbers(b, a))
if __name__ == '__main__':
unittest.main()
You can also use a library called ddt to generate many random test cases.
# test_example.py
import random
import unittest
from ddt import ddt, idata, unpack
def float_pairs_generator():
num_test_cases = 100
for i in range(num_test_cases):
a = random.random() * 10.0
b = random.random() * 10.0
yield (a, b)
@ddt
class TestAddNumbers(unittest.TestCase):
@idata(float_pairs_generator())
@unpack
def test_add_floats_ddt(self, a, b):
self.assertEqual(add_numbers(a, b), add_numbers(b, a))
if __name__ == '__main__':
unittest.main()
Here the float_pairs_generator
generator function creates 100 random pairs of floats. With this trick you can multiply the number of test cases while keeping your tests easy to maintain.
That’s definitely a step in the right direction, but if you think about it we are still testing some random combinations of numbers between 0.0 and 10.0 here. Not a very extensive portion of the input domain of the function add_numbers
.
You have two options:
- find a way to generate domain objects that your function can accept. In this case the domain objects are the floats that
add_numbers
can receive. - use hypothesis
I don’t know about you, but I’m going for the second one.
The best solution: Hypothesis
Here is how you write a test that checks that the commutative property holds for a pair of floats.
# test_example.py
from hypothesis import given
from hypothesis.strategies import floats
from your_python_module.example import add_numbers
@given(a=floats(), b=floats())
def test_add_numbers(a, b):
assert add_numbers(a, b) == add_numbers(b, a)
if __name__ == '__main__':
test_add_numbers()
When I ran this test I was shocked. It failed!
WTF! How it that possible that this test fails, after I have tried 100 test cases with ddt?
Luckily with hypothesis you can increase the verbosity level of your test by using the @settings
decorator.
Let’s say you also want to test a specific test case: a == 1.23
and b == 4.56
. For this you can use the @example
decorator. This is nice because now your test provides some documentation to anyone who wants to use the add_numbers
function, and at the same time you are testing a specific case that you know about or that might be particularly hard to hit.
# test_example.py
from hypothesis import given, example, settings, Verbosity
from hypothesis.strategies import floats
from your_python_module.example import add_numbers
@settings(verbosity=Verbosity.verbose)
@example(a=1.23, b=4.56)
@given(a=floats(), b=floats())
def test_add_numbers(a, b):
assert add_numbers(a, b) == add_numbers(b, a)
if __name__ == '__main__':
test_add_numbers()
Obviously the test fails again, but this time you get more insights about it.
Trying example: test_add_numbers(a=1.23, b=4.56)
Trying example: test_add_numbers(a=0.0, b=nan)
Traceback (most recent call last):
# Traceback here...
AssertionError
Trying example: test_add_numbers(a=0.0, b=nan)
Traceback (most recent call last):
# Traceback here...
AssertionError
Trying example: test_add_numbers(a=0.0, b=1.0)
Trying example: test_add_numbers(a=0.0, b=4293918720.0)
Trying example: test_add_numbers(a=0.0, b=281406257233920.0)
Trying example: test_add_numbers(a=0.0, b=7.204000185188352e+16)
Trying example: test_add_numbers(a=0.0, b=inf)
# more test cases...
You can add @seed(247616548810050264291730850370106354271) to this test to reproduce this failure.
That’s really helpful. You can see all the test case that were successful and the ones that caused a failure. You get also a seed
that you can use to reproduce this very specific failure at a later time or on a different computer. This is so awesome.
Anyway, why does this test fail? It fails because in Python nan
and inf
are valid floats, so the function floats()
might create some test cases that have a == nan
and/or b == inf
.
Are nan
and inf
valid inputs for your application? Maybe. It depends on your application.
If you are absolutely sure that add_numbers
will never receive a nan
or a inf
as inputs, you can write a test that never generates either nan
or inf
. You just have to set allow_nan
and allow_infinity
to False
in the @given
decorator. Easy peasy.
# test_example.py
from hypothesis import given, example, settings, Verbosity
from hypothesis.strategies import floats
from your_python_module.example import add_numbers
@given(
a=floats(allow_nan=False, allow_infinity=False),
b=floats(allow_nan=False, allow_infinity=False))
def test_add_numbers(a, b):
assert add_numbers(a, b) == add_numbers(b, a)
if __name__ == '__main__':
test_add_numbers()
But what if add_numbers
could in fact receive nan
or inf
as inputs (a much more realistic assumption). In this case the test should be able to generate nan
or inf
, your function should raise specific exceptions that you will have to handle somewhere else in your application, and the test should not consider such specific exceptions as failures.
Here is how example.py
might look:
# example.py
import math
class NaNIsNotAllowed(ValueError):
pass
class InfIsNotAllowed(ValueError):
pass
def add_numbers(a, b):
if math.isnan(a) or math.isnan(b):
raise NaNIsNotAllowed('nan is not a valid input')
elif math.isinf(a) or math.isinf(b):
raise InfIsNotAllowed('inf is not a valid input')
return a + b
And here is how you write the test with hypothesis:
# test_example.py
from hypothesis import given
from hypothesis.strategies import floats
from your_python_module.example import add_numbers, NaNIsNotAllowed, InfIsNotAllowed
@given(a=floats(), b=floats())
def test_add_numbers(a, b):
try:
assert add_numbers(a, b) == add_numbers(b, a)
except (NaNIsNotAllowed, InfIsNotAllowed):
reject()
if __name__ == '__main__':
test_add_numbers()
This test will generate some nan
and inf
inputs, add_numbers
will raise either NaNIsNotAllowed
or InfIsNotAllowed
and the test will catch these exceptions and reject them as test failures (i.e. the test case will be considered a success when either NaNIsNotAllowed
or InfIsNotAllowed
occurs).
Can you really afford to reject nan
as an input value for add_numbers
? Maybe not. Let’s say your code needs to sum two samples in a time series, and one sample of the time series is missing: nan
would be a perfectly valid input for add_numbers
in such case.