alecxe alecxe - 2 months ago 12
Python Question

Generating list of lists with custom value limitations with Hypothesis

The Story:

Currently, I have a function-under-test that expects a list of lists of integers with the following rules:


  1. number of sublists (let's call it
    N
    ) can be from 1 to 50

  2. number of values inside sublists is the same for all sublists (rectangular form) and should be >= 0 and <= 5

  3. values inside sublists cannot be more than or equal to the total number of sublists. In other words, each value inside a sublist is an integer >= 0 and <
    N



Sample valid inputs:

[[0]]
[[2, 1], [2, 0], [3, 1], [1, 0]]
[[1], [0]]


Sample invalid inputs:

[[2]] # 2 is more than N=1 (total number of sublists)
[[0, 1], [2, 0]] # 2 is equal to N=2 (total number of sublists)


I'm trying to approach it with property-based-testing and generate different valid inputs with
hypothesis
library
and trying to wrap my head around
lists()
and
integers()
, but cannot make it work:


  • the condition #1 is easy to approach with
    lists()
    and
    min_size
    and
    max_size
    arguments

  • the condition #2 is covered under
    Chaining strategies together

  • the condition #3 is what I'm struggling with - cause, if we use the
    rectangle_lists
    from the above example, we don't have a reference to the length of the "parent" list inside
    integers()



The Question:

How can I limit the integer values inside sublists to be less than the total number of sublists?




Some of my attempts:

from hypothesis import given
from hypothesis.strategies import lists, integers

@given(lists(lists(integers(min_value=0, max_value=5), min_size=1, max_size=5), min_size=1, max_size=50))
def test(l):
# ...


This one was very far from meeting the requirements - list is not strictly of a rectangular form and generated integer values can go over the generated size of the list.

from hypothesis import given
from hypothesis.strategies import lists, integers

@given(integers(min_value=0, max_value=5).flatmap(lambda n: lists(lists(integers(min_value=1, max_value=5), min_size=n, max_size=n), min_size=1, max_size=50)))
def test(l):
# ...


Here, the #1 and #2 are requirements were being met, but the integer values can go larger than the size of the list - requirement #3 is not met.

Answer

There's a good general technique that is often useful when trying to solve tricky constraints like this: try to build something that looks a bit like what you want but doesn't satisfy all the constraints and then compose it with a function that modifies it (e.g. by throwing away the bad bits or patching up bits that don't quite work) to make it satisfy the constraints.

For your case, you could do something like the following:

from hypothesis.strategies import builds, lists, integers

def prune_list(ls):
    n = len(ls)
    return [
       [i for i in sublist if i < n][:5]
       for sublist in ls
    ]

limited_list_strategy = builds(
   prune_list,
   lists(lists(integers(0, 49), average_size=5), max_size=50, min_size=1)
)

In this we:

  1. Generate a list that looks roughly right (it's a list of list of integers and the integers are in the same range as all possible indices that could be valid).
  2. Prune out any invalid indices from the sublists
  3. Truncate any sublists that still have more than 5 elements in them

The result should satisfy all three conditions you needed.

The average_size parameter isn't strictly necessary but in experimenting with this I found it was a bit too prone to producing empty sublists otherwise.