Hypothesis testing library unable to find a failing example for this simple arithmetic problem

Question

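I am trying to learn the Hypothesis testing library for Python, and I came up with the following example (taken from a math channel on YouTube). It is a very simple arithmetic problem: find x, y, w, z such that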

x*y = 21 & x+w = 8 & y*z = 9 & w - z = 5

The solution is x = 2.1, y = 10, w = 5.9, z = 0.9. Since Hypothesis is a declarative programming library, I expected it to find the solution very quickly.
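As a sanity check on the stated solution, here is a minimal sketch that derives it by hand elimination and verifies it with exact rational arithmetic. The elimination steps in the comments are an illustration added here, not part of the original video or question.

```python
from fractions import Fraction

# Eliminate variables by hand:
#   w = 8 - x            (from x + w = 8)
#   z = w - 5 = 3 - x    (from w - z = 5)
#   y = 21 / x           (from x * y = 21)
# Substituting into y * z = 9 gives 21 * (3 - x) / x = 9, i.e. 63 = 30 * x.
x = Fraction(63, 30)
y = Fraction(21) / x
w = 8 - x
z = w - 5

# Check all four equations exactly (no floating-point rounding involved).
assert (x * y, x + w, y * z, w - z) == (21, 8, 9, 5)

print(float(x), float(y), float(w), float(z))  # prints: 2.1 10.0 5.9 0.9
```

The system has exactly one solution, so a random search over floats has to hit these four values exactly for the assertion in the test to fail.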

My code using Hypothesis is:

from hypothesis import given
import hypothesis.strategies as st
from typing import Tuple

def f(a: float, b: float, c: float, d: float) -> Tuple[float, float, float, float]:
    # Left-hand sides of the four equations, with (a, b, c, d) = (x, y, w, z).
    return (a*b, a+c, b*d, c-d)

@given(
    st.tuples(
        st.floats(min_value=0),
        st.floats(min_value=0),
        st.floats(min_value=0),
        st.floats(min_value=0)
    )
)
def test_f(f32_tuple):
    # Fails (reports a falsifying example) only if an exact solution is generated.
    assert f(*f32_tuple) != (21, 8, 9, 5)

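After running it with pytest a few times, Hypothesis is unable to find the solution. At first I thought it was a floating-point comparison problem, or that the search space was too large, so I decided to cut it down to integers (changing the last number in the expected tuple), for example: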

from hypothesis import given
import hypothesis.strategies as st
from typing import Tuple

def f(a: float, b: float, c: float, d: float) -> Tuple[float, float, float, float]:
    # Same function as before; only the strategies and the target tuple change.
    return (a*b, a+c, b*d, c-d)

@given(
    st.tuples(
        st.integers(min_value=0, max_value=10),
        st.integers(min_value=0, max_value=10),
        st.integers(min_value=0, max_value=10),
        st.integers(min_value=0, max_value=10),
    )
)
def test_f(f32_tuple):
    # Fails only if Hypothesis generates the integer solution.
    assert f(*f32_tuple) != (21, 8, 9, -2)

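Here the solution is the tuple (7, 3, 1, 3), and the search space has "only" 11^4 = 14,641 elements (roughly 10^4), so I would expect it to find the solution after a few runs.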

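This behavior worries me, because the usefulness of the library lies in its ability to detect cases that one would not normally think of.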

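Am I using the wrong generators (strategies)? Or is Hypothesis unable to handle this kind of situation? I need to know before I decide whether to use it in my day-to-day work.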

Answer 1

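Hypothesis uses a variety of heuristics to find "interesting" inputs, but fundamentally it is still throwing random data at your function. By default, Hypothesis makes only 100 attempts. However, you can increase that with a decorator like @settings(max_examples=20000). Adding this to your bounded integer version is enough for Hypothesis to find the solution: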

-------------------------------------------- Hypothesis --------------------------------------------
Falsifying example: test_f(
    f32_tuple=(7, 3, 1, 3),
)
===================================== short test summary info ======================================
FAILED so_arith.py::test_f - assert (21, 8, 9, -2) != (21, 8, 9, -2)

In many practical cases, this randomized approach works well! But not in your example here.
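To get a feel for why 100 attempts are rarely enough, here is a rough back-of-the-envelope sketch. It enumerates the bounded integer search space and estimates the chance of hitting a solution, under the simplifying assumption that inputs are drawn uniformly and independently. Hypothesis does not actually sample this way, so treat the numbers as an order-of-magnitude guide only; `hit_probability` is a hypothetical helper introduced for this illustration.

```python
from itertools import product


def f(a, b, c, d):
    return (a * b, a + c, b * d, c - d)


# integers(min_value=0, max_value=10) is inclusive on both ends: 11 values.
values = range(0, 11)
space = list(product(values, repeat=4))
solutions = [t for t in space if f(*t) == (21, 8, 9, -2)]


def hit_probability(draws: int) -> float:
    """Chance that at least one of `draws` uniform samples is a solution."""
    p_single = len(solutions) / len(space)
    return 1 - (1 - p_single) ** draws


print(len(space), solutions)  # prints: 14641 [(7, 3, 1, 3)]
print(hit_probability(100), hit_probability(20000))
```

Under this simplified model, 100 draws find the single solution well under 1% of the time, while 20000 draws succeed in roughly three runs out of four.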

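Problems like this are best analyzed with a constraint solver. CrossHair is a solver-based system for checking Python properties, and it can handle the unbounded version. (Disclaimer: I am the primary maintainer!) Here is the CrossHair equivalent of your example: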

from typing import Tuple

def f(a: float, b: float, c: float, d: float) -> Tuple[float, float, float, float]:
    """ post: _ != (21, 8, 9, -2) """
    return (a*b, a+c, b*d, c-d)

Running crosshair check on this file produces the output you expect:

/tmp/main.py:4: error: false when calling f(a = 7.0, b = 3.0, c = 1.0, d = 3.0) (which returns (21.0, 8.0, 9.0, -2.0))
