How to write strategies with arrays that share dimensions in Hypothesis?

Question

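I'm using Hypothesis, specifically its numpy extension, to write tests while upgrading a tensorflow model.

This involves generating several tensors that share dimensions, such as the batch size. For example, this is what I would like to write: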

from hypothesis import given
from hypothesis.extra.numpy import arrays
from hypothesis.strategies import integers
from numpy import float32

batch_size = integers(min_value=1, max_value=512)
hidden_state_size = integers(min_value=1, max_value=10_000)

# Desired form: each shape is a plain tuple that contains strategies.
@given(
    arrays(dtype=float32, shape=(batch_size, integers(min_value=1, max_value=10_000))),
    arrays(dtype=float32, shape=(batch_size, hidden_state_size)),
    arrays(dtype=float32, shape=(batch_size, hidden_state_size, integers(min_value=1, max_value=10_000))),
)
def test_code(input_array, initial_state, encoder_state):
    ...

But obviously this doesn't work, because shape needs ints rather than integers() strategies.

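I could use an @composite-decorated function to generate all the necessary tensors and unpack them inside the test, but that takes a lot of boilerplate that is hard to read and slow to develop with.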
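For comparison, a minimal sketch of that @composite approach might look like the following. The name model_inputs, its keyword arguments, and the small size limits are illustrative assumptions chosen to keep the example fast; they are not part of the original code.

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays


@st.composite
def model_inputs(draw, max_batch=4, max_hidden=8, max_other=8):
    # Hypothetical helper: draw each dimension once as a plain int,
    # then reuse those ints in every shape that should share them.
    batch_size = draw(st.integers(min_value=1, max_value=max_batch))
    hidden_state_size = draw(st.integers(min_value=1, max_value=max_hidden))
    input_size = draw(st.integers(min_value=1, max_value=max_other))
    encoder_size = draw(st.integers(min_value=1, max_value=max_other))
    input_array = draw(arrays(np.float32, (batch_size, input_size)))
    initial_state = draw(arrays(np.float32, (batch_size, hidden_state_size)))
    encoder_state = draw(
        arrays(np.float32, (batch_size, hidden_state_size, encoder_size))
    )
    return input_array, initial_state, encoder_state


@given(model_inputs())
def test_composite_shapes(tensors):
    # The tensors arrive as one tuple and have to be unpacked by hand.
    input_array, initial_state, encoder_state = tensors
    assert input_array.shape[0] == initial_state.shape[0] == encoder_state.shape[0]
    assert initial_state.shape[1] == encoder_state.shape[1]
```

It works, but every new tensor means another draw call plus a change to the returned tuple, which is the boilerplate in question.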

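I also looked at the shared strategy, but could not get it to work.

Any suggestions would be appreciated, as I think this would be a great tool for hardening neural network code.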

Answer 1

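You might like to use the data strategy. If you want to share something, you can generate it in the top-level @given(...) and then use it multiple times inside the body of the test method. The data() strategy generates a data object that can "draw" from Hypothesis strategies such as st.integers() or nps.arrays() via data.draw(<your strategy>).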

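import numpy as np
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra import numpy as nps

@given(ndim=st.integers(min_value=1, max_value=32), data=st.data())
def test_code(ndim, data):
    # ndim is a plain int here, so it can parameterize another strategy.
    strategy = nps.arrays(
        dtype=np.float32,
        shape=nps.array_shapes(min_dims=ndim, max_dims=ndim),
    )
    # Both arrays have ndim dimensions; their side lengths are drawn independently.
    array1 = data.draw(strategy)
    array2 = data.draw(strategy)
    ...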

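Note that the shape kwarg takes either a Hypothesis strategy (e.g. nps.array_shapes()) or a specific shape (e.g. 10, (10,), (3, 3, 3), etc.). Also note that NumPy arrays can't have more than 32 dimensions (that is the limit in NumPy 1.x; NumPy 2.0 raised it to 64).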
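Applied to the batch-size case from the question, the same idea might look like the sketch below. The small upper bounds and the names test_shared_batch and input_size are assumptions made for illustration.

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as nps


@given(
    batch_size=st.integers(min_value=1, max_value=4),
    hidden_state_size=st.integers(min_value=1, max_value=8),
    data=st.data(),
)
def test_shared_batch(batch_size, hidden_state_size, data):
    # The shared dimensions are ordinary ints inside the test body,
    # so they can go straight into concrete shape tuples.
    input_size = data.draw(st.integers(min_value=1, max_value=8), label="input_size")
    input_array = data.draw(nps.arrays(np.float32, (batch_size, input_size)))
    initial_state = data.draw(
        nps.arrays(np.float32, (batch_size, hidden_state_size))
    )
    assert input_array.shape == (batch_size, input_size)
    assert initial_state.shape == (batch_size, hidden_state_size)
```

The trade-off is that the arrays are drawn inside the test rather than declared in @given, so the signature no longer lists every input.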

Answer 2

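The trick is to use shared *and* to define the shapes with the tuples strategy: a tuple of strategies is not a valid shape argument, but a strategy for tuples of integers is. It looks like this: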

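from hypothesis import given
from hypothesis.extra.numpy import arrays
from hypothesis.strategies import integers, shared, tuples
from numpy import float32

# Each shared(...) object yields the same value everywhere it is used within one example.
batch_size = shared(integers(min_value=1, max_value=512))
hidden_state_size = shared(integers(min_value=1, max_value=10_000))

@given(
    arrays(dtype=float32, shape=tuples(batch_size, integers(min_value=1, max_value=10_000))),
    arrays(dtype=float32, shape=tuples(batch_size, hidden_state_size)),
    arrays(dtype=float32, shape=tuples(batch_size, hidden_state_size, integers(min_value=1, max_value=10_000))),
)
def test_code(input_array, initial_state, encoder_state):
    ...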

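I'd also suggest greatly reducing the maximum sizes: running (many) more tests on smaller arrays will probably catch more bugs in the same length of time. But check --hypothesis-show-statistics and profile before blindly applying performance advice!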
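As a rough sketch of that advice, here is the same pattern with much smaller bounds and a larger example count. The specific limits, the explicit key arguments, the small_dim name, and the settings values are assumptions picked for illustration, not tuned recommendations.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis.extra.numpy import arrays
from hypothesis.strategies import integers, shared, tuples

# Explicit keys are optional; they make it obvious which draws are shared.
batch_size = shared(integers(min_value=1, max_value=4), key="batch_size")
hidden_state_size = shared(integers(min_value=1, max_value=8), key="hidden_state_size")
small_dim = integers(min_value=1, max_value=8)  # not shared: drawn independently


@settings(max_examples=200, deadline=None)
@given(
    arrays(dtype=np.float32, shape=tuples(batch_size, small_dim)),
    arrays(dtype=np.float32, shape=tuples(batch_size, hidden_state_size)),
    arrays(dtype=np.float32, shape=tuples(batch_size, hidden_state_size, small_dim)),
)
def test_small_shared(input_array, initial_state, encoder_state):
    # The shared dimensions agree across all three tensors.
    assert input_array.shape[0] == initial_state.shape[0] == encoder_state.shape[0]
    assert initial_state.shape[1] == encoder_state.shape[1]
```

Running this under pytest with --hypothesis-show-statistics shows how long generation takes compared with the test body, which is the number to look at before changing the bounds further.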

python

python-hypothesis

tensorflow