How do I globally seed np.random.default_rng for unit tests?

The recommended way to create random numbers with numpy is to create an np.random.Generator, like this:

import numpy as np

def foo():
    # Some more complex logic here, this is the top level method that creates the rng
    rng = np.random.default_rng()
    return rng.random()

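Now suppose I'm writing tests for my code base and need to seed the rng to get reproducible results.

Is it possible to tell numpy to use the same seed every time, no matter where default_rng() is called? This is basically the old behavior of np.random.seed(). The reason I need this is that I have many such tests and would have to mock the default_rng call to use a seed in each of them, because in pytest you have to mock where something is used, not where it is defined. So mocking it globally in a single place doesn't work.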
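To make the difference concrete, here is a minimal sketch showing that np.random.seed() only reseeds the legacy global state. An unseeded default_rng() draws fresh entropy from the operating system, so the global seed does not affect it.

```python
import numpy as np

# First pass: seed the legacy global state, then draw from both APIs
np.random.seed(0)
legacy_a = np.random.random()
generator_a = np.random.default_rng().random()

# Second pass with the same global seed
np.random.seed(0)
legacy_b = np.random.random()
generator_b = np.random.default_rng().random()

# The legacy draws match because the global RandomState was reseeded
print("legacy draws equal:", legacy_a == legacy_b)
# The Generator draws differ because default_rng() ignores np.random.seed()
print("generator draws equal:", generator_a == generator_b)
```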

With the old way, you could define a fixture in conftest.py that automatically sets the seed for every test, like this:

# conftest.py

import pytest
import numpy as np

@pytest.fixture(autouse=True)
def set_random_seed():
    # seeds any random state in the tests, regardless of where it is defined
    np.random.seed(0)

# test_foo.py

import numpy as np
from module.containing.foo import foo

def test_foo():
    assert np.isclose(foo(), 0.84123412)  # That's not the right number, just an example

With the new default_rng approach, this no longer seems possible. Instead, I would need to put a fixture like this in every test module that needs a seeded rng.

# inside test_foo.py, but also every other test file

import pytest
from unittest import mock
import numpy as np

from module.containing.foo import foo


@pytest.fixture()
def seed_default_rng():
    seeded_rng = np.random.default_rng(seed=0)
    with mock.patch("module.containing.foo.np.random.default_rng") as mocked:
        mocked.return_value = seeded_rng
        yield

def test_foo(seed_default_rng):
    assert np.isclose(foo(), 0.84123412)

The best I've come up with is a parametrizable fixture in conftest.py, like this:

# conftest.py
import pytest
from unittest import mock
import numpy as np


@pytest.fixture
def seed_default_rng(request):
    seeded_rng = np.random.default_rng(seed=0)
    mock_location = request.node.get_closest_marker("rng_location").args[0]
    with mock.patch(f"{mock_location}.np.random.default_rng") as mocked:
        mocked.return_value = seeded_rng
        yield

It can then be used in each test like this:

# test_foo.py
import pytest
import numpy as np
from module.containing.foo import foo

@pytest.mark.rng_location("module.containing.foo")
def test_foo(seed_default_rng):
    assert np.isclose(foo(), 0.84123412)  # just an example number

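It's still not as convenient as before, but you only need to add the marker to each test instead of mocking the default_rng method.

If you want the full numpy API and a guarantee of stable random values across numpy versions, the short answer is: you can't.

You can use a workaround based on the np.random.RandomState class, but you give up the current np.random API; there is no good, stable workaround.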
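If reproducibility within one pinned numpy version is enough, a global patch may still be workable. The sketch below assumes the code under test calls np.random.default_rng() through the module attribute, as foo() in the question does, rather than via from numpy.random import default_rng. In that case the lookup happens at call time, so patching the attribute on numpy.random itself affects every caller. The helper name seeded_default_rng is hypothetical.

```python
from contextlib import contextmanager
from unittest import mock

import numpy as np


@contextmanager
def seeded_default_rng(seed=0):
    """Hypothetical helper: make every np.random.default_rng() call seeded."""
    real_default_rng = np.random.default_rng

    def fake_default_rng(*args, **kwargs):
        # Ignore the caller's arguments and always return a seeded Generator
        return real_default_rng(seed)

    # Patch the attribute on the numpy.random module object itself
    with mock.patch.object(np.random, "default_rng", fake_default_rng):
        yield


def foo():
    # Stand-in for the function from the question
    rng = np.random.default_rng()
    return rng.random()


with seeded_default_rng():
    first = foo()
    second = foo()

# Outside the context manager the original default_rng is restored
unseeded = foo()
```

Wrapped in an autouse fixture in conftest.py (yielding inside the with block), this would apply to every test. Note that each default_rng() call then returns a fresh Generator with the same seed, so separate calls start from the same stream.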

Why numpy.random is not stable across versions

Since numpy v1.17, numpy.random.default_rng() constructs a new Generator with the default BitGenerator. But the description of np.random.Generator comes with the following guidance:

No Compatibility Guarantee

Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.

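So np.random.default_rng() will give the same random numbers across platforms for the same version of numpy, but not across versions.

This has been the case since the adoption of NEP 0019: Random number generator policy. See the abstract: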

For the past decade, NumPy has had a strict backwards compatibility policy for the number stream of all of its random number distributions. Unlike other numerical components in numpy, which are usually allowed to return different when results when they are modified if they remain correct, we have obligated the random number distributions to always produce the exact same numbers in every version. The objective of our stream-compatibility guarantee was to provide exact reproducibility for simulations across numpy versions in order to promote reproducible research. However, this policy has made it very difficult to enhance any of the distributions with faster or more accurate algorithms. After a decade of experience and improvements in the surrounding ecosystem of scientific software, we believe that there are now better ways to achieve these objectives. We propose relaxing our strict stream-compatibility policy to remove the obstacles that are in the way of accepting contributions to our random number generation capabilities.
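In practice this means a seeded Generator is reproducible within one numpy version, while hard-coded expected values may break after an upgrade. One way to sidestep that, sketched below, is to compare two identically seeded runs instead of asserting on literal numbers. The function simulate is a hypothetical stand-in for code under test that accepts an injected Generator.

```python
import numpy as np


def simulate(rng, n=5):
    # Hypothetical function under test: a random walk built from normal steps
    return rng.normal(size=n).cumsum()


# Same seed and same numpy version give the same stream
run_1 = simulate(np.random.default_rng(42))
run_2 = simulate(np.random.default_rng(42))

# Assert on reproducibility, not on specific values that may change
np.testing.assert_array_equal(run_1, run_2)
```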

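A workaround for testing with pytest

A section of the NEP is devoted to supporting unit tests and discusses keeping guaranteed stream compatibility across versions and platforms in the legacy np.random.RandomState class. From the numpy docs on "Legacy Random Generation":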

The RandomState provides access to legacy generators. This generator is considered frozen and will have no further improvements. It is guaranteed to produce the same values as the final point release of NumPy v1.16. These all depend on Box-Muller normals or inverse CDF exponentials or gammas. This class should only be used if it is essential to have randoms that are identical to what would have been produced by previous versions of NumPy.
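As a minimal sketch of what this guarantee looks like in use: two RandomState instances with the same seed produce the same values, and the global np.random.seed() path draws from the same legacy stream.

```python
import numpy as np
from numpy.random import RandomState

# Two independent legacy generators with the same seed
rs_a = RandomState(12345)
rs_b = RandomState(12345)
sample_a = rs_a.random(3)
sample_b = rs_b.random(3)
np.testing.assert_array_equal(sample_a, sample_b)

# The global legacy state is also a RandomState, so seeding it
# with the same value yields the same draws
np.random.seed(12345)
sample_global = np.random.random(3)
np.testing.assert_array_equal(sample_a, sample_global)
```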

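The np.random.RandomState docs provide an example usage, which can be adapted for use with pytest. The important point is that functions that use np.random.random and other methods must be monkeypatched with a RandomState instance:

Contents of mymod.py: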

import numpy as np

def myfunc():
    return np.random.random(size=3)

Contents of test_mymod.py:

import pytest
import numpy as np
from numpy.random import RandomState

from mymod import myfunc

@pytest.fixture(autouse=True)
def mock_random(monkeypatch: pytest.MonkeyPatch):
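    def stable_random(*args, **kwargs):
        # A fresh, identically seeded RandomState on every call, so each call
        # to np.random.random returns the same values
        rs = RandomState(12345)
        return rs.random(*args, **kwargs)

    monkeypatch.setattr('numpy.random.random', stable_random)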

def test_myfunc():
    # this test will work across numpy versions
    known_result = np.array([0.929616, 0.316376, 0.183919])
    np.testing.assert_allclose(myfunc(), known_result, atol=1e-6)
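If the code under test uses several legacy functions, the same idea can be extended with one shared RandomState, so that successive calls advance a single seeded stream. This is only a sketch: the helper legacy_functions_seeded and the function noisy_sample are hypothetical, and only the functions listed in the patch are replaced.

```python
from contextlib import contextmanager
from unittest import mock

import numpy as np
from numpy.random import RandomState


@contextmanager
def legacy_functions_seeded(seed=12345):
    """Hypothetical helper: route selected np.random functions to one RandomState."""
    rs = RandomState(seed)
    # Replace module-level functions with bound methods of the seeded instance
    with mock.patch.multiple(np.random, random=rs.random, normal=rs.normal):
        yield


def noisy_sample():
    # Stand-in for code under test that mixes several legacy functions
    return np.random.random(size=2) + np.random.normal(size=2)


with legacy_functions_seeded():
    first = noisy_sample()

with legacy_functions_seeded():
    second = noisy_sample()

# Each context starts from the same seed, so the results match
np.testing.assert_array_equal(first, second)
```

In a pytest suite the context manager could be wrapped in an autouse fixture, in the same way as mock_random above.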