Seeded Python RNG showing non-deterministic behavior with sets
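I'm seeing non-deterministic behavior when trying to select a pseudo-random element from sets, even though the RNG is seeded (example code shown below). Why is this happening, and should I expect other Python data types to show similar behavior?
Note: I've only tested this on Python 2.7, but it is reproducible on two different Windows computers.
Similar issue: the question "Python random seed not working with Genetic Programming example code" may be related. Based on my testing, my hypothesis is that run-to-run differences in memory allocation within the sets lead to different elements being picked for the same RNG state.
So far I haven't found any mention of this kind of caveat/issue in the Python docs for set or random.
Example code (randTest produces different output run-to-run):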
import random

# Class contains a large set of pseudo-random numbers.
class bigSet:
    def __init__(self):
        self.a = set()
        for n in range(2000):
            self.a.add(random.random())

# Main test function.
def randTest():
    # Seed the PRNG.
    random.seed(0)
    # Create sets of bigSet elements, presumably many memory allocations.
    b = set()
    for n in range(2000):
        b.add(bigSet())
    # Pick a random value from a random bigSet. Would have expected this to be deterministic.
    c = random.sample(b, 1)[0]
    print('randVal: ' + str(random.random()))  # This value is always the same
    print('setSample: ' + str(random.sample(c.a, 1)[0]))  # This value can change run-to-run
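I'm fairly sure you're right that the problem is caused by run-to-run differences in memory allocation for the set. When I changed your program to use lists instead of sets, I got deterministic behavior: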
import random

# Class contains a large list of pseudo-random numbers.
class bigList:
    def __init__(self):
        self.a = [random.random() for n in range(2000)]

# Main test function.
def randTest():
    # Seed the PRNG.
    random.seed(0)
    # Create lists of bigList elements, presumably many memory allocations.
    b = [bigList() for n in range(2000)]
    # Pick a random value from a random bigList. A list keeps insertion order, so this is deterministic.
    c = random.sample(b, 1)[0]
    print('randVal: ' + str(random.random()))  # This value is always the same
    # and so is this now...
    print('setSample: ' + str(random.sample(c.a, 1)[0]))

randTest()
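If the data has to stay in sets, a similar effect can be had by turning each set into a sorted list just before sampling, so that the order depends only on values and not on hashes or memory addresses. This is only a sketch under assumptions: the `index` attribute is a hypothetical creation counter added to give the outer objects a stable sort key, and a private `random.Random` instance stands in for the module-level generator. (In Python 3.11 and later, `random.sample` no longer accepts a set directly, so some conversion to a sequence is needed there anyway.)

```python
import random

class BigSet:
    """Holds a set of pseudo-random numbers plus a stable, value-based sort key."""
    def __init__(self, index, rng):
        self.index = index  # hypothetical creation counter, used only for ordering
        self.a = {rng.random() for _ in range(200)}

def rand_test(seed):
    rng = random.Random(seed)  # private generator seeded with the given value
    b = {BigSet(i, rng) for i in range(200)}
    # Sort by a key that does not depend on id() before choosing.
    c = rng.choice(sorted(b, key=lambda s: s.index))
    # Floats are orderable, so the inner set can simply be sorted.
    return rng.choice(sorted(c.a))

print(rand_test(0) == rand_test(0))
```

Sorting costs O(n log n) per pick, so for repeated sampling it is cheaper to build the sorted list once and reuse it.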
It has to do with object instantiation for mutable objects. If I create a set of frozensets, it does give deterministic results;
Python 2.7.11 (default, Jan 9 2016, 15:47:04)
[GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)] on freebsd10
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> random.seed(0)
>>> set(frozenset(random.random() for i in range(5)) for j in range(5))
set([frozenset([0.7298317482601286, 0.3101475693193326, 0.8988382879679935, 0.47214271545271336, 0.6839839319154413]), frozenset([0.5833820394550312, 0.4765969541523558, 0.4049341374504143, 0.30331272607892745, 0.7837985890347726]), frozenset([0.7558042041572239, 0.5046868558173903, 0.9081128851953352, 0.28183784439970383, 0.6183689966753316]), frozenset([0.420571580830845, 0.25891675029296335, 0.7579544029403025, 0.8444218515250481, 0.5112747213686085]), frozenset([0.9097462559682401, 0.8102172359965896, 0.9021659504395827, 0.9827854760376531, 0.25050634136244054])])
>>> random.seed(0)
>>> set(frozenset(random.random() for i in range(5)) for j in range(5))
set([frozenset([0.7298317482601286, 0.3101475693193326, 0.8988382879679935, 0.47214271545271336, 0.6839839319154413]), frozenset([0.5833820394550312, 0.4765969541523558, 0.4049341374504143, 0.30331272607892745, 0.7837985890347726]), frozenset([0.7558042041572239, 0.5046868558173903, 0.9081128851953352, 0.28183784439970383, 0.6183689966753316]), frozenset([0.420571580830845, 0.25891675029296335, 0.7579544029403025, 0.8444218515250481, 0.5112747213686085]), frozenset([0.9097462559682401, 0.8102172359965896, 0.9021659504395827, 0.9827854760376531, 0.25050634136244054])])
>>>
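If I remember correctly, CPython uses the memory location of a (mutable) object as its id and as the key for hashing.
So although the contents of the objects are always the same, their ids will differ;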
In [13]: random.seed(0)
In [14]: k = set()
In [15]: for n in range (20):
   ....:     k.add(bigSet())
   ....:
In [16]: for x in k:
   ....:     print(id(x))
   ....:
34856629808
34856629864
34856631936
34856630424
34856629920
34856631992
34856630480
34856629976
34856632048
34856631040
34856630536
34856632104
34856630032
34856630592
34856630088
34856632160
34856629752
34856629696
34856630760
34856630256
In [17]: random.seed(0)
In [18]: k = set()
In [19]: for n in range (20):
   ....:     k.add(bigSet())
   ....:
In [20]: for x in k:
   ....:     print(id(x))
   ....:
34484534800
34856629808
34484534856
34856629864
34856631936
34856630424
34856629920
34856631992
34484534968
34856629976
34856630480
34856632048
34856631040
34484535024
34484535080
34484535136
34856632216
34484534688
34484534912
34484534744
A possible solution would be to subclass frozenset.
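A minimal sketch of that idea, assuming the contents never need to change after construction. `FrozenBigSet`, `Plain`, and `make_frozen` are hypothetical names used only for illustration. The point is that a frozenset subclass inherits content-based equality and hashing, while an ordinary class falls back to identity-based defaults:

```python
import random

class Plain:
    """Ordinary class: equality and hash fall back to object identity."""
    def __init__(self, rng):
        self.a = {rng.random() for _ in range(5)}

class FrozenBigSet(frozenset):
    """frozenset subclass: equality and hash are computed from the contents."""

def make_frozen(seed):
    rng = random.Random(seed)
    return FrozenBigSet(rng.random() for _ in range(5))

x, y = make_frozen(0), make_frozen(0)
print(x == y, hash(x) == hash(y))   # True True: same contents, same hash

p, q = Plain(random.Random(0)), Plain(random.Random(0))
print(p.a == q.a, p == q)           # True False: same contents, distinct identities
```

This removes the dependence on memory addresses, but the iteration order of a set of such objects is still not specified by the language.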
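OrderedSet is the ideal choice. Neither set nor frozenset should be used here, since neither is specified to be ordered. The fact that the other answer works is just an accident of implementation. Sets are unordered, and relying on their order couples your code to the Python version (and possibly the machine).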
The order I get in Python 3.8.6 is different (although it happens to be the same between runs), even though the generated random numbers are the same.
To preserve order, and therefore determinism based on the random seed, you must use an ordered data structure such as OrderedSet.
If you don't have an OrderedSet available, or if profiling your code shows that OrderedSet is slow, you can use an OrderedDict and ignore its values.
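As a rough sketch of the OrderedDict approach: the keys act as the set and the values are ignored. `Item` and `pick_label` are hypothetical names; `Item` is hashed by identity, like bigSet in the question, and a private `random.Random` instance is assumed in place of the module-level generator.

```python
import random
from collections import OrderedDict

class Item:
    """Hashed by identity, like bigSet in the question."""
    def __init__(self, label):
        self.label = label

def pick_label(seed):
    rng = random.Random(seed)
    ordered = OrderedDict()
    for n in range(20):
        ordered[Item(n)] = None   # the value is a placeholder; only keys matter
    keys = list(ordered)          # insertion order, independent of hash values
    return rng.choice(keys).label

print(pick_label(0) == pick_label(0))
```

Membership tests and de-duplication still work as they would for a set, because the keys are hashed in the usual way; only iteration order changes.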
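If you have Python >= 3.6, then even a regular dict will be ordered, thanks to performance optimizations (an implementation detail of CPython 3.6, and a language guarantee from Python 3.7 onward).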
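For example, on Python 3.7 or later a plain dict built with `dict.fromkeys` can serve as an insertion-ordered set. The values below are made up for illustration:

```python
import random

values = [3, 1, 3, 2, 1]

# dict.fromkeys drops duplicates and keeps first-seen order (Python 3.7+).
unique_in_order = list(dict.fromkeys(values))
print(unique_in_order)  # [3, 1, 2]

# Sampling from the resulting list depends only on the seed and insertion order.
first = random.Random(0).choice(unique_in_order)
second = random.Random(0).choice(unique_in_order)
print(first == second)  # True
```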