我怎样才能洗牌无尽的发电机?
How can i shuffle a endless generator?
假设我有一个包含大量元素的生成器。
其实我想做一个数值表达机:
它产生所有可能的数值表达式:
oam = {'sub': [float, float], 'add':[float, float], 'ws_corr':[float, float, int], 'ws_rank':[float, int]}
rm = {int: list(map(str, [1,2,3,4])), float:[o for o in oam] + ['lp', 'ap', 'vol']}
def gen_exp(depth):
if depth > 0:
for op in oam:
for op_terms in it.product(*[rm[ret_type] for ret_type in oam[op]]):
for op_term in it.product(*[[op_term] if op_term not in oam else gen_exp(depth-1) for op_term in op_terms]):
yield '%s(%s)'%(op, ', '.join(map(str, op_term)))
print(list(gen_exp(1))) # ['sub(lp, ap)', 'sub(lp, vol)', 'sub(ap, lp)', 'sub(ap, vol)', 'sub(vol, lp)', 'sub(vol, ap)', 'add(lp, ap)', 'add(lp, vol)', 'add(ap, lp)', 'add(ap, vol)', 'add(vol, lp)', 'add(vol, ap)', 'ws_corr(lp, ap, 1)', 'ws_corr(lp, ap, 2)', 'ws_corr(lp, ap, 3)', 'ws_corr(lp, ap, 4)', 'ws_corr(lp, vol, 1)', 'ws_corr(lp, vol, 2)', 'ws_corr(lp, vol, 3)', 'ws_corr(lp, vol, 4)', 'ws_corr(ap, lp, 1)', 'ws_corr(ap, lp, 2)', 'ws_corr(ap, lp, 3)', 'ws_corr(ap, lp, 4)', 'ws_corr(ap, vol, 1)', 'ws_corr(ap, vol, 2)', 'ws_corr(ap, vol, 3)', 'ws_corr(ap, vol, 4)', 'ws_corr(vol, lp, 1)', 'ws_corr(vol, lp, 2)', 'ws_corr(vol, lp, 3)', 'ws_corr(vol, lp, 4)', 'ws_corr(vol, ap, 1)', 'ws_corr(vol, ap, 2)', 'ws_corr(vol, ap, 3)', 'ws_corr(vol, ap, 4)', 'ws_rank(lp, 1)', 'ws_rank(lp, 2)', 'ws_rank(lp, 3)', 'ws_rank(lp, 4)', 'ws_rank(ap, 1)', 'ws_rank(ap, 2)', 'ws_rank(ap, 3)', 'ws_rank(ap, 4)', 'ws_rank(vol, 1)', 'ws_rank(vol, 2)', 'ws_rank(vol, 3)', 'ws_rank(vol, 4)']
这个生成器非常庞大,当深度达到2时,expr num超过10^8
所以,它可以被认为是一个无穷无尽的发电机。
如您所见,生成器是通过类似 for-loop 的方法构建的。
所以,问题是,它太有序了。
我希望它更随机(因为相似的表达式可能有相似的结果,我希望它首先访问不同的),但需要保持完整性。
那么,我该如何洗牌呢?
一个想法是使用 islice
将生成器的初始段分割为等待给出的项目池,将其洗牌,然后从中提取 return 项目,定期添加更多项目项目到池中。
概念证明:
import itertools,random
def quasi_permute(gen,pool_size):
more = True
pool = list(itertools.islice(gen,pool_size))
random.shuffle(pool)
while(len(pool) > 0):
yield pool.pop()
if len(pool) <= pool_size//2 and more:
more_items = list(itertools.islice(gen,pool_size//2))
pool.extend(more_items)
random.shuffle(pool)
if len(more_items) < pool_size//2:
more = False
#test:
def genrange(n):
yield from range(n)
print(list(quasi_permute(genrange(10),4)))
#typical output: [2, 0, 3, 5, 4, 6, 8, 9, 1, 7]
另一个测试看起来像:
>>> p = quasi_permute(genrange(10**8),10**5)
>>> list(itertools.islice(p,10))
[79324, 80142, 42905, 67430, 26030, 31212, 77972, 71626, 61142, 45597]
假设我有一个包含大量元素的生成器。
其实我想做一个数值表达机:
它产生所有可能的数值表达式:
oam = {'sub': [float, float], 'add':[float, float], 'ws_corr':[float, float, int], 'ws_rank':[float, int]}
rm = {int: list(map(str, [1,2,3,4])), float:[o for o in oam] + ['lp', 'ap', 'vol']}
def gen_exp(depth):
if depth > 0:
for op in oam:
for op_terms in it.product(*[rm[ret_type] for ret_type in oam[op]]):
for op_term in it.product(*[[op_term] if op_term not in oam else gen_exp(depth-1) for op_term in op_terms]):
yield '%s(%s)'%(op, ', '.join(map(str, op_term)))
print(list(gen_exp(1))) # ['sub(lp, ap)', 'sub(lp, vol)', 'sub(ap, lp)', 'sub(ap, vol)', 'sub(vol, lp)', 'sub(vol, ap)', 'add(lp, ap)', 'add(lp, vol)', 'add(ap, lp)', 'add(ap, vol)', 'add(vol, lp)', 'add(vol, ap)', 'ws_corr(lp, ap, 1)', 'ws_corr(lp, ap, 2)', 'ws_corr(lp, ap, 3)', 'ws_corr(lp, ap, 4)', 'ws_corr(lp, vol, 1)', 'ws_corr(lp, vol, 2)', 'ws_corr(lp, vol, 3)', 'ws_corr(lp, vol, 4)', 'ws_corr(ap, lp, 1)', 'ws_corr(ap, lp, 2)', 'ws_corr(ap, lp, 3)', 'ws_corr(ap, lp, 4)', 'ws_corr(ap, vol, 1)', 'ws_corr(ap, vol, 2)', 'ws_corr(ap, vol, 3)', 'ws_corr(ap, vol, 4)', 'ws_corr(vol, lp, 1)', 'ws_corr(vol, lp, 2)', 'ws_corr(vol, lp, 3)', 'ws_corr(vol, lp, 4)', 'ws_corr(vol, ap, 1)', 'ws_corr(vol, ap, 2)', 'ws_corr(vol, ap, 3)', 'ws_corr(vol, ap, 4)', 'ws_rank(lp, 1)', 'ws_rank(lp, 2)', 'ws_rank(lp, 3)', 'ws_rank(lp, 4)', 'ws_rank(ap, 1)', 'ws_rank(ap, 2)', 'ws_rank(ap, 3)', 'ws_rank(ap, 4)', 'ws_rank(vol, 1)', 'ws_rank(vol, 2)', 'ws_rank(vol, 3)', 'ws_rank(vol, 4)']
这个生成器非常庞大,当深度达到2时,expr num超过10^8 所以,它可以被认为是一个无穷无尽的发电机。
如您所见,生成器是通过类似 for-loop 的方法构建的。
所以,问题是,它太有序了。
我希望它更随机(因为相似的表达式可能有相似的结果,我希望它首先访问不同的),但需要保持完整性。
那么,我该如何洗牌呢?
一个想法是使用 islice
将生成器的初始段分割为等待给出的项目池,将其洗牌,然后从中提取 return 项目,定期添加更多项目项目到池中。
概念证明:
import itertools,random
def quasi_permute(gen,pool_size):
more = True
pool = list(itertools.islice(gen,pool_size))
random.shuffle(pool)
while(len(pool) > 0):
yield pool.pop()
if len(pool) <= pool_size//2 and more:
more_items = list(itertools.islice(gen,pool_size//2))
pool.extend(more_items)
random.shuffle(pool)
if len(more_items) < pool_size//2:
more = False
#test:
def genrange(n):
yield from range(n)
print(list(quasi_permute(genrange(10),4)))
#typical output: [2, 0, 3, 5, 4, 6, 8, 9, 1, 7]
另一个测试看起来像:
>>> p = quasi_permute(genrange(10**8),10**5)
>>> list(itertools.islice(p,10))
[79324, 80142, 42905, 67430, 26030, 31212, 77972, 71626, 61142, 45597]