我怎样才能洗牌无尽的发电机?

How can i shuffle a endless generator?

假设我有一个包含大量元素的生成器。

其实我想做一个数值表达机:

它产生所有可能的数值表达式:

oam = {'sub': [float, float], 'add':[float, float], 'ws_corr':[float, float, int], 'ws_rank':[float, int]}
rm  = {int: list(map(str, [1,2,3,4])), float:[o for o in oam] + ['lp', 'ap', 'vol']}

                                                                                                                                                                            
def gen_exp(depth):
  if depth > 0:
    for op in oam:
      for op_terms in it.product(*[rm[ret_type] for ret_type in oam[op]]):
        for op_term in it.product(*[[op_term] if op_term not in oam else gen_exp(depth-1) for op_term in op_terms]):
          yield '%s(%s)'%(op, ', '.join(map(str, op_term)))

print(list(gen_exp(1)))  # ['sub(lp, ap)', 'sub(lp, vol)', 'sub(ap, lp)', 'sub(ap, vol)', 'sub(vol, lp)', 'sub(vol, ap)', 'add(lp, ap)', 'add(lp, vol)', 'add(ap, lp)', 'add(ap, vol)', 'add(vol, lp)', 'add(vol, ap)', 'ws_corr(lp, ap, 1)', 'ws_corr(lp, ap, 2)', 'ws_corr(lp, ap, 3)', 'ws_corr(lp, ap, 4)', 'ws_corr(lp, vol, 1)', 'ws_corr(lp, vol, 2)', 'ws_corr(lp, vol, 3)', 'ws_corr(lp, vol, 4)', 'ws_corr(ap, lp, 1)', 'ws_corr(ap, lp, 2)', 'ws_corr(ap, lp, 3)', 'ws_corr(ap, lp, 4)', 'ws_corr(ap, vol, 1)', 'ws_corr(ap, vol, 2)', 'ws_corr(ap, vol, 3)', 'ws_corr(ap, vol, 4)', 'ws_corr(vol, lp, 1)', 'ws_corr(vol, lp, 2)', 'ws_corr(vol, lp, 3)', 'ws_corr(vol, lp, 4)', 'ws_corr(vol, ap, 1)', 'ws_corr(vol, ap, 2)', 'ws_corr(vol, ap, 3)', 'ws_corr(vol, ap, 4)', 'ws_rank(lp, 1)', 'ws_rank(lp, 2)', 'ws_rank(lp, 3)', 'ws_rank(lp, 4)', 'ws_rank(ap, 1)', 'ws_rank(ap, 2)', 'ws_rank(ap, 3)', 'ws_rank(ap, 4)', 'ws_rank(vol, 1)', 'ws_rank(vol, 2)', 'ws_rank(vol, 3)', 'ws_rank(vol, 4)']

这个生成器非常庞大,当深度达到2时,expr num超过10^8 所以,它可以被认为是一个无穷无尽的发电机。

如您所见,生成器是通过类似 for-loop 的方法构建的。

所以,问题是,它太有序了。

我希望它更随机(因为相似的表达式可能有相似的结果,我希望它首先访问不同的),但需要保持完整性。

那么,我该如何洗牌呢?

一个想法是使用 islice 将生成器的初始段分割为等待给出的项目池,将其洗牌,然后从中提取 return 项目,定期添加更多项目项目到池中。

概念证明:

import itertools,random

def quasi_permute(gen,pool_size):
    more = True
    pool = list(itertools.islice(gen,pool_size))
    random.shuffle(pool)
    while(len(pool) > 0):
        yield pool.pop()
        if len(pool) <= pool_size//2 and more:
            more_items = list(itertools.islice(gen,pool_size//2))
            pool.extend(more_items)
            random.shuffle(pool)
            if len(more_items) < pool_size//2:
                more = False

#test:

def genrange(n):
    yield from range(n)

print(list(quasi_permute(genrange(10),4)))

#typical output: [2, 0, 3, 5, 4, 6, 8, 9, 1, 7]

另一个测试看起来像:

>>> p = quasi_permute(genrange(10**8),10**5)
>>> list(itertools.islice(p,10))
[79324, 80142, 42905, 67430, 26030, 31212, 77972, 71626, 61142, 45597]