Fastest way to sample from the product of two lists

Question

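I want to get n samples (without replacement) from the product of two lists. As shown below, I currently compute the entire product and then sample from it, but this becomes computationally unwieldy for long lists. Is there a way to optimize this, i.e. to sample more efficiently without computing the whole product?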

Current naive approach (which mistakenly samples with replacement, as noted below):

from itertools import product
from random import choice

def get_sample(a, b, n):
    """return n samples from the product a and b"""
    # builds the whole product in memory, then samples WITH replacement
    D = list(product(a, b))
    D = [choice(D) for _ in range(n)]

    return D

Answer 1

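Unfortunately, you can't take a random sample from an iterator. An iterator (like the one product returns) only gives you one value at a time, and you need to know about more than one value to choose among them randomly.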

However, the naive approach can be made more efficient with random.sample(), which also samples without replacement:

from itertools import product
import random

def get_sample(a, b, n):
    """return n samples from the product a and b"""
    D = list(product(a, b))
    # random.sample draws n distinct elements (no replacement)
    return random.sample(D, n)

Answer 2

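You should be fine as long as you don't actually turn the product into a list. If you don't need the sample to be random, you can just take the first n elements: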

from itertools import product

def get_first_n_sample(a, b, n):
    """return the first n elements of the product of a and b (not random)"""
    D = product(a, b)
    # next(D) works on Python 2.6+ and 3; D.next() is Python 2 only
    D = [next(D) for _ in range(n)]  # if you're on Python 2, use xrange!
    return D
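A slightly more idiomatic way to take the first n elements is itertools.islice, which stops pulling from the product after n items and, unlike repeated next() calls, simply returns fewer items instead of raising StopIteration when the product is shorter than n. This is a minimal sketch; the name get_first_n is hypothetical, and the result is still not a random sample.

```python
from itertools import islice, product

def get_first_n(a, b, n):
    """Return the first n pairs of product(a, b) in iteration order (not random)."""
    # islice stops after n items, so the rest of the product is never generated;
    # if the product has fewer than n pairs, it just returns all of them.
    return list(islice(product(a, b), n))

print(get_first_n([1, 2, 3], "xy", 4))  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
```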

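Now, if you just want a random sample from the combinations of a and b, an iterator is clearly the wrong tool, and so is itertools. Assuming a and b support fast random access (e.g. lists, tuples):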

from random import choice

def get_random_sample(a, b):
    return (choice(a), choice(b))

Getting n unique samples is more involved, though:

from random import sample

def get_random_samples(a, b, n):
    n_prod = len(a) * len(b)
    indices = sample(range(n_prod), n)
    return [(a[idx % len(a)], b[idx // len(a)]) for idx in indices]
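The trick is that each flat index in range(len(a) * len(b)) decodes to exactly one pair, so sampling distinct indices gives distinct pairs. The sketch below repeats the function above and adds a small check of that one-to-one mapping, plus a call on a product of 10**12 pairs. It assumes Python 3, where range is lazy and random.sample accepts it as a population, so only the n chosen indices are ever materialized.

```python
from itertools import product
from random import sample

def get_random_samples(a, b, n):
    """Return n distinct pairs from the product of a and b without building it."""
    n_prod = len(a) * len(b)
    # range() is lazy in Python 3, so sample() never materializes the product
    indices = sample(range(n_prod), n)
    # flat index idx encodes the pair (a[idx % len(a)], b[idx // len(a)])
    return [(a[idx % len(a)], b[idx // len(a)]) for idx in indices]

# Decoding every flat index hits each pair of the product exactly once.
a, b = [1, 2, 3], "xy"
decoded = [(a[idx % len(a)], b[idx // len(a)]) for idx in range(len(a) * len(b))]
assert sorted(decoded) == sorted(product(a, b))

# A product of 10**12 pairs is never built; only 5 indices are drawn.
big = get_random_samples(range(10**6), range(10**6), 5)
print(len(big), len(set(big)))  # 5 5
```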

Answer 3

If you want a sample with replacement, as your code currently does, you can draw random elements of product(a, b) with (choice(a), choice(b)):

from random import choice
sample = [(choice(a), choice(b)) for _ in range(n)]

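If you want a sample without replacement, take a sample of random indices into the product and map each index back to a pair:

import random
sample = [(a[i // len(b)], b[i % len(b)])
          for i in random.sample(range(len(a)*len(b)), n)]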
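Both one-liners can be wrapped as functions for comparison. This is a sketch assuming Python 3 (range instead of xrange); the function names are hypothetical. Note that this answer decodes indices row-major (i // len(b), i % len(b)), which matches the order itertools.product enumerates pairs, while the previous answer uses the column-major mapping; both visit every pair exactly once.

```python
import random

def sample_with_replacement(a, b, n):
    """n independent draws from product(a, b); repeats are possible."""
    return [(random.choice(a), random.choice(b)) for _ in range(n)]

def sample_without_replacement(a, b, n):
    """n distinct pairs from product(a, b), decoded row-major from flat indices."""
    # i // len(b) selects the element of a and i % len(b) the element of b,
    # so index i corresponds to the i-th pair yielded by itertools.product(a, b).
    # random.sample raises ValueError if n > len(a) * len(b).
    return [(a[i // len(b)], b[i % len(b)])
            for i in random.sample(range(len(a) * len(b)), n)]

random.seed(0)
a, b = ["r", "g", "b"], [0, 1]
print(sample_with_replacement(a, b, 4))
print(sample_without_replacement(a, b, 6))  # all 6 pairs, in random order
```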


python

optimization

cartesian-product

random-sample