用于构造不相交随机列表的内存高效版本

Question

我们必须正整数 b,n 和 b < n/2。我们想要生成两个随机的不相交列表 I1, I2，它们都包含来自 {0,1,...,n}. 的 b 个元素，下面是一个简单的方法。

def disjoint_sets(bound,n):
    import random
    I1=[];I2=[];
    L = random.sample(range(0,n+1), n+1)
    I1 = L[0:bound]
    I2 = L[bound:2*bound]
    return I1,I2

对于大 b,n（比如 b=100, n>1e7），前一个内存效率不高。因为 L 很大。我想知道是否有一种方法可以在不使用 range(0,n+1) 的情况下获得 I1,I2?

Answer 1

这是一种碰碰运气的方法，适用于您提到的范围内的数字：

import random

def rand_sample(k,n):
    #picks k distinct random integers from range(n)
    #assumes that k is much smaller than n
    choices = set()
    sample = []
    for i in range(k): #xrange(k) in Python 2
        choice = random.randint(0,n-1)
        while choice in choices:
            choice = random.randint(0,n-1)
        choices.add(choice)
        sample.append(choice)
    return sample

对于你的问题，你可以这样做：

def rand_pair(b,n):
    sample = rand_sample(2*b,n)
    return sample[:b],sample[b:]

用于构造不相交随机列表的内存高效版本

Memory efficient version for the construction disjoint random lists

python-2.x

memory-efficient

disjoint-sets