How to create index combinations (k out of n) as sparse bitmasks for numpy

Question

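With numpy, how can I efficiently create

an array/matrix representing the list of all combinations (k out of n) as lists of k indices. The shape would be (binomial(n, k), k).
a sparse array/matrix representing these combinations as bitmasks of length n (i.e. the indices above expanded to bitmasks). The shape would be (binomial(n, k), n).

I need to do this with large n (and probably small k). So the algorithm should be

time efficient (e.g. maybe allocating the space for the complete result at once, before filling it?)
space efficient (e.g. sparse bitmasks)

Many thanks for your help.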

Answer 1

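Assuming the combinatorial blow-up is not too bad (as mentioned in the comments above), you could try this. It is heavily vectorized and should be fast (for the cases that can be handled at all).

Edit: I assumed you are interested in output based on scipy.sparse. Maybe you are not.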
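
To judge whether the blow-up is manageable before building anything, the number of rows and nonzeros can be computed up front. This is only a minimal sketch: the helper name estimate_size and the 8-byte index size are illustrative assumptions, and math.comb needs Python 3.8 or later.

```python
import math

def estimate_size(k, n, index_itemsize=8):
    """Rough size estimate for all k-out-of-n combinations (hypothetical helper)."""
    n_rows = math.comb(n, k)                  # number of combinations, binomial(n, k)
    nnz = n_rows * k                          # each bitmask row has exactly k ones
    dense_index_bytes = nnz * index_itemsize  # memory of the (n_rows, k) index array
    return n_rows, nnz, dense_index_bytes

n_rows, nnz, index_bytes = estimate_size(5, 50)
print('rows:', n_rows, 'nnz:', nnz, 'index MiB:', index_bytes / 2**20)
```

The sparse matrix itself needs more than the index array, since it also stores one value and one row index per nonzero, so the figure is a lower bound rather than a budget.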

Code

import itertools
import numpy as np
import scipy.sparse as sp

def combs(a, r):
    """
    Return successive r-length combinations of elements in the array a.
    Should produce the same output as array(list(combinations(a, r))), but
    faster.
    """
    a = np.asarray(a)
    dt = np.dtype([('', a.dtype)]*r)
    b = np.fromiter(itertools.combinations(a, r), dt)
    b_ = b.view(a.dtype).reshape(-1, r)
    return b_

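def sparse_combs(k, n):
    """
    Return all k-out-of-n combinations as a sparse (binomial(n, k), n)
    bitmask matrix in COO format.
    """
    combs_ = combs(np.arange(n), k)
    n_bin = combs_.shape[0]

    # One nonzero per (row, index) pair: row i holds the k indices combs_[i].
    rows = np.repeat(np.arange(n_bin), k)
    cols = combs_.ravel()
    spmat = sp.coo_matrix((np.ones(n_bin*k), (rows, cols)),
                          shape=(n_bin, n))
    return spmat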


print('dense')
print(combs(range(4), 3))
print('sparse (dense for print)')
print(sparse_combs(3, 4).todense())

Output

dense
[[0 1 2]
 [0 1 3]
 [0 2 3]
 [1 2 3]]
sparse (dense for print)
[[ 1.  1.  1.  0.]
 [ 1.  1.  0.  1.]
 [ 1.  0.  1.  1.]
 [ 0.  1.  1.  1.]]

I (probably) got the helper function combs from this question (some time in the past).

A small (unscientific) timing:
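
The question asks about allocating the complete result at once. np.fromiter accepts a count argument for exactly that purpose, so a variant of combs could pass the known number of rows. A sketch under the assumption that a is one-dimensional and r >= 1; the name combs_prealloc is made up here, and whether this is measurably faster has not been timed.

```python
import itertools
import math
import numpy as np

def combs_prealloc(a, r):
    """Variant of combs that tells np.fromiter the final row count up front."""
    a = np.asarray(a)
    dt = np.dtype([('', a.dtype)] * r)      # one record = one combination
    n_rows = math.comb(len(a), r)           # binomial(len(a), r), known in advance
    # With count given, fromiter allocates the output once instead of growing it.
    b = np.fromiter(itertools.combinations(a, r), dt, count=n_rows)
    return b.view(a.dtype).reshape(-1, r)

print(combs_prealloc(np.arange(4), 3))
```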

from time import perf_counter as pc
start = pc()
spmat = sparse_combs(5, 50)
time_used = pc() - start
print('secs: ', time_used)
print('nnzs: ', spmat.nnz)

#secs:  0.5770790778094155
#nnzs:  10593800

# same measurement with sparse_combs(3, 500)
#secs:  3.4843752405405497
#nnzs:  62125500
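
With tens of millions of nonzeros, the per-entry storage matters. sparse_combs stores a float64 value plus a row and a column index for each nonzero. Because every row has exactly k ones, the CSR row pointer is simply 0, k, 2k, ..., so a CSR matrix with int8 values can be assembled directly. This is a sketch of that alternative, not the approach used above; the name sparse_combs_csr and the int32/int8 dtypes are assumptions, and it requires k >= 1.

```python
import itertools
import numpy as np
import scipy.sparse as sp

def sparse_combs_csr(k, n, dtype=np.int8):
    """All k-out-of-n bitmasks as a CSR matrix with small value dtype (k >= 1)."""
    # Flatten the combinations into one long stream of column indices.
    flat = itertools.chain.from_iterable(itertools.combinations(range(n), k))
    indices = np.fromiter(flat, dtype=np.int32)
    n_bin = indices.size // k
    # Row i owns the slice indices[i*k:(i+1)*k], so indptr is a plain arange.
    indptr = np.arange(0, n_bin * k + 1, k)
    data = np.ones(n_bin * k, dtype=dtype)
    return sp.csr_matrix((data, indices, indptr), shape=(n_bin, n))

m = sparse_combs_csr(3, 4)
print(m.toarray())
```

CSR also supports fast row slicing and matrix-vector products, whereas a COO result usually has to be converted first.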
