找到数组元素的所有可能组合。不是产品

find all possible combinations of elements of an array. NOT PRODUCT

我研究过这个问题,人们一直建议使用 np.meshgrid() 来查找数组的所有可能组合。但问题是 np.meshgrid() 不产生组合它产生产品(类似于 itertools.product())

在组合中,元素不重复且无序

arr = np.arange(5)
r = 3

这些是组合的样子

np.array(
    list(itertools.combinations(arr, r))
)

>>>   [[0, 1, 2],
       [0, 1, 3],
       [0, 1, 4],
       [0, 2, 3],
       [0, 2, 4],
       [0, 3, 4],
       [1, 2, 3],
       [1, 2, 4],
       [1, 3, 4],
       [2, 3, 4]]

以下不是组合

np.array(
    list(itertools.product(arr, arr, arr))
)

>>>   [[0, 0, 0],
       [0, 0, 1],
       [0, 0, 2],
       [0, 0, 3],
       [0, 0, 4],
       [0, 1, 0],
       [0, 1, 1],
       [0, 1, 2],
       ....,
       [4, 3, 2],
       [4, 3, 3],
       [4, 3, 4],
       [4, 4, 0],
       [4, 4, 1],
       [4, 4, 2],
       [4, 4, 3],
       [4, 4, 4]])
np.array(
    np.meshgrid(arr, arr, arr)
).transpose([2, 1, 3, 0]).reshape(-1, r)

>>>   [[0, 0, 0],
       [0, 0, 1],
       [0, 0, 2],
       [0, 0, 3],
       [0, 0, 4],
       [0, 1, 0],
       [0, 1, 1],
       [0, 1, 2],
       ....,
       [4, 3, 2],
       [4, 3, 3],
       [4, 3, 4],
       [4, 4, 0],
       [4, 4, 1],
       [4, 4, 2],
       [4, 4, 3],
       [4, 4, 4]])

对于 r = 2,我找到了一种查找组合的巧妙方法

np.array(
    np.triu_indices(len(arr), 1)
).T

>>>   [[0, 1],
       [0, 2],
       [0, 3],
       [0, 4],
       [1, 2],
       [1, 3],
       [1, 4],
       [2, 3],
       [2, 4],
       [3, 4]]

但我很难找到 r > 2

的任何矢量化方法

NOTE: even if my array is not [0, 1, 2, 3, 4] i could use the above answers as indices.

如果有助于想象,

对于 r = 2,要求的答案是大小为 len(arr) 的方阵的右上三角形的索引,忽略对角线。

对于 r = 3 所需的答案是 3d 数组大小(你猜对了)的右上四面体(图像中的中间一个)的索引 len(arr),忽略 3d相当于对角线。

这与您对 3-D 的想法相似:

n = 5
r = 3
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[a[:,0]<a[:,1]]

输出:

[[0 1 2]
 [0 1 3]
 [0 1 4]
 [0 2 3]
 [0 2 4]
 [0 3 4]
 [1 2 3]
 [1 2 4]
 [1 3 4]
 [2 3 4]]

对于 4-D(等等)你可以像这样扩展(不确定性能):

n = 5
r = 4
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[(a[:,0]<a[:,1]) & (a[:,1]<a[:,2])]

输出:

[[0 1 2 3]
 [0 1 2 4]
 [0 1 3 4]
 [0 2 3 4]
 [1 2 3 4]]

Itertools 似乎更快,如果这是您的目标:

def m1(n):
  r = 3
  a = np.argwhere(np.triu(np.ones((n,)*r),1))
  return a[a[:,0]<a[:,1]]

def m2(n):
  r = 3
  return combinations(np.arange(n),r)

in_ = [5,10,100,200]

看代码:

def triu_indices(n, k=0, m=None):
    return nonzero(~tri(n, m, k=k-1, dtype=bool))

def tri(N, M=None, k=0, dtype=float):
    m = greater_equal.outer(arange(N, dtype=_min_int(0, N)),
                        arange(-k, M-k, dtype=_min_int(-k, M - k)))

所以对于正方形:

In [75]: np.greater_equal.outer(np.arange(5),np.arange(5))                                           
Out[75]: 
array([[ True, False, False, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False],
       [ True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True]])
In [76]: np.argwhere(~np.greater_equal.outer(np.arange(5),np.arange(5)))                             
Out[76]: 
array([[0, 1],
       [0, 2],
       [0, 3],
       [0, 4],
       [1, 2],
       [1, 3],
       [1, 4],
       [2, 3],
       [2, 4],
       [3, 4]])

或使用ix_创建广播范围:

In [77]: I,J = np.ix_(np.arange(5),np.arange(5))                                                     
In [78]: I,J                                                                                         
Out[78]: 
(array([[0],
        [1],
        [2],
        [3],
        [4]]),
 array([[0, 1, 2, 3, 4]]))
In [79]: I>=J                                                                                        
Out[79]: 
array([[ True, False, False, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False],
       [ True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True]])
In [80]: I<J                                                                                         
Out[80]: 
array([[False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False, False,  True],
       [False, False, False, False, False]])

将此概括为 3d:

In [90]: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))                                      
In [91]: np.argwhere((I<J) & (J<K))                                                                  
Out[91]: 
array([[0, 1, 2],
       [0, 1, 3],
       [0, 1, 4],
       [0, 2, 3],
       [0, 2, 4],
       [0, 3, 4],
       [1, 2, 3],
       [1, 2, 4],
       [1, 3, 4],
       [2, 3, 4]])

这样就构成了一个(5,5,5)数组,然后求True。所以它的时机不如 itertools:

也就不足为奇了
In [92]: %%timeit 
    ...: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))   
    ...: np.argwhere((I<J) & (J<K))                                                                                             
60.4 µs ± 97 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [99]: timeit np.array(list(itertools.combinations(np.arange(5), 3)))                              
20.9 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

只是 运行 答案的一些基准

import numpy as np
import itertools
import perfplot

itertools实施

def func0(n, r):
    
    r = np.array(
        list(itertools.combinations(range(n), r))
    )
    
    return r

@hpaulj 的实现

def func1(n, r):
    
    prod = np.argwhere(np.ones(r*[n], dtype= bool))
    mask = True
    
    for i in range(r-1): mask = mask & (prod[:, i] < prod[:, i+1])
        
    return prod[mask]

@Ehsan 的实现

def func2(n, r):
    
    rngs = (np.arange(n), )*r
    indx = np.ix_(*rngs)
    mask = True
    
    for i in range(r-1): mask = mask & (indx[i] < indx[i+1])
        
    return np.argwhere(mask)

表现w.r.tn

bench_n = perfplot.bench(
    n_range= range(4, 50),
    setup  = lambda n: n,
    kernels= [
        lambda n: func0(n, 4),
        lambda n: func1(n, 4),
        lambda n: func2(n, 4)
    ],
    labels = ['func0', 'func1', 'func2']
)

bench_n.show()

表现w.r.tr

bench_r = perfplot.bench(
    n_range= range(2, 8),
    setup  = lambda r: r,
    kernels= [
        lambda r: func0(10, r),
        lambda r: func1(10, r),
        lambda r: func2(10, r)
    ],
    labels = ['func0', 'func1', 'func2']
)

bench_r.show()