找到数组元素的所有可能组合。不是产品
find all possible combinations of elements of an array. NOT PRODUCT
我研究过这个问题,人们一直建议使用 np.meshgrid()
来查找数组的所有可能组合。但问题是 np.meshgrid()
不产生组合它产生产品(类似于 itertools.product())
在组合中,元素不重复且无序
arr = np.arange(5)
r = 3
这些是组合的样子
np.array(
list(itertools.combinations(arr, r))
)
>>> [[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]]
以下不是组合
np.array(
list(itertools.product(arr, arr, arr))
)
>>> [[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 0, 3],
[0, 0, 4],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
....,
[4, 3, 2],
[4, 3, 3],
[4, 3, 4],
[4, 4, 0],
[4, 4, 1],
[4, 4, 2],
[4, 4, 3],
[4, 4, 4]])
np.array(
np.meshgrid(arr, arr, arr)
).transpose([2, 1, 3, 0]).reshape(-1, r)
>>> [[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 0, 3],
[0, 0, 4],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
....,
[4, 3, 2],
[4, 3, 3],
[4, 3, 4],
[4, 4, 0],
[4, 4, 1],
[4, 4, 2],
[4, 4, 3],
[4, 4, 4]])
对于 r = 2
,我找到了一种查找组合的巧妙方法
np.array(
np.triu_indices(len(arr), 1)
).T
>>> [[0, 1],
[0, 2],
[0, 3],
[0, 4],
[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]]
但我很难找到 r > 2
的任何矢量化方法
NOTE:
even if my array is not [0, 1, 2, 3, 4]
i could use the above answers as indices.
如果有助于想象,
对于 r = 2
,要求的答案是大小为 len(arr)
的方阵的右上三角形的索引,忽略对角线。
对于 r = 3
所需的答案是 3d 数组大小(你猜对了)的右上四面体(图像中的中间一个)的索引 len(arr)
,忽略 3d相当于对角线。
这与您对 3-D 的想法相似:
n = 5
r = 3
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[a[:,0]<a[:,1]]
输出:
[[0 1 2]
[0 1 3]
[0 1 4]
[0 2 3]
[0 2 4]
[0 3 4]
[1 2 3]
[1 2 4]
[1 3 4]
[2 3 4]]
对于 4-D(等等)你可以像这样扩展(不确定性能):
n = 5
r = 4
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[(a[:,0]<a[:,1]) & (a[:,1]<a[:,2])]
输出:
[[0 1 2 3]
[0 1 2 4]
[0 1 3 4]
[0 2 3 4]
[1 2 3 4]]
Itertools 似乎更快,如果这是您的目标:
def m1(n):
r = 3
a = np.argwhere(np.triu(np.ones((n,)*r),1))
return a[a[:,0]<a[:,1]]
def m2(n):
r = 3
return combinations(np.arange(n),r)
in_ = [5,10,100,200]
看代码:
def triu_indices(n, k=0, m=None):
return nonzero(~tri(n, m, k=k-1, dtype=bool))
def tri(N, M=None, k=0, dtype=float):
m = greater_equal.outer(arange(N, dtype=_min_int(0, N)),
arange(-k, M-k, dtype=_min_int(-k, M - k)))
所以对于正方形:
In [75]: np.greater_equal.outer(np.arange(5),np.arange(5))
Out[75]:
array([[ True, False, False, False, False],
[ True, True, False, False, False],
[ True, True, True, False, False],
[ True, True, True, True, False],
[ True, True, True, True, True]])
In [76]: np.argwhere(~np.greater_equal.outer(np.arange(5),np.arange(5)))
Out[76]:
array([[0, 1],
[0, 2],
[0, 3],
[0, 4],
[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]])
或使用ix_
创建广播范围:
In [77]: I,J = np.ix_(np.arange(5),np.arange(5))
In [78]: I,J
Out[78]:
(array([[0],
[1],
[2],
[3],
[4]]),
array([[0, 1, 2, 3, 4]]))
In [79]: I>=J
Out[79]:
array([[ True, False, False, False, False],
[ True, True, False, False, False],
[ True, True, True, False, False],
[ True, True, True, True, False],
[ True, True, True, True, True]])
In [80]: I<J
Out[80]:
array([[False, True, True, True, True],
[False, False, True, True, True],
[False, False, False, True, True],
[False, False, False, False, True],
[False, False, False, False, False]])
将此概括为 3d:
In [90]: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))
In [91]: np.argwhere((I<J) & (J<K))
Out[91]:
array([[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]])
这样就构成了一个(5,5,5)数组,然后求True。所以它的时机不如 itertools
:
也就不足为奇了
In [92]: %%timeit
...: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))
...: np.argwhere((I<J) & (J<K))
60.4 µs ± 97 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [99]: timeit np.array(list(itertools.combinations(np.arange(5), 3)))
20.9 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
只是 运行 答案的一些基准
import numpy as np
import itertools
import perfplot
itertools
实施
def func0(n, r):
r = np.array(
list(itertools.combinations(range(n), r))
)
return r
@hpaulj 的实现
def func1(n, r):
prod = np.argwhere(np.ones(r*[n], dtype= bool))
mask = True
for i in range(r-1): mask = mask & (prod[:, i] < prod[:, i+1])
return prod[mask]
@Ehsan 的实现
def func2(n, r):
rngs = (np.arange(n), )*r
indx = np.ix_(*rngs)
mask = True
for i in range(r-1): mask = mask & (indx[i] < indx[i+1])
return np.argwhere(mask)
表现w.r.tn
bench_n = perfplot.bench(
n_range= range(4, 50),
setup = lambda n: n,
kernels= [
lambda n: func0(n, 4),
lambda n: func1(n, 4),
lambda n: func2(n, 4)
],
labels = ['func0', 'func1', 'func2']
)
bench_n.show()
表现w.r.tr
bench_r = perfplot.bench(
n_range= range(2, 8),
setup = lambda r: r,
kernels= [
lambda r: func0(10, r),
lambda r: func1(10, r),
lambda r: func2(10, r)
],
labels = ['func0', 'func1', 'func2']
)
bench_r.show()
我研究过这个问题,人们一直建议使用 np.meshgrid()
来查找数组的所有可能组合。但问题是 np.meshgrid()
不产生组合它产生产品(类似于 itertools.product())
在组合中,元素不重复且无序
arr = np.arange(5)
r = 3
这些是组合的样子
np.array(
list(itertools.combinations(arr, r))
)
>>> [[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]]
以下不是组合
np.array(
list(itertools.product(arr, arr, arr))
)
>>> [[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 0, 3],
[0, 0, 4],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
....,
[4, 3, 2],
[4, 3, 3],
[4, 3, 4],
[4, 4, 0],
[4, 4, 1],
[4, 4, 2],
[4, 4, 3],
[4, 4, 4]])
np.array(
np.meshgrid(arr, arr, arr)
).transpose([2, 1, 3, 0]).reshape(-1, r)
>>> [[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 0, 3],
[0, 0, 4],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
....,
[4, 3, 2],
[4, 3, 3],
[4, 3, 4],
[4, 4, 0],
[4, 4, 1],
[4, 4, 2],
[4, 4, 3],
[4, 4, 4]])
对于 r = 2
,我找到了一种查找组合的巧妙方法
np.array(
np.triu_indices(len(arr), 1)
).T
>>> [[0, 1],
[0, 2],
[0, 3],
[0, 4],
[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]]
但我很难找到 r > 2
NOTE: even if my array is not
[0, 1, 2, 3, 4]
i could use the above answers as indices.
如果有助于想象,
对于 r = 2
,要求的答案是大小为 len(arr)
的方阵的右上三角形的索引,忽略对角线。
对于 r = 3
所需的答案是 3d 数组大小(你猜对了)的右上四面体(图像中的中间一个)的索引 len(arr)
,忽略 3d相当于对角线。
这与您对 3-D 的想法相似:
n = 5
r = 3
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[a[:,0]<a[:,1]]
输出:
[[0 1 2]
[0 1 3]
[0 1 4]
[0 2 3]
[0 2 4]
[0 3 4]
[1 2 3]
[1 2 4]
[1 3 4]
[2 3 4]]
对于 4-D(等等)你可以像这样扩展(不确定性能):
n = 5
r = 4
a = np.argwhere(np.triu(np.ones((n,)*r),1))
a[(a[:,0]<a[:,1]) & (a[:,1]<a[:,2])]
输出:
[[0 1 2 3]
[0 1 2 4]
[0 1 3 4]
[0 2 3 4]
[1 2 3 4]]
Itertools 似乎更快,如果这是您的目标:
def m1(n):
r = 3
a = np.argwhere(np.triu(np.ones((n,)*r),1))
return a[a[:,0]<a[:,1]]
def m2(n):
r = 3
return combinations(np.arange(n),r)
in_ = [5,10,100,200]
看代码:
def triu_indices(n, k=0, m=None):
return nonzero(~tri(n, m, k=k-1, dtype=bool))
def tri(N, M=None, k=0, dtype=float):
m = greater_equal.outer(arange(N, dtype=_min_int(0, N)),
arange(-k, M-k, dtype=_min_int(-k, M - k)))
所以对于正方形:
In [75]: np.greater_equal.outer(np.arange(5),np.arange(5))
Out[75]:
array([[ True, False, False, False, False],
[ True, True, False, False, False],
[ True, True, True, False, False],
[ True, True, True, True, False],
[ True, True, True, True, True]])
In [76]: np.argwhere(~np.greater_equal.outer(np.arange(5),np.arange(5)))
Out[76]:
array([[0, 1],
[0, 2],
[0, 3],
[0, 4],
[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]])
或使用ix_
创建广播范围:
In [77]: I,J = np.ix_(np.arange(5),np.arange(5))
In [78]: I,J
Out[78]:
(array([[0],
[1],
[2],
[3],
[4]]),
array([[0, 1, 2, 3, 4]]))
In [79]: I>=J
Out[79]:
array([[ True, False, False, False, False],
[ True, True, False, False, False],
[ True, True, True, False, False],
[ True, True, True, True, False],
[ True, True, True, True, True]])
In [80]: I<J
Out[80]:
array([[False, True, True, True, True],
[False, False, True, True, True],
[False, False, False, True, True],
[False, False, False, False, True],
[False, False, False, False, False]])
将此概括为 3d:
In [90]: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))
In [91]: np.argwhere((I<J) & (J<K))
Out[91]:
array([[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]])
这样就构成了一个(5,5,5)数组,然后求True。所以它的时机不如 itertools
:
In [92]: %%timeit
...: I,J,K = np.ix_(np.arange(5),np.arange(5),np.arange(5))
...: np.argwhere((I<J) & (J<K))
60.4 µs ± 97 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [99]: timeit np.array(list(itertools.combinations(np.arange(5), 3)))
20.9 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
只是 运行 答案的一些基准
import numpy as np
import itertools
import perfplot
itertools
实施
def func0(n, r):
r = np.array(
list(itertools.combinations(range(n), r))
)
return r
@hpaulj 的实现
def func1(n, r):
prod = np.argwhere(np.ones(r*[n], dtype= bool))
mask = True
for i in range(r-1): mask = mask & (prod[:, i] < prod[:, i+1])
return prod[mask]
@Ehsan 的实现
def func2(n, r):
rngs = (np.arange(n), )*r
indx = np.ix_(*rngs)
mask = True
for i in range(r-1): mask = mask & (indx[i] < indx[i+1])
return np.argwhere(mask)
表现w.r.tn
bench_n = perfplot.bench(
n_range= range(4, 50),
setup = lambda n: n,
kernels= [
lambda n: func0(n, 4),
lambda n: func1(n, 4),
lambda n: func2(n, 4)
],
labels = ['func0', 'func1', 'func2']
)
bench_n.show()
表现w.r.tr
bench_r = perfplot.bench(
n_range= range(2, 8),
setup = lambda r: r,
kernels= [
lambda r: func0(10, r),
lambda r: func1(10, r),
lambda r: func2(10, r)
],
labels = ['func0', 'func1', 'func2']
)
bench_r.show()