Efficient way of making a list of pairs from an array in Numpy

Question

I have a numpy array x of integers with shape (n,4), for example:

[[0 1 2 3],
[1 2 7 9],
[2 1 5 2],
...]

I want to convert it into an array of pairs:

[0,1]
[0,2]
[0,3]
[1,2]
...

That is, the first element of each row is paired with every other element in the same row. I already have a for-loop solution:

y = np.array([[x[j, 0], x[j, i]] for j in range(n) for i in range(1, 4)], dtype=int)

But since looping over a numpy array in Python is inefficient, I tried slicing instead. I can slice out each column:

y[1]=np.array([x[:,0],x[:,1]]).T
# [[0,1],[1,2],[2,1],...]

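I can repeat this for every column. My questions are:

How can I append y[2] to y[1], and so on, so that the result has shape (N,2)?
If the number of columns is not small (it is 4 in this example), how can I build the y[i] elegantly?
What alternative ways are there to get the final array?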
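For reference, here is one way to generalise the column-by-column slicing to any number of columns. It is a sketch, not taken from the answers below, and the helper name `pairs_from_slices` is hypothetical. Each pair block is built with `np.column_stack`; `np.concatenate` joins the blocks column by column, while stacking them on a new axis 1 and reshaping gives the row-by-row order shown above.

```python
import numpy as np

def pairs_from_slices(x):
    """Pair column 0 with every other column, row by row (hypothetical helper)."""
    # One (m, 2) block per column i >= 1: [[x[0,0], x[0,i]], [x[1,0], x[1,i]], ...]
    blocks = [np.column_stack((x[:, 0], x[:, i])) for i in range(1, x.shape[1])]
    # Stacking on axis 1 gives shape (m, ncols-1, 2); reshaping walks row by row,
    # so all pairs from row 0 come first, then row 1, and so on.
    return np.stack(blocks, axis=1).reshape(-1, 2)

x = np.array([[0, 1, 2, 3],
              [1, 2, 7, 9],
              [2, 1, 5, 2]])
print(pairs_from_slices(x))

# Plain concatenation keeps the column-by-column order instead
col_order = np.concatenate([np.column_stack((x[:, 0], x[:, i]))
                            for i in range(1, x.shape[1])])
print(col_order)
```

Both variants still build one block per column in Python, so the loop runs over columns rather than rows; that is usually cheap when there are far more rows than columns.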

Answer 1

Suppose the numpy array is

arr = np.array([[0, 1, 2, 3],
                [1, 2, 7, 9],
                [2, 1, 5, 2]])

You can get the array of pairs with

import itertools

m, n = arr.shape
# For each row, pair the first element with each of the remaining elements
new_arr = np.array([x for i in range(m)
                    for x in itertools.product(arr[i, 0:1], arr[i, 1:n])])

The output will be

array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 7],
       [1, 9],
       [2, 1],
       [2, 5],
       [2, 2]])

Answer 2

The most concise way I can think of is:

>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> n = x.shape[1] - 1
>>> y = np.repeat(x, (n,)+(1,)*n, axis=1)
>>> y
array([[ 0,  0,  0,  1,  2,  3],
       [ 4,  4,  4,  5,  6,  7],
       [ 8,  8,  8,  9, 10, 11]])
>>> y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 4,  5],
       [ 4,  6],
       [ 4,  7],
       [ 8,  9],
       [ 8, 10],
       [ 8, 11]])
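To see why the repeat/reshape/transpose chain works, it can help to look at the intermediate shapes. The sketch below uses the same `x` and `n`; the repeats tuple `(n,)+(1,)*n` evaluates to `(3, 1, 1, 1)`, so column 0 is repeated n times and every other column once. The intermediate variable names are only for illustration.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
n = x.shape[1] - 1                  # number of partner columns
repeats = (n,) + (1,) * n           # (3, 1, 1, 1): repeat column 0 n times
y = np.repeat(x, repeats, axis=1)   # shape (3, 2*n), each row [x0 x0 x0 x1 x2 x3]
# Split each row into two length-n halves: [[x0, x0, x0], [x1, x2, x3]]
halves = y.reshape(-1, 2, n)        # shape (3, 2, n), a view of y
# Swap the last two axes so each innermost pair is [x0, xi]
pairs = halves.transpose(0, 2, 1)   # shape (3, n, 2), a non-contiguous view
# Flattening a non-contiguous view forces NumPy to copy the data
result = pairs.reshape(-1, 2)       # shape (3*n, 2)
print(repeats, y.shape, halves.shape, pairs.shape, result.shape)
```

`np.repeat` allocates one new array, and the final `reshape` has to allocate a second one because the transposed view cannot be flattened in place. That matches the two copies the answer mentions.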

This makes two copies of the data, so it is not the most efficient approach. A more efficient version might look like this:

>>> y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
>>> y[..., 0] = x[:, 0, None]
>>> y[..., 1] = x[:, 1:]
>>> y.shape = (-1, 2)
>>> y
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 4,  5],
       [ 4,  6],
       [ 4,  7],
       [ 8,  9],
       [ 8, 10],
       [ 8, 11]])
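The second version writes every value exactly once into a preallocated buffer. `x[:, 0, None]` has shape `(m, 1)` and broadcasts across the n pair slots of each row, and because `y` is freshly allocated and contiguous, changing its shape needs no copy. Below is a sketch of the same idea wrapped in a function; the name `pairs_prealloc` is hypothetical, and it uses `reshape` where the answer assigns to `y.shape`, which has the same effect on a contiguous array.

```python
import numpy as np

def pairs_prealloc(x):
    """Preallocate the (m, n, 2) result and fill each slot once (hypothetical helper)."""
    m, ncols = x.shape
    n = ncols - 1
    y = np.empty((m, n, 2), dtype=x.dtype)
    # x[:, 0, None] has shape (m, 1) and broadcasts over the n pairs per row
    y[..., 0] = x[:, 0, None]
    # x[:, 1:] has shape (m, n) and fills the second slot of each pair
    y[..., 1] = x[:, 1:]
    # y is C-contiguous, so this reshape returns a view rather than a copy
    return y.reshape(-1, 2)

x = np.arange(12).reshape(3, 4)
out = pairs_prealloc(x)
print(out)
```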

Answer 3

Like Jaimie, I first tried repeating the first column and then reshaping, but then decided it was simpler to make two intermediate arrays and hstack them:

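import numpy as np

x = np.array([[0, 1, 2, 3], [1, 2, 7, 9], [2, 1, 5, 2]])
m, n = x.shape
x1 = x[:, 0].repeat(n - 1)[:, None]   # first column, each value repeated n-1 times
x2 = x[:, 1:].reshape(-1, 1)          # remaining columns, flattened row by row
np.hstack([x1, x2])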

produces

array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 7],
       [1, 9],
       [2, 1],
       [2, 5],
       [2, 2]])

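There may be other ways to do this rearrangement, but the result will copy the original data one way or another. My guess is that as long as you use compiled functions such as reshape and repeat, the differences in timing won't be large.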
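That guess is easy to check on your own data. The sketch below is an assumption-laden benchmark: the array size, the random data and the repetition count are arbitrary choices, the function names are hypothetical, and timings depend on the machine and NumPy version, so no numbers are shown here. It first confirms that the question's loop (in row-by-row order) and the three vectorised rearrangements from the answers produce the same array.

```python
import timeit
import numpy as np

def loop_pairs(x):
    # The question's for-loop, ordered row by row
    return np.array([[x[j, 0], x[j, i]] for j in range(x.shape[0])
                     for i in range(1, x.shape[1])], dtype=x.dtype)

def repeat_transpose(x):
    # Answer 2, first version: repeat column 0, then reshape/transpose
    n = x.shape[1] - 1
    y = np.repeat(x, (n,) + (1,) * n, axis=1)
    return y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)

def prealloc(x):
    # Answer 2, second version: fill a preallocated buffer
    n = x.shape[1] - 1
    y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
    y[..., 0] = x[:, 0, None]
    y[..., 1] = x[:, 1:]
    return y.reshape(-1, 2)

def repeat_hstack(x):
    # Answer 3: two intermediate columns joined with hstack
    n = x.shape[1] - 1
    x1 = x[:, 0].repeat(n)[:, None]
    x2 = x[:, 1:].reshape(-1, 1)
    return np.hstack([x1, x2])

rng = np.random.default_rng(0)
x = rng.integers(0, 100, size=(10_000, 4))
reference = loop_pairs(x)
funcs = (repeat_transpose, prealloc, repeat_hstack)
for f in funcs:
    assert np.array_equal(f(x), reference)   # all methods agree

number = 10
for f in (loop_pairs,) + funcs:
    t = timeit.timeit(lambda: f(x), number=number)
    print(f"{f.__name__:>16}: {t / number * 1e3:.3f} ms per call")
```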

