Complex indexing of a multidimensional array with indices of lower dimensional arrays in Python

Question

I have a 4-dimensional numpy array:
```
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
```
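If we print it:
I want to find the indices of the 6 largest values of the array along axis 2, but only for the 0th element of the last axis (red circles in the picture):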
```
indLargest2ndAxis = np.argpartition(x[...,0], 10-6, axis=2)[...,10-6:]
```
These indices have the expected shape (5,10,6).
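To make this step concrete, the following sketch (same x as above) checks that the last 6 positions returned by np.argpartition along axis 2 hold the indices of the 6 largest values. Note that argpartition only partitions; it does not sort those 6 indices, so the sketch sorts them before comparing. The name sorted_idx is illustrative.

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)

# kth = 10 - 6 = 4: every element at position >= 4 is >= the element at position 4
indLargest2ndAxis = np.argpartition(x[..., 0], 10 - 6, axis=2)[..., 10 - 6:]
print(indLargest2ndAxis.shape)  # (5, 10, 6)

# x[i, j, k, 0] == 200*i + 20*j + 2*k grows with k, so the 6 largest
# values along axis 2 sit at k = 4..9 (in no guaranteed order)
sorted_idx = np.sort(indLargest2ndAxis, axis=-1)
print(np.array_equal(sorted_idx[0, 0], np.arange(4, 10)))  # True
```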
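I want to get the values of the array at these indices along axis 2, but now for element 1 of the last axis (yellow circles in the picture). The result should have shape (5,10,6). Without vectorization, this can be done with: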
```
np.array([ [ [ x[i, j, k, 1] for k in indLargest2ndAxis[i,j]] for j in range(10) ] for i in range(5) ])
```
However, I would like to vectorize this. I tried indexing with:
```
x[indLargest2ndAxis, 1]
```
but I get IndexError: index 5 is out of bounds for axis 0 with size 5. How can I combine these indices in a vectorized way?

Answer 1

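Ah, I think I see what you mean now. Fancy indexing is documented in detail here. Be warned, though, that because it is so comprehensive, it is fairly heavy reading. In short, fancy indexing lets you take elements from a source array (according to some idx) and put them into a new array (fancy indexing always returns a copy):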

```
import numpy as np

source = np.array([10.5, 21, 42])
idx = np.array([0, 1, 2, 1, 1, 1, 2, 1, 0])

# this is fancy indexing
target = source[idx]

expected = np.array([10.5, 21, 42, 21, 21, 21, 42, 21, 10.5])
assert np.allclose(target, expected)
```
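A small sketch of the "always returns a copy" point (the arrays are illustrative): writing into the result of fancy indexing leaves source untouched, whereas a basic slice is a view that writes through to source.

```python
import numpy as np

source = np.array([10.5, 21, 42])
idx = np.array([0, 1, 2, 1])

target = source[idx]        # fancy indexing builds a new array
target[0] = -1.0            # writing to the copy...
print(source[0])            # 10.5 (...does not touch source)
print(np.shares_memory(source, target))  # False

view = source[0:2]          # basic slicing, by contrast, returns a view
view[0] = -1.0
print(source[0])            # -1.0 (the view writes through to source)
```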

The nice thing about this is that the shape of the index array controls the shape of the resulting array:

```
import numpy as np

source = np.array([10.5, 21, 42])
idx = np.array([[0, 1], [1, 2]])

target = source[idx]

expected = np.array([[10.5, 21], [21, 42]])
assert np.allclose(target, expected)
assert target.shape == (2, 2)
```

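Things get more interesting when source has more than one dimension. In that case, you need to provide an index for each axis so that numpy knows which elements to take: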

```
import numpy as np

source = np.arange(4).reshape(2, 2)
idxA = np.array([0, 1])
idxB = np.array([0, 1])

# this will take (0,0) and (1,1)
target = source[idxA, idxB]

expected = np.array([0, 3])
assert np.allclose(target, expected)
```
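To illustrate that the per-axis index arrays are paired element by element rather than combined as an outer product, this sketch compares source[idxA, idxB] with an explicit loop over the pairs and with indexing one axis at a time (the names paired, looped and chained are illustrative).

```python
import numpy as np

source = np.arange(4).reshape(2, 2)   # [[0, 1], [2, 3]]
idxA = np.array([0, 1])
idxB = np.array([0, 1])

# One index array per axis: indices are paired element-wise -> (0,0), (1,1)
paired = source[idxA, idxB]
print(paired)           # [0 3]

# Equivalent explicit loop over the (row, column) pairs
looped = np.array([source[a, b] for a, b in zip(idxA, idxB)])
print(np.array_equal(paired, looped))   # True

# Indexing one axis at a time is different: select rows, then columns
chained = source[idxA][:, idxB]
print(chained.tolist())  # [[0, 1], [2, 3]]
```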

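Again, note that the shape of target matches the shape of the indices used. The nice thing about fancy indexing is that the index arrays are broadcast against each other when necessary: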

```
import numpy as np

source = np.arange(4).reshape(2, 2)
idxA = np.array([0, 0, 1, 1]).reshape((4, 1))
idxB = np.array([0, 1]).reshape((1, 2))

target = source[idxA, idxB]

expected = np.array([[0, 1], [0, 1], [2, 3], [2, 3]])
assert np.allclose(target, expected)
```
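NumPy's np.ix_ helper builds exactly these broadcastable (n, 1) and (1, m) index arrays from 1-D inputs, so the same selection can be written without manual reshape calls. A sketch using the values from above (renamed rows and cols for illustration):

```python
import numpy as np

source = np.arange(4).reshape(2, 2)
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1])

# np.ix_ reshapes the 1-D index arrays to (4, 1) and (1, 2) so that they
# broadcast into an "outer product" selection of shape (4, 2)
ix_rows, ix_cols = np.ix_(rows, cols)
print(ix_rows.shape, ix_cols.shape)   # (4, 1) (1, 2)

target = source[np.ix_(rows, cols)]
expected = np.array([[0, 1], [0, 1], [2, 3], [2, 3]])
print(np.array_equal(target, expected))   # True
```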

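By now you can see where your exception comes from. Your source array x has x.ndim == 4; however, you try to index it with the 2-tuple (indLargest2ndAxis, 1). Numpy interprets this as indexing the first axis with indLargest2ndAxis, the second axis with 1, and all remaining axes with :. Clearly this doesn't work: all values of indLargest2ndAxis would have to lie between 0 and 4 (inclusive), because they must refer to positions along the first axis of x.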
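The failure can be reproduced directly. This sketch rebuilds indLargest2ndAxis with kth = 4 (i.e. 10 - 6) and catches the exception; the exact index named in the message may differ from the question's, since argpartition does not fix the order of the returned indices.

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 4, axis=2)[..., 4:]

# x[indLargest2ndAxis, 1] is read as x[indLargest2ndAxis, 1, :, :]:
# the index array goes to axis 0 (size 5) but holds values from 4 to 9
error_message = None
try:
    x[indLargest2ndAxis, 1]
except IndexError as err:
    error_message = str(err)

print(error_message)   # an "out of bounds for axis 0 with size 5" message
print(indLargest2ndAxis.min(), indLargest2ndAxis.max())   # 4 9
```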

My suggestion, x[..., indLargest2ndAxis, 1], tells numpy that you want to index the last two axes of x: the third axis with indLargest2ndAxis, the fourth axis with 1, and : for everything else.

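This does produce a result, because all elements of indLargest2ndAxis are in [0, 10), but its shape is (5, 10, 5, 10, 6), which is not what you want. Somewhat hand-wavily: the first part of the shape, (5, 10), comes from the ellipsis (...), i.e. "select everything"; the middle part, (5, 10, 6), comes from indLargest2ndAxis selecting elements along the third axis of x according to the shape of indLargest2ndAxis; and the last part (which you don't see, because it is squeezed out) comes from selecting index 1 along the fourth axis.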
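A sketch that checks the shape claim and shows where the wanted values sit inside that larger array: entry [i, j, :] of the desired result appears at too_big[i, j, i, j, :]. Extracting it from there would waste memory, so this only illustrates the broadcasting (too_big, i and j are illustrative names).

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 4, axis=2)[..., 4:]

# The ellipsis keeps axes 0 and 1 whole -> (5, 10); the index array
# (5, 10, 6) and the integer 1 broadcast together -> (5, 10, 6)
too_big = x[..., indLargest2ndAxis, 1]
print(too_big.shape)   # (5, 10, 5, 10, 6)

# too_big[a, b, c, d, e] == x[a, b, indLargest2ndAxis[c, d, e], 1],
# so the wanted values lie where (a, b) == (c, d)
i, j = 2, 7
print(np.array_equal(too_big[i, j, i, j], x[i, j, indLargest2ndAxis[i, j], 1]))  # True
```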

Moving on to your actual problem, you can dodge the fancy-indexing bullet entirely and do the following:

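```
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
order = x[..., 0]    # values used to rank along axis 2
values = x[..., 1]   # values we want to pick out
idx = np.argpartition(order, 4)[..., 4:]   # indices of the 6 largest
result = np.take_along_axis(values, idx, axis=-1)
```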
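To check the take_along_axis approach, this sketch compares it against the non-vectorized list comprehension from the question, using idx in place of indLargest2ndAxis.

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
order = x[..., 0]
values = x[..., 1]
idx = np.argpartition(order, 4)[..., 4:]
result = np.take_along_axis(values, idx, axis=-1)

# Non-vectorized reference, as in the question
expected = np.array([[[x[i, j, k, 1] for k in idx[i, j]]
                      for j in range(10)]
                     for i in range(5)])

print(result.shape)                       # (5, 10, 6)
print(np.array_equal(result, expected))   # True
```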

Edit: of course you can also use fancy indexing; however, it is more cryptic and doesn't adapt well to different shapes:

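```
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 4)[..., 4:]
# index arrays of shape (5, 1, 1), (1, 10, 1) and (5, 10, 6) broadcast to (5, 10, 6)
result = x[np.arange(5)[:, None, None], np.arange(10)[None, :, None], indLargest2ndAxis, 1]
```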
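One way to avoid the hard-coded 5 and 10 in the fancy-indexing version is to let np.indices(..., sparse=True) build the open index grids from the shape of indLargest2ndAxis. This is a sketch of an alternative, not part of the original answer; it assumes a NumPy version whose np.indices accepts sparse (added in 1.17), and it still assumes the layout of the last two axes of x.

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 4, axis=-1)[..., 4:]

# sparse=True returns open grids of shapes (5, 1, 1), (1, 10, 1), (1, 1, 6);
# keep all but the last, which indLargest2ndAxis replaces
*lead, _ = np.indices(indLargest2ndAxis.shape, sparse=True)
result = x[(*lead, indLargest2ndAxis, 1)]

# Same values as the take_along_axis approach
reference = np.take_along_axis(x[..., 1], indLargest2ndAxis, axis=-1)
print(result.shape)                        # (5, 10, 6)
print(np.array_equal(result, reference))   # True
```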
