Explanation of numpy indexing ndarray[(4, 2), (5, 3)]

Question

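Problem

Please help me understand the design decision or rationale behind how NumPy indexing applies tuples (i, j) to an ndarray.

Background

When the index is a single tuple (4, 2), it is read as (i=row, j=column).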

import numpy as np

shape = (6, 7)
X = np.zeros(shape, dtype=int)
X[(4, 2)] = 1
X[(5, 3)] = 1
print("X is :\n{}\n".format(X))
---
X is :
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0]    <--- (4, 2)
 [0 0 0 1 0 0 0]]   <--- (5, 3)
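
A quick check of the single-tuple case may help here. The sketch below assumes that, inside square brackets, a tuple such as (4, 2) is handled the same way as writing 4, 2 directly, so each entry addresses one axis. The names same_element and ones_at are only for illustration.

```python
import numpy as np

X = np.zeros((6, 7), dtype=int)

# X[(4, 2)] and X[4, 2] are the same operation: Python packs "4, 2" into a
# tuple before passing it to ndarray.__setitem__.
X[(4, 2)] = 1
X[5, 3] = 1

same_element = X[(4, 2)] == X[4, 2]
ones_at = np.argwhere(X == 1).tolist()  # list of [row, column] pairs
print(same_element, ones_at)
```

So a single tuple is one index with one entry per axis, not a list of separate points.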

However, when the index is made of multiple tuples (4, 2), (5, 3), then (i=row, j=row) for (4, 2) and (i=column, j=column) for (5, 3).

shape = (6, 7)
Y = np.zeros(shape, dtype=int)
Y[(4, 2), (5, 3)] = 1
print("Y is :\n{}\n".format(Y))
---
Y is :
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]    <--- (2, 3)
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0]    <--- (4, 5)
 [0 0 0 0 0 0 0]]
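
The pairing seen in this output can be made explicit with a small sketch. It assumes the first tuple is read as axis-0 (row) positions and the second as axis-1 (column) positions, matched element by element; rows, cols, paired and ones_at are illustrative names.

```python
import numpy as np

Y = np.zeros((6, 7), dtype=int)

rows = (4, 2)  # first index  -> positions along axis 0
cols = (5, 3)  # second index -> positions along axis 1

# Entries are paired element-wise: (rows[0], cols[0]) and (rows[1], cols[1]).
Y[rows, cols] = 1

paired = sorted(zip(rows, cols))
ones_at = [tuple(p) for p in np.argwhere(Y == 1).tolist()]
print(paired, ones_at)
```

Both lists contain (2, 3) and (4, 5), which are the cells marked in the output above.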

It means you are constructing a 2d array R, such that R=A[B, C]. This means that the value for r_{ij} = a_{b_{ij} c_{ij}}.

So it means that the item located at R[0,0] is the item in A with as row index B[0,0] and as column index C[0,0]. The item R[0,1] is the item in A with row index B[0,1] and as column index C[0,1], etc.
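
The relation R = A[B, C] can be checked directly. This is a minimal sketch; the particular values of A, B and C are made up for illustration.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)   # A[i, j] == 4 * i + j

B = np.array([[0, 1],
              [2, 2]])            # row indices
C = np.array([[3, 0],
              [1, 2]])            # column indices

R = A[B, C]                       # R takes the shape of B and C

# Check R[i, j] == A[B[i, j], C[i, j]] at every position.
matches = all(
    R[i, j] == A[B[i, j], C[i, j]]
    for i in range(R.shape[0])
    for j in range(R.shape[1])
)
print(R)
print(matches)
```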

numpy.ravel_multi_index(multi_index, dims, mode='raise', order='C')

multi_index: A tuple of integer arrays, one array for each dimension.
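
The same "one array per dimension" convention shows up in np.ravel_multi_index. The sketch below feeds it the two index arrays from the question and the (6, 7) shape used there.

```python
import numpy as np

dims = (6, 7)
# One integer array per dimension: (axis-0 indices, axis-1 indices).
multi_index = (np.array([4, 2]), np.array([5, 3]))

flat = np.ravel_multi_index(multi_index, dims)

# In C order the flat index is row * 7 + column.
print(flat)  # [33 17]
```

The results 33 and 17 correspond to the cells (4, 5) and (2, 3), the same pairing as in Y.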

Why is it not always (i=row, j=column)? What would happen if it were always (i=row, j=column)?
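
If the data is held as a list of (row, column) pairs, one possible way to get the "always (row, column)" reading is to transpose the pairs into per-axis tuples before indexing. This is only a sketch; pairs, rows, cols and Z are illustrative names.

```python
import numpy as np

pairs = [(4, 2), (5, 3)]          # each entry is (row, column)

rows, cols = zip(*pairs)          # rows == (4, 5), cols == (2, 3)

Z = np.zeros((6, 7), dtype=int)
Z[rows, cols] = 1                 # same as Z[(4, 5), (2, 3)] = 1

ones_at = [tuple(p) for p in np.argwhere(Z == 1).tolist()]
print(ones_at)
```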

Update

With the answers from Akshay and @DaniMesejo, I now understand:

X[
  (4),    # dimension 2 indices with only 1 element
  (2)     # dimension 1 indices with only 1 element
] = 1

Y[
  (4, 2, ...), # dimension 2 indices 
  (5, 3, ...)  # dimension 1 indices (dimension 0 is e.g. np.array(3) whose shape is (), in my understanding)
] = 1

Answer 1

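It is easy to understand how this works (and the motivation behind this design decision).

NumPy stores its ndarrays as contiguous blocks of memory. Each element is stored sequentially, a fixed n bytes after the previous one.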

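(Image referenced from here)

So if your 3D array looks like this -

[figure: 3D array]

then it is stored in memory as -

[figure: memory layout]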

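When retrieving an element (or a block of elements), NumPy calculates how many strides (bytes) it needs to traverse to reach the next element in that direction/axis. So, for the example above, for axis=2 it has to traverse 8 bytes (depending on the datatype), for axis=1 it has to traverse 8*4 bytes, and for axis=0 it needs 8*8 bytes.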
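
Those stride figures (8, 8*4 and 8*8 bytes) can be reproduced with a small sketch. The shape (3, 2, 4) and the int64 dtype are assumptions chosen to match the numbers; the answer does not state the array's actual shape, and the length of axis 0 does not affect the strides.

```python
import numpy as np

# Assumed: 8-byte integers, 4 elements along axis 2, 2 along axis 1.
A = np.zeros((3, 2, 4), dtype=np.int64)

print(A.itemsize)  # 8
print(A.strides)   # (64, 32, 8)
```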

With this in mind, let's look at what you are trying to do.

print(X)
print(X.strides)

[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 1 0 0 0]]

#Strides (bytes) required to traverse in each axis.
(56, 8)

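For your array, to get to the next element along axis=0 we need to traverse 56 bytes, and to get to the next element along axis=1 we need 8 bytes.

When you index (4,2), NumPy moves 56*4 bytes along axis=0 and 8*2 bytes along axis=1 to reach it. Similarly, if you want to access (4,2) and (5,3), it has to move 56*(4,5) along axis=0 and 8*(2,3) along axis=1.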
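
The arithmetic can be written out as a sketch. It assumes an int64, C-contiguous array so that the strides are (56, 8); it illustrates the offset calculation only and is not a description of NumPy's internal code path.

```python
import numpy as np

X = np.zeros((6, 7), dtype=np.int64)
X[4, 2] = 1
X[5, 3] = 1

rows = np.array([4, 5])   # axis-0 indices
cols = np.array([2, 3])   # axis-1 indices

stride0, stride1 = X.strides                 # (56, 8) for this array
byte_offsets = rows * stride0 + cols * stride1

# For a C-contiguous array, dividing by the item size gives flat positions.
flat_positions = byte_offsets // X.itemsize
values = X.ravel()[flat_positions]
print(byte_offsets, flat_positions, values)
```

Each axis contributes one vectorized multiply, which is why the indices are grouped per axis rather than per point.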

This is why the design is the way it is: it lines up with how NumPy actually uses strides to locate elements.

X[(axis0_indices), (axis1_indices), ..]

X[(4, 5), (2, 3)] #(row indices), (column indices)

array([1, 1])

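With this design it is also easy to scale up to higher-dimensional tensors (say, 8-dimensional arrays)! If you specified each index tuple separately, fetching them would take (number of elements) * (number of dimensions) separate calculations. With this design, NumPy can broadcast the stride value of each axis over that axis's tuple of indices and fetch the values faster!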
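
The same per-axis layout carries over to more dimensions. Here is a minimal sketch with a made-up 3D array T, where one index tuple is supplied per axis.

```python
import numpy as np

T = np.arange(24).reshape(2, 3, 4)   # T[i, j, k] == 12 * i + 4 * j + k

# One tuple per axis; entries are paired element-wise across the tuples,
# so this selects T[0, 1, 2] and T[1, 0, 3].
picked = T[(0, 1), (1, 0), (2, 3)]
print(picked)
```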


python

numpy

matrix-indexing
