The effect of transposing a numpy array on its strides and data buffer

Question

Suppose you are given a numpy array:

x = np.array([[1,2],[3,4]], dtype=np.int8)

Let's transpose it:

y = x.T

My understanding from the numpy documentation is that a transpose only modifies the strides of the array, not its underlying data buffer.

We can verify the change in strides by running:

>> x.data.strides
(2, 1)

>> y.data.strides
(1, 2)

However, the data seems to have been modified as well:

>> x.data.tobytes()
b'\x01\x02\x03\x04'

>> y.data.tobytes()
b'\x01\x03\x02\x04'

According to my understanding, the expected behavior is that y's data buffer stays identical to x's data buffer, and only the strides change.

Why do we see a different data buffer for y? Could it be that the data attribute does not show the underlying memory layout?

Answer 1

A better way to check the data buffer is to look at the pointer in __array_interface__:

In [8]: y=x.T
In [9]: x.__array_interface__
Out[9]: 
{'strides': None,
 'data': (144597512, False),
 'shape': (2, 2),
 'version': 3,
 'typestr': '|i1',
 'descr': [('', '|i1')]}
In [10]: y.__array_interface__
Out[10]: 
{'strides': (1, 2),
 'data': (144597512, False),
 'shape': (2, 2),
 'version': 3,
 'typestr': '|i1',
 'descr': [('', '|i1')]}
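The two dictionaries report the same address in their 'data' entry, and only 'strides' differs (None in x's entry denotes a plain C-contiguous layout). A minimal sketch of the same check done programmatically, assuming the x and y from the question; the variable names are illustrative only:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int8)
y = x.T

# The first element of the 'data' tuple is the start address of the buffer.
x_ptr = x.__array_interface__['data'][0]
y_ptr = y.__array_interface__['data'][0]

same_pointer = (x_ptr == y_ptr)  # both arrays start at the same address
x_strides = x.strides            # bytes to step per row, per column of x
y_strides = y.strides            # the same two steps, swapped

print(same_pointer, x_strides, y_strides)
```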

The documentation for .data is:

In [12]: x.data?
memoryview(object)
Create a new memoryview object which references the given object.

In [13]: x.data
Out[13]: <memory at 0xb2f7cb6c>
In [14]: y.data
Out[14]: <memory at 0xb2f7cbe4>

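So y.data is not showing the bytes of its buffer as they are laid out in memory, but the bytes in the order in which the strides traverse them. I'm not sure whether there is a way to view y's data buffer directly.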

In [25]: y.base
Out[25]: 
array([[1, 2],
       [3, 4]], dtype=int8)

x is C-contiguous; y is F-contiguous.
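For what it's worth, here is a small sketch of a few checks that seem to confirm this for the example at hand. It relies on y being F-contiguous, so asking for the bytes in Fortran order happens to reproduce the physical memory order; for an arbitrary strided view that shortcut would not apply, and y.base is the more general route to the owning array.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int8)
y = x.T

shares = np.shares_memory(x, y)  # do x and y overlap in memory?
base_is_x = y.base is x          # is y a view whose owner is x?
x_is_c = x.flags['C_CONTIGUOUS']
y_is_f = y.flags['F_CONTIGUOUS']

# tobytes() walks the array in C (row-major) order by default,
# so it yields the logical element order, not the memory order.
logical = y.tobytes()

# Walking the F-contiguous y in Fortran order follows the memory as stored.
physical = y.tobytes(order='F')

print(shares, base_is_x, x_is_c, y_is_f, logical, physical)
```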

Answer 2

As a complement to @hpaulj's good answer:

In [7]: np.frombuffer(x, np.uint8)
Out[7]: array([1, 2, 3, 4], dtype=uint8)

In [8]: np.frombuffer(y, np.uint8)
ValueError: ndarray is not C-contiguous

In [9]: np.frombuffer(np.ascontiguousarray(y), np.uint8)
Out[9]: array([1, 3, 2, 4], dtype=uint8)

This shows that y is indeed a view.
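Another way to convince yourself, sketched under the same setup as the question: a write through y shows up in x, whereas the array returned by np.ascontiguousarray(y) is a separate copy (y is not C-contiguous, so a copy has to be made). The name z is just for illustration.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int8)
y = x.T

# Writing through the view changes the shared buffer:
# y[0, 1] and x[1, 0] refer to the same byte.
y[0, 1] = 9

# ascontiguousarray must copy here, because y is not C-contiguous.
z = np.ascontiguousarray(y)
z[0, 0] = 7  # modifies only the copy

copy_shares = np.shares_memory(x, z)

print(x)
print(copy_shares)
```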

