Numpy array: get the raw bytes without copying

Question

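I am trying to concatenate the bytes of multiple Numpy arrays into a single bytearray so that I can send it in an HTTP POST request.

The most efficient way I can think of is to create a bytearray object that is large enough and then write the bytes of all the numpy arrays into it contiguously.

The code looks something like this: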

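import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)
cb = bytearray(total_nb_bytes)

# Too Lazy Didn't do: generate list of delimiters and information to decode the concatenated bytes array

# concatenate the bytes: write each array into its own slot of the preallocated buffer
# (cb.extend() would append after the zero bytes that are already in cb)
offset = 0
for arr in list_arr:
    _bytes = arr.tobytes()  # this is the copy I would like to avoid
    cb[offset:offset + len(_bytes)] = _bytes
    offset += len(_bytes)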

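The tobytes() method is not a zero-copy method. It copies the raw data of the numpy array into a bytes object.

In Python, buffers allow access to the inner raw data values (at the C level this is called the buffer protocol); see the Python documentation. Numpy offered this in numpy 1.13, where it was called getbuffer() (link). However, this method is deprecated!

What is the right way of doing this?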

Answer 1

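Just use arr.data. It returns a memoryview object that references the array's memory without copying. It can be indexed and sliced (creating new memoryviews without copying) and appended to a bytearray (copying just once, into the bytearray).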
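As a minimal sketch of this idea (not necessarily how the answerer would write it), the loop from the question can copy from arr.data straight into a preallocated bytearray. The explicit int32 dtype and the name raw are assumptions made for illustration, and cast("B") assumes the arrays are C-contiguous.

```python
import numpy as np

# Explicit dtype so the byte counts are the same on every platform
# (an assumption; the question uses the platform default integer type).
list_arr = [np.array([1, 2, 3], dtype=np.int32),
            np.array([4, 5, 6], dtype=np.int32)]
cb = bytearray(sum(a.nbytes for a in list_arr))

offset = 0
for arr in list_arr:
    # arr.data is a memoryview over the array's own memory: no bytes object is built.
    # cast("B") reinterprets it as flat unsigned bytes (needs a C-contiguous array).
    raw = arr.data.cast("B")
    # Slice assignment copies the bytes once, directly into the preallocated slot.
    cb[offset:offset + raw.nbytes] = raw
    offset += raw.nbytes
```

Compared with tobytes(), this skips the intermediate bytes object, so each array's data is copied only once on its way into cb.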

Answer 2

You can make a numpy-compatible buffer out of your message bytearray and write to it efficiently using the out argument of np.concatenate.

import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)
total_size = sum(a.size for a in list_arr)
cb = bytearray(total_nb_bytes)

# Wrap cb in an array of the same dtype, then let concatenate write straight into it
np.concatenate(list_arr, out=np.ndarray(total_size, dtype=list_arr[0].dtype, buffer=cb))

Sure enough (this output is from a platform where the default integer is 4 bytes, little-endian),

>>> cb
bytearray(b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00')

This method assumes that all of your arrays have the same dtype. To lift that restriction, view your original arrays as np.uint8:

np.concatenate([a.view(np.uint8) for a in list_arr],
               out=np.ndarray(total_nb_bytes, dtype=np.uint8, buffer=cb))

This way, you also don't need to compute total_size, since you've already computed the number of bytes.
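A small sketch of the np.uint8 variant with arrays of different dtypes, followed by one possible way to read them back. The decoding loop is an assumption added for illustration: it reuses the dtypes and sizes of the original arrays, whereas a real message would have to carry that information in a header, which the question leaves out.

```python
import numpy as np

# Two 1-D arrays with different dtypes (illustrative values).
list_arr = [np.array([0.5, 1.5], dtype=np.float64),
            np.array([1, 2, 3], dtype=np.int16)]
total_nb_bytes = sum(a.nbytes for a in list_arr)
cb = bytearray(total_nb_bytes)

# View every array as raw bytes and concatenate directly into cb.
out = np.ndarray(total_nb_bytes, dtype=np.uint8, buffer=cb)
np.concatenate([a.view(np.uint8) for a in list_arr], out=out)

# Decode: np.frombuffer reads a typed array back from a byte offset, without copying.
decoded = []
offset = 0
for a in list_arr:
    decoded.append(np.frombuffer(cb, dtype=a.dtype, count=a.size, offset=offset))
    offset += a.nbytes
```

Note that a.view(np.uint8) on these 1-D contiguous arrays is itself a zero-copy view, so the only copy is the one np.concatenate makes into cb.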

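This approach is likely more efficient than looping through the list of arrays. You were right that the buffer protocol is your ticket to a solution. You can use the low-level np.ndarray constructor to create an array object wrapped around the memory of any object that supports the buffer protocol. From there, you can use all the usual numpy functions to interact with the buffer.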
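To make the wrapping concrete, here is a short sketch in which two arrays share the memory of one bytearray. The names buf, wrapped and same_memory, and the int32 dtype, are illustrative assumptions.

```python
import numpy as np

buf = bytearray(16)

# Low-level constructor: interpret the 16 bytes of buf as four int32 values.
wrapped = np.ndarray(shape=(4,), dtype=np.int32, buffer=buf)
wrapped[:] = [10, 20, 30, 40]  # writes go straight into buf

# np.frombuffer is a common shortcut for the 1-D case; it wraps the same memory.
same_memory = np.frombuffer(buf, dtype=np.int32)

# A later write through one array is visible through the other.
wrapped[0] = 99
```

While such arrays are alive, the bytearray cannot be resized (for example with extend()); Python raises a BufferError because its memory is exported.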
