Cython - Memoryview of a dynamic 2D C++ array
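Goal: Get a memoryview from a 2D C++ char array using Cython.
A little background:
I have a native C++ library that generates some data and returns it to the Cython world through a char**. The array is initialized and manipulated in the library roughly like this: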
struct Result_buffer {
    char** data_pointer;
    int length = 0;

    Result_buffer(int row_capacity) {
        // allocate the table of row pointers; the rows themselves are added later
        data_pointer = new char*[row_capacity];
    }

    // the actual data is appended row by row
    void append_row(char* row_data) {
        data_pointer[length] = row_data;
        length++;
    }
};
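So we basically get an array of nested sub-arrays, i.e. a table of row pointers.
Side notes:
- every row has the same number of columns
- rows can share memory, i.e. point to the same row_data
The goal is to use this array with a memoryview, preferably without expensive memory copying.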
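To make the layout concrete, here is a minimal sketch of such a row-pointer table and of the copying fallback that the goal tries to avoid. The helper names make_row_table and flatten_rows are hypothetical and not part of the library described above; the sketch also assumes the caller owns the row storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: builds a table of row pointers into caller-owned storage.
// The same row id may appear several times, so table entries can alias.
std::vector<char*> make_row_table(std::vector<std::vector<char>>& storage,
                                  const std::vector<std::size_t>& row_ids) {
    std::vector<char*> table;
    table.reserve(row_ids.size());
    for (std::size_t id : row_ids) {
        table.push_back(storage.at(id).data());
    }
    return table;
}

// Hypothetical helper: the copying fallback. Packs all rows into one
// contiguous, row-major block of nrows * ncols bytes.
std::vector<char> flatten_rows(char* const* rows, std::size_t nrows,
                               std::size_t ncols) {
    std::vector<char> flat(nrows * ncols);
    if (ncols == 0) {
        return flat;  // nothing to copy
    }
    for (std::size_t i = 0; i < nrows; ++i) {
        std::memcpy(flat.data() + i * ncols, rows[i], ncols);
    }
    return flat;
}
```

The flattened block costs one copy of nrows * ncols bytes and duplicates rows that were shared in the table, which is the overhead a memoryview directly over the char** would avoid.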
First approach (not working):
Using Cython arrays and memoryviews:
This is the .pyx file that is supposed to consume the generated data:
from cython cimport view
cimport numpy as np
import numpy as np

[...]

def raw_data_to_numpy(self):
    # Dimensions of the source array
    cdef int ROWS = self._row_count
    cdef int COLS = self._col_count
    # This is the array from the C++ library and is created by 'create_buffer()'
    cdef char** raw_data_pointer = self._raw_data
    # It only works with a pointer to the first nested array
    cdef char* pointer_to_0 = raw_data_pointer[0]
    # Now create a 2D Cython array
    cdef view.array cy_array = <char[:ROWS, :COLS]> pointer_to_0
    # With this we can finally create our NumPy array:
    return np.asarray(cy_array)
This actually compiles fine and runs without crashing, but the result isn't quite what I expected. If I print the values of the NumPy array, I get:
000: [1, 2, 3, 4, 5, 6, 7, 8, 9]
001: [1, 0, 0, 0, 0, 0, 0, 113, 6]
002: [32, 32, 32, 32, 96, 96, 91, 91, 97]
[...]
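It turns out that the first row is mapped correctly, but the other rows look more like uninitialized memory. So there is probably a mismatch between the memory layout of a char** and the default mode of 2D memoryviews: the cast treats pointer_to_0 as the start of one contiguous block of ROWS * COLS bytes, whereas the rows behind a char** can live anywhere in memory.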
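The difference between the two layouts can be sketched in a few lines of C++. This is only an illustration of the address arithmetic, not Cython's actual implementation, and the function names read_contiguous and read_indirect are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// What a plain 2D view over a single pointer assumes: one base address and
// fixed strides, so row i starts exactly i * ncols bytes after row 0.
inline char read_contiguous(const char* base, std::size_t ncols,
                            std::size_t i, std::size_t j) {
    return base[i * ncols + j];
}

// What a char** actually needs: first load the row pointer from the table,
// then index inside that row.
inline char read_indirect(const char* const* rows, std::size_t i,
                          std::size_t j) {
    return rows[i][j];
}
```

For row 0 both functions read the same byte, which matches the observation that only the first row came out right. For any other row, read_contiguous just walks on past the end of row 0 and returns whatever happens to be stored there.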
Edit #1: What I have learned since is that the built-in Cython arrays don't support indirect memory layouts, so I have to create a Cython wrapper for the unsigned char** which exposes the buffer protocol.
Solution:
Manually implementing the buffer protocol:
The wrapper class, which wraps the unsigned char** and implements the buffer protocol (Indirect2DArray.pyx):
cdef class Indirect2DArray:
    cdef Py_ssize_t len
    cdef unsigned char** raw_data
    cdef int ndim
    cdef Py_ssize_t item_size
    cdef Py_ssize_t strides[2]
    cdef Py_ssize_t shape[2]
    cdef Py_ssize_t suboffsets[2]

    def __cinit__(self, int nrows, int ncols):
        self.ndim = 2
        self.len = nrows * ncols
        self.item_size = sizeof(unsigned char)
        self.shape[0] = nrows
        self.shape[1] = ncols
        # dimension 0 steps through the table of row pointers,
        # dimension 1 steps through the bytes of one row
        self.strides[0] = sizeof(void*)
        self.strides[1] = sizeof(unsigned char)
        # suboffset >= 0: dereference the pointer found in this dimension;
        # a negative suboffset means no dereferencing
        self.suboffsets[0] = 0
        self.suboffsets[1] = -1

    cdef set_raw_data(self, unsigned char** raw_data):
        self.raw_data = raw_data

    def __getbuffer__(self, Py_buffer * buffer, int flags):
        if self.raw_data is NULL:
            raise Exception("raw_data was NULL when calling __getbuffer__. Use set_raw_data(...) before the buffer is requested!")
        buffer.buf = <void*> self.raw_data
        buffer.obj = self
        buffer.ndim = self.ndim
        buffer.len = self.len
        buffer.itemsize = self.item_size
        buffer.shape = self.shape
        buffer.strides = self.strides
        buffer.suboffsets = self.suboffsets
        buffer.format = "B"  # unsigned bytes
        # fill in the remaining Py_buffer fields explicitly
        buffer.readonly = 0
        buffer.internal = NULL

    def __releasebuffer__(self, Py_buffer * buffer):
        print("CALL TO __releasebuffer__")
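The stride and suboffset values in the wrapper follow from how a buffer consumer locates an element when suboffsets are present. The sketch below mirrors the lookup loop described in PEP 3118; the function name resolve_item is chosen here for illustration, and the code is not taken from Cython or CPython.

```cpp
#include <cassert>
#include <cstddef>

// Element lookup for a buffer with suboffsets, following PEP 3118:
// per dimension, advance by stride * index; if that dimension's suboffset is
// non-negative, load the pointer stored at that address and add the suboffset.
inline void* resolve_item(void* buf, int ndim, const std::ptrdiff_t* strides,
                          const std::ptrdiff_t* suboffsets,
                          const std::ptrdiff_t* indices) {
    char* pointer = static_cast<char*>(buf);
    for (int d = 0; d < ndim; ++d) {
        pointer += strides[d] * indices[d];
        if (suboffsets[d] >= 0) {
            pointer = *reinterpret_cast<char**>(pointer) + suboffsets[d];
        }
    }
    return pointer;
}
```

With strides {sizeof(void*), 1} and suboffsets {0, -1}, dimension 0 moves to the i-th entry of the pointer table and dereferences it, and dimension 1 then moves j bytes within that row without any further dereferencing. The sketch uses char rows for simplicity, whereas the wrapper above exposes unsigned char.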
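Note: I wasn't able to pass the raw pointer via the wrapper's constructor (its arguments have to be Python objects), so I had to use a separate cdef function to set the pointer.
Here is how it is used: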
def test_wrapper(self):
    cdef int nrows = 10000
    cdef int ncols = 81
    cdef unsigned char** raw_pointer = self.raw_data
    # the wrapper has to be typed, otherwise the cdef method set_raw_data is not callable
    cdef Indirect2DArray wrapper = Indirect2DArray(nrows, ncols)
    wrapper.set_raw_data(raw_pointer)
    # now create the memoryview:
    cdef unsigned char[::view.indirect_contiguous, ::1] mview = wrapper
    # print some slices
    print(list(mview[0, 0:30]))
    print(list(mview[1, 0:30]))
    print(list(mview[2, 0:30]))
This produces the following output:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 1, 2, 3, 7, 8, 9, 1, 2, 3, 4, 5, 6, 2, 1, 4]
[2, 1, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 1, 2, 3, 7, 8, 9, 1, 2, 3, 4, 5, 6, 1, 2, 4]
[3, 1, 2, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 1, 2, 3, 7, 8, 9, 1, 2, 3, 4, 5, 6, 1, 2, 3]
This is exactly what I expected. Thanks to everyone who helped me.