numpy ndarray buffer is too small for requested array

Question

I am trying to set up a shared memory buffer using a structured numpy array.

If I only use (datetimes, ints, floats) or (strings, ints, floats), I have no problem. However, if I try to use (strings, datetimes, ints, floats), I run into a 'TypeError: buffer is too small for requested array' error.

I have been racking my brain trying to work out why this does not work. Any help is appreciated.

This works:

import numpy as np
from datetime import datetime

N_list_size = 100_000

a = [
     (datetime.now(), 
      np.uint64(1234), 
      np.float64("123.4"))
] * N_list_size

np_array = np.ndarray(shape=(N_list_size,),
                      buffer=np.array(a),
                      dtype=[
                              ('a', np.datetime64),
                              ('b', np.uint64),
                              ('c', np.float64),
                      ])

shape, dtype = np_array.shape, np_array.dtype
print(f"np_array's size = {np_array.nbytes / 1e6}MB")
print(f"np_array's dtype = {dtype}")

This also works:

import numpy as np
from datetime import datetime

N_list_size = 100_000

a = [
     ("d50ec984-77a8-460a-b958-66f114b0de9b", 
      np.uint64(1234), 
      np.float64("123.4"))
] * N_list_size

np_array = np.ndarray(shape=(N_list_size,),
                      buffer=np.array(a),
                      dtype=[
                              ('a', np.str_, 36),
                              ('b', np.uint64),
                              ('c', np.float64),
                      ])

shape, dtype = np_array.shape, np_array.dtype
print(f"np_array's size = {np_array.nbytes / 1e6}MB")
print(f"np_array's dtype = {dtype}")

This does not work:

import numpy as np
from datetime import datetime

N_list_size = 100_000

a = [
     ("d50ec984-77a8-460a-b958-66f114b0de9b", 
      datetime.now(),
      np.uint64(1234), 
      np.float64("123.4"))
] * N_list_size

np_array = np.ndarray(shape=(N_list_size,),
                      buffer=np.array(a),
                      dtype=[
                              ('a', np.str_, 36),
                              ('b', np.datetime64),
                              ('c', np.uint64),
                              ('d', np.float64),
                      ])

shape, dtype = np_array.shape, np_array.dtype
print(f"np_array's size = {np_array.nbytes / 1e6}MB")
print(f"np_array's dtype = {dtype}")

and fails with:

TypeError: buffer is too small for requested array

Why does using datetimes and strings together cause a problem here?

How can I resolve this?

Answer 1

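In the spirit of getting things working, and following @Kevin's comment, a simple fixed-length string representation of the datetime (e.g. datetime.isoformat()) definitely works: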

import numpy as np
from datetime import datetime

N_list_size = 100_000

a = [
     ("d50ec984-77a8-460a-b958-66f114b0de9b", 
      datetime.now().isoformat(),
      np.uint64(1234), 
      np.float64("123.4"))
] * N_list_size

dtype=[
    ('a', 'U36'),
    # isoformat() with microseconds yields 26 characters, so 'U25' drops the last digit
    ('b', 'U25'),
    ('c', np.uint64),
    ('d', np.float64),
]

np_array = np.ndarray(shape=(N_list_size,),
                      buffer=np.array(a,dtype=dtype),
                      dtype=dtype)

shape, dtype = np_array.shape, np_array.dtype
print(f"np_array's size = {np_array.nbytes / 1e6}MB")
print(f"np_array's dtype = {dtype}")

Output:

np_array's size = 26.0MB
np_array's dtype = [('a', '<U36'), ('b', '<U25'), ('c', '<u8'), ('d', '<f8')]
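
As for why the original version fails, a likely explanation is that np.array(a) without an explicit dtype never builds the packed records. With a datetime object in the tuple, numpy falls back to a 2-D object array, which stores one pointer per field, so the buffer is far smaller than what the structured dtype asks for. The sketch below compares the two sizes and then fills a preallocated buffer of the right size. It assumes a 64-bit build. The names rows, record, buf and view are made up for illustration, the bytearray is only a stand-in for the shared memory block, and the 'datetime64[us]' unit is my choice rather than something taken from the question.

```python
import numpy as np
from datetime import datetime

n = 1_000  # smaller than the question's 100_000 to keep the sketch quick

uid = "d50ec984-77a8-460a-b958-66f114b0de9b"
stamp = datetime(2024, 1, 2, 3, 4, 5, 678901)  # fixed value instead of datetime.now()
rows = [(uid, stamp, np.uint64(1234), np.float64("123.4"))] * n

# No dtype given: the mixed tuples become a 2-D object array, so each row
# holds 4 pointers (itemsize 8 on a 64-bit build) rather than a packed record.
raw = np.array(rows)

record = np.dtype([
    ("a", "U36"),             # 36 chars * 4 bytes = 144 bytes
    ("b", "datetime64[us]"),  # 8 bytes
    ("c", np.uint64),         # 8 bytes
    ("d", np.float64),        # 8 bytes
])

needed = n * record.itemsize
print(f"raw: dtype={raw.dtype}, shape={raw.shape}, nbytes={raw.nbytes}")
print(f"needed for {n} records: {needed} bytes")

# Passing the dtype lets numpy convert each tuple into one packed record.
typed = np.array(rows, dtype=record)

# Stand-in for the shared memory block: a writable buffer of exactly the
# required size, wrapped as a structured array and filled by copying.
buf = bytearray(needed)
view = np.ndarray(shape=(n,), dtype=record, buffer=buf)
view[:] = typed
print(view[0])
```

The two "working" variants in the question seem to pass only because their buffers happen to be at least as large as requested (24 bytes of pointers per row against a 24-byte record, and three '<U36' columns against a 160-byte record). The values read back through them are therefore reinterpreted bytes, not the original data, so converting with an explicit dtype and copying into the buffer is probably what you want in all three cases.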

python

numpy

shared-memory

numpy-ndarray