Does a dictionary of numpy arrays really use less memory than an ndarray?

Question

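I'm trying to find a memory-efficient way to store data in Python variables for fast access and analysis. I initialize a 2D array in numpy and then find its memory usage as follows (using sys so that I can later compare it with other variable types):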

import sys
import numpy as np
a = np.zeros((1000, 1000), dtype=np.float32)
print('The size of the numpy array is {} bytes'.format(sys.getsizeof(a)))

Which returns: The size of the numpy array is 4000112 bytes
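For an array that owns its data, that figure is the data buffer plus a small object header. A minimal sketch separating the two, assuming a NumPy version whose ndarray.__sizeof__ includes the buffer for owning arrays (the exact header size varies by NumPy version and platform):

```python
import sys
import numpy as np

a = np.zeros((1000, 1000), dtype=np.float32)

# nbytes is the size of the data buffer alone: 1000 * 1000 elements * 4 bytes
data_bytes = a.nbytes
# getsizeof adds the ndarray object header on top, because `a` owns its buffer
total_bytes = sys.getsizeof(a)
header_bytes = total_bytes - data_bytes

print(data_bytes, total_bytes, header_bytes)
```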

I can move it into a dictionary of 1D numpy arrays with the following for loop:

b = {}
for ii in range(1000):
    b[f'{ii}']=a[:,ii]

print('The size of the dictionary is {} bytes'.format(sys.getsizeof(b)))

Which returns: The size of the dictionary is 36968 bytes. Even if I delete a and run garbage collection, the dictionary size stays the same, so b can't just be a container of pointers into a.
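The assumption in that last sentence can be checked directly. A minimal sketch, assuming NumPy's standard view semantics for basic slicing, that inspects one dictionary value before and after deleting `a`:

```python
import gc
import numpy as np

a = np.zeros((1000, 1000), dtype=np.float32)
b = {f'{ii}': a[:, ii] for ii in range(1000)}

# Each column slice is a view: it does not own its data and points back at `a`
col = b['0']
print(col.flags['OWNDATA'])      # False
print(col.base is a)             # True
print(np.shares_memory(col, a))  # True

del a
gc.collect()
# The name `a` is gone, but the views still reference the original 2-D array
# through `.base`, so its 4,000,000-byte buffer stays allocated
print(col.base.shape, col.base.nbytes)  # (1000, 1000) 4000000
```

Deleting a name only drops one reference; garbage collection cannot free the buffer while the views in `b` still hold the original array alive.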

Why does a dictionary of 1D arrays take up less memory than the same arrays stored in an ndarray?

Answer 1

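There are two fundamental errors in your observation. First, sys.getsizeof is shallow: for a dict it counts only the dict's own table of references to keys and values, not the arrays those references point to. Second, a[:,ii] is a view into a's buffer, so the dictionary values do point into a, and deleting the name a does not free that buffer while the views still reference it.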
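The shallowness of sys.getsizeof is easy to see by building dicts with the same 1000 keys but very different values. A minimal sketch (the dict comprehensions stand in for the question's loop):

```python
import sys
import numpy as np

a = np.zeros((1000, 1000), dtype=np.float32)

# Same 1000 keys, three very different kinds of values
views = {f'{ii}': a[:, ii] for ii in range(1000)}
copies = {f'{ii}': a[:, ii].copy() for ii in range(1000)}
ints = {f'{ii}': ii for ii in range(1000)}

# getsizeof on a dict measures only the dict's own hash table (slots holding
# references to keys and values), so all three report the same size
print(sys.getsizeof(views), sys.getsizeof(copies), sys.getsizeof(ints))

# Likewise, a view's getsizeof excludes the buffer it borrows from `a`,
# while a copy owns its 4000-byte buffer and includes it
print(sys.getsizeof(a[:, 0]), sys.getsizeof(a[:, 0].copy()))
```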

Proof that the sizes are roughly the same:

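import sys
import numpy as np
a = np.zeros((1000, 1000), dtype=np.float32)
b = {}
for ii in range(1000):
    b[f'{ii}'] = a[:, ii].copy()  # copy so each column owns its own buffer
sum(sys.getsizeof(e) for e in b.values())
# 4096000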
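A fairer comparison counts each underlying data buffer once, whether the values are views or copies. A minimal sketch; `buffer_owner` and `unique_buffer_bytes` are hypothetical helpers, and the approach assumes every array's memory is ultimately owned by a NumPy array (buffers borrowed from `bytes`, `mmap`, etc. are only approximated by the outermost array's `nbytes`):

```python
import numpy as np

def buffer_owner(arr):
    """Follow .base links back to the outermost ndarray holding the memory."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr

def unique_buffer_bytes(arrays):
    """Sum nbytes over distinct owning arrays, so shared buffers count once."""
    owners = {}
    for arr in arrays:
        owner = buffer_owner(arr)
        owners[id(owner)] = owner  # keeping a reference keeps id() unique
    return sum(owner.nbytes for owner in owners.values())

a = np.zeros((1000, 1000), dtype=np.float32)
views = {f'{ii}': a[:, ii] for ii in range(1000)}
copies = {f'{ii}': a[:, ii].copy() for ii in range(1000)}

print(unique_buffer_bytes(views.values()))   # 4000000: one shared 2-D buffer
print(unique_buffer_bytes(copies.values()))  # 4000000: 1000 buffers of 4000 bytes
```

Either way the payload is the same 4,000,000 bytes; only the per-object headers (and the dict itself) differ.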
