Kernel died when loading a dataset in Python: why does it happen?

Question

I am trying to load a dataset of 7000 .mat files in Python as a 7000-d tensor, with each entry of shape 100 x 100 x 100 x 3. The whole dataset is less than 80 MB. I am using Spyder. The code is as follows:

import os
import numpy as np
import scipy.io

dataDir = "/Users/..."
# Preallocate the full array (np.empty defaults to float64)
x_train = np.empty([7165, 100, 100, 100, 3])

# sorted_alphanumeric: helper defined elsewhere (not shown)
for i, file in enumerate(sorted_alphanumeric(os.listdir(dataDir))):
    data = scipy.io.loadmat(dataDir + file)   # loadmat returns a dict of variables
    x_train[i] = np.array(data['tensor'])
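However, after about 2300 rows have been read, the kernel dies and the program stops running. Why does the kernel die? How can I store the dataset? It seems to me that the dataset is not that big, and the "Memory" indicator in the console always stays around 76%.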

Answer 1

7000 x 100 x 100 x 100 = 7,000,000,000 values per channel, which is far too much memory to handle. Even at one bit per value, 7,000,000,000 x 3 bits = 2.625 GB.
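To put a number on the array in the question itself, here is a rough sketch of the arithmetic. It assumes the shape [7165, 100, 100, 100, 3] from the question and 8 bytes per element, since np.empty produces float64 unless a dtype is given. The helper name array_gigabytes is made up for this illustration.

```python
from math import prod

def array_gigabytes(shape, bytes_per_element):
    """Return the size in GB (10**9 bytes) of a dense array with this shape."""
    return prod(shape) * bytes_per_element / 1e9

# Shape allocated in the question
shape = (7165, 100, 100, 100, 3)

print(array_gigabytes(shape, 8))   # float64, the np.empty default: about 172 GB
print(array_gigabytes(shape, 4))   # float32: about 86 GB
print(array_gigabytes(shape, 1))   # uint8: about 21.5 GB
```

Each entry alone takes 100 x 100 x 100 x 3 x 8 bytes = 24 MB as float64, so 2300 entries are already around 55 GB. The files can be much smaller on disk (for example if they are compressed or sparse, which is only a guess here), and on many systems a large np.empty call succeeds up front because memory is only committed once it is written, which would fit a crash partway through the loop.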

Answer 2

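Do not load the whole dataset at once, or you will run out of RAM. Even using online tools such as Google Colab and splitting the dataset into several parts will not get around this.
The way to handle a large dataset is batch training (i.e. training the model by loading one batch of the dataset at a time).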
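A minimal sketch of what loading one batch at a time could look like, assuming every file stores its array under the key 'tensor' as in the question. The function names iter_path_batches and load_batch, the batch size, and the float32 cast are assumptions made for illustration. Plain sorted() is used here in place of the asker's sorted_alphanumeric helper, so the file order may differ.

```python
import os

def iter_path_batches(data_dir, batch_size):
    """Yield lists of at most batch_size .mat file paths from data_dir."""
    names = sorted(n for n in os.listdir(data_dir) if n.endswith(".mat"))
    for start in range(0, len(names), batch_size):
        yield [os.path.join(data_dir, n) for n in names[start:start + batch_size]]

def load_batch(paths, key="tensor"):
    """Load one batch of .mat files into a single float32 array."""
    # Third-party imports are kept local so the path batching above
    # depends only on the standard library.
    import numpy as np
    import scipy.io
    return np.stack([scipy.io.loadmat(p)[key].astype(np.float32) for p in paths])

# Usage sketch (train_step is a hypothetical function, not defined here):
# for paths in iter_path_batches(data_dir, 32):
#     x_batch = load_batch(paths)   # shape (<=32, 100, 100, 100, 3)
#     train_step(x_batch)
```

With 32 entries per batch and float32 values, one batch is about 0.38 GB, and only one batch needs to be in memory at any time.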

Tags: python, dataset, scipy, spyder