How can I stop my Colab notebook from crashing while normalising my images?

Question

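I'm trying to build a model that recognises human emotions. My code runs and RAM usage is fine at first:

But when I try to normalise my images, RAM usage shoots up

and then Colab crashes:

This is the code block that makes Colab crash: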

import os
import matplotlib.pyplot as plt
import cv2

data = []

for emot in os.listdir('./data/'):
    for file_ in os.listdir(f'./data/{emot}'):
        img = cv2.imread(f'./data/{emot}/{file_}', 0)
        img = cv2.bitwise_not(img)
        img /= 255.0 # <--- This is the line that causes colab to crash
        data.append([img, emotions.index(emot)])

If I remove img /= 255.0, it doesn't crash, but then my images aren't normalised:
I even tried normalising them in a separate block:

for i in range(len(data)):
    data[i][0] = np.array(data[i][0]) / 255.0

But that doesn't work either; it still crashes.

Answer 1

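Assuming the next step in your pipeline is to create a tf.data.Dataset object from the image corpus, you can use Dataset.map() to move the preprocessing into the data-loading pipeline and save memory. TensorFlow has a well-documented guide on how to do this: https://www.tensorflow.org/guide/data#preprocessing_data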
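
As a rough illustration of that idea, here is a minimal sketch. It assumes the images can be kept in memory as uint8 and only converted to floats when a batch is requested. The random array, the 48x48 shape, the four classes, and the normalise helper are placeholders for illustration, not taken from the question.

```python
import numpy as np
import tensorflow as tf

# Placeholder corpus (assumption): 8 grayscale 48x48 images kept as uint8.
images = np.random.randint(0, 256, size=(8, 48, 48), dtype=np.uint8)
labels = np.arange(8, dtype=np.int64) % 4  # placeholder class indices

def normalise(image, label):
    # Cast to float32 and scale to [0, 1] only when an element is requested.
    return tf.cast(image, tf.float32) / 255.0, label

dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(normalise, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

# Only the current batch exists as float32; the stored corpus stays uint8.
first_images, first_labels = next(iter(dataset))
print(first_images.dtype, first_images.shape)
```

A dataset built this way can be passed directly to model.fit(), so the full float copy of the corpus never has to exist at once.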

Answer 2

Let me give an example. First, take a look at the code below.

import numpy as np
x = np.random.randint(0, 255, size=(100, 32, 32), dtype=np.int16)

print('Present data type', x.dtype)
# What you did
y = x/255
print('Present data type', y.dtype)
# What you should do
z = (x/255).astype(np.float16)
print('Present data type', z.dtype)

Output:

Present data type int16
Present data type float64
Present data type float16

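If you look closely, when I divide x and assign y = x/255, the data type becomes float64. Dividing a NumPy array of an integer type produces float64 by default. float64 takes up more memory: 8 bytes per element, against 2 for the int16 input here. So when you divide integer NumPy arrays in a large dataset, you should always cast the result to a smaller data type.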
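
To put rough numbers on this, the sketch below compares the memory footprint of the same array under each dtype. The corpus size (1,000 images) and the 48x48 shape are made-up values for illustration.

```python
import numpy as np

# Hypothetical corpus: 1,000 grayscale 48x48 images stored as uint8.
images = np.zeros((1000, 48, 48), dtype=np.uint8)

as_float64 = images / 255                        # default result of the division
as_float32 = (images / 255).astype(np.float32)   # cast down after dividing
as_float16 = (images / 255).astype(np.float16)

for name, arr in [("uint8", images), ("float64", as_float64),
                  ("float32", as_float32), ("float16", as_float16)]:
    # nbytes is the size of the array's data buffer in bytes
    print(f"{name:8s} {arr.nbytes / 1e6:6.2f} MB")
```

The float64 result is eight times the size of the uint8 original, which is why a corpus that loads comfortably can exhaust RAM once it is normalised.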

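If your code runs fine without the img /= 255.0 line, then this is the cause. After the division, you should cast img to the smallest float type that works, such as np.float16 or np.float32. However, np.float16 has some limitations and TensorFlow doesn't fully support it (TF converts it to 32-bit floats), so you can use the np.float32 data type instead.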

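So, try replacing the line img /= 255.0 with img = (img / 255.0).astype(np.float16) or img = (img / 255.0).astype(np.float32). Note that astype returns a new array, so the result has to be assigned back to img.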

Here is the modified version of your code:

import os
import matplotlib.pyplot as plt
import cv2
import numpy as np  # needed for np.float16

data = []

# `emotions` is assumed to be the list of class names defined earlier in the notebook
for emot in os.listdir('./data/'):
    for file_ in os.listdir(f'./data/{emot}'):
        img = cv2.imread(f'./data/{emot}/{file_}', 0)
        img = cv2.bitwise_not(img)
        img = (img/255.0).astype(np.float16) # <--- This is the suggestion
        data.append([img, emotions.index(emot)])
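
A possible variation, sketched under the assumption that all images share the same shape: write the normalised images into one preallocated float32 array instead of a Python list. The helper name normalise_into and the random input are hypothetical and used only for illustration.

```python
import numpy as np

def normalise_into(images_uint8, dtype=np.float32):
    """Hypothetical helper: scale same-shaped uint8 images to [0, 1]
    inside a single preallocated array of the requested dtype."""
    first = images_uint8[0]
    out = np.empty((len(images_uint8),) + first.shape, dtype=dtype)
    for i, img in enumerate(images_uint8):
        # Cast first, then divide: the result stays in `dtype`,
        # so no float64 copy of the image is created.
        out[i] = img.astype(dtype) / 255.0
    return out

# Stand-in for images returned by cv2.imread(..., 0): 10 uint8 arrays of 48x48.
raw = [np.random.randint(0, 256, size=(48, 48), dtype=np.uint8) for _ in range(10)]
normalised = normalise_into(raw)
print(normalised.dtype, normalised.shape)
```

The labels can be kept in a separate integer array of the same length.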

Tags: python, crash, machine-learning, google-colaboratory