Tensorflow Allocation Memory: Allocation of 38535168 exceeds 10% of system memory

Question

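I am trying to build a classifier using ResNet50 pretrained weights. The code base is implemented entirely in Keras, the high-level TensorFlow API. The complete code is posted at the GitHub link below.

Source code: Classification Using RestNet50 Architecture

The pretrained model file is 94.7 MB.

I loaded the pretrained file: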

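from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Sequential

new_model = Sequential()

# resnet_weight_paths holds the path to the downloaded weights file
new_model.add(ResNet50(include_top=False,
                pooling='avg',
                weights=resnet_weight_paths))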

and fit the model:

train_generator = data_generator.flow_from_directory(
    'path_to_the_training_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    batch_size = 12,
    class_mode = 'categorical'
    )

validation_generator = data_generator.flow_from_directory(
    'path_to_the_validation_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    class_mode = 'categorical'
    )

#compile the model

new_model.fit_generator(
    train_generator,
    steps_per_epoch = 3,
    validation_data = validation_generator,
    validation_steps = 1
)

In the training dataset I have two folders, dog and cat, each containing nearly 10,000 images. When I run the script, I get the following error:

Epoch 1/1
2018-05-12 13:04:45.847298: W tensorflow/core/framework/allocator.cc:101] Allocation of 38535168 exceeds 10% of system memory.
2018-05-12 13:04:46.845021: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
2018-05-12 13:04:47.552176: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
2018-05-12 13:04:48.199240: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
2018-05-12 13:04:48.918930: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
2018-05-12 13:04:49.274137: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
2018-05-12 13:04:49.647061: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
2018-05-12 13:04:50.028839: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
2018-05-12 13:04:50.413735: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.

Any ideas on how to optimize the way the pretrained model is loaded, or how to get rid of this warning message?

Thanks!

Answer 1

Try reducing the batch_size attribute to a small number (such as 1, 2 or 3). Example:

train_generator = data_generator.flow_from_directory(
    'path_to_the_training_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    batch_size = 2,
    class_mode = 'categorical'
    )
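
To see why a smaller batch helps, here is a rough arithmetic sketch of how one activation tensor scales with batch_size. The 112x112x64 float32 shape is an assumption (the question does not state IMG_SIZE), and activation_bytes is a hypothetical helper, not a Keras function.

```python
def activation_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Bytes needed to hold one activation tensor of the given shape."""
    return batch_size * height * width * channels * bytes_per_value


# Assumed shape: a 112x112x64 float32 feature map (not stated in the question).
for batch_size in (12, 2):
    size = activation_bytes(batch_size, 112, 112, 64)
    print(f"batch_size={batch_size}: {size} bytes")
```

Under these assumptions a batch of 12 needs 38535168 bytes, which matches the first warning in the question, while a batch of 2 needs 6422528 bytes.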

Answer 2

Alternatively, you can set the environment variable TF_CPP_MIN_LOG_LEVEL=2 to filter out info and warning messages. I found that in this github issue, where they complain about the same output. To do so within python, you can use the solution from here:

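import os

# Set the variable before importing TensorFlow so it is in place when the
# native logger starts. '2' hides INFO and WARNING; '3' also hides ERROR.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf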

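You can even turn it on and off at will. I test for the maximum possible batch size before running my code, and I can disable warnings and errors while doing this.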
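
One possible way to switch the setting on and off is a small context manager that sets the variable and restores the previous value afterwards. This is only a sketch: tf_min_log_level is a hypothetical helper, and TensorFlow may read the variable only once, so a change made after the library has started logging might have no effect on your version.

```python
import os
from contextlib import contextmanager


@contextmanager
def tf_min_log_level(level):
    """Temporarily set TF_CPP_MIN_LOG_LEVEL, restoring the old value on exit."""
    name = "TF_CPP_MIN_LOG_LEVEL"
    previous = os.environ.get(name)
    os.environ[name] = str(level)
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)  # variable was unset before
        else:
            os.environ[name] = previous


with tf_min_log_level(3):
    # Hypothetical placeholder: run the batch-size probing code here.
    pass
```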

Answer 3

I was running a small model on CPU and had the same issue. Adding os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' resolved it.

Answer 4

I had the same problem while running a TensorFlow container with Docker and Jupyter notebook. I was able to fix it by increasing the container memory.

On Mac OS, you can easily do this from:

       Docker Icon > Preferences >  Advanced > Memory

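Drag the slider to the maximum (e.g. 4GB). Click Apply, and it will restart the Docker engine.

Now run your TensorFlow container again.

It is handy to run the docker stats command in a separate terminal. It shows the container memory usage in real time, so you can watch memory consumption grow: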

CONTAINER ID   NAME   CPU %   MEM USAGE / LIMIT     MEM %    NET I/O             BLOCK I/O           PIDS
3170c0b402cc   mytf   0.04%   588.6MiB / 3.855GiB   14.91%   13.1MB / 3.06MB     214MB / 3.13MB      21
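
If you want to check the MEM % column yourself, the MEM USAGE / LIMIT field can be converted with a few lines of Python. This is a sketch only: to_bytes and memory_percent are hypothetical helpers, and the set of unit suffixes handled is an assumption about what docker stats prints.

```python
import re

# Binary unit multipliers in bytes (assumed to cover the suffixes shown above).
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def to_bytes(text):
    """Convert a string such as '588.6MiB' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([A-Za-z]+)", text.strip())
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def memory_percent(usage_and_limit):
    """Return usage as a percentage of the limit for a 'USAGE / LIMIT' string."""
    usage, limit = (to_bytes(part) for part in usage_and_limit.split("/"))
    return 100.0 * usage / limit


print(f"{memory_percent('588.6MiB / 3.855GiB'):.2f}%")
```

For the row shown above this comes out at about 14.91%, in line with the MEM % column.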

Answer 5

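I had the same problem, and I concluded that there are two factors to consider when you see this error:

1- batch_size ==> because this determines how much data is processed in each training step
2- image_size ==> the larger the image dimensions (image size), the more data there is to process

Between these two factors, the RAM cannot hold all of the required data.

To solve the problem I tried two things: first, changing batch_size from 32 to 3 or 2; second, reducing image_size from (608,608) to (416,416).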
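
As a rough illustration of how the two factors combine, the sketch below estimates the size of a single input batch before and after both changes. It assumes 3-channel float32 images, which the answer does not state, and input_batch_bytes is a hypothetical helper.

```python
def input_batch_bytes(batch_size, image_size, channels=3, bytes_per_value=4):
    """Bytes for one batch of images shaped (batch, height, width, channels)."""
    height, width = image_size
    return batch_size * height * width * channels * bytes_per_value


# Assumption: RGB images stored as float32.
before = input_batch_bytes(32, (608, 608))
after = input_batch_bytes(2, (416, 416))
print(before, after)
```

Under these assumptions the input batch shrinks from 141950976 bytes to 4153344 bytes, roughly 34 times smaller. Intermediate activations inside the network scale in a similar way, so the real saving is larger in absolute terms.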

Answer 6

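I got the same error, and I tried setting the os.environ flag... but it didn't work.

Then I went ahead and reduced my batch size from 16 to 8, and after that it started to work fine. That is because the train method takes the batch size into account... I feel that reducing the image size would also work, as mentioned above.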

Answer 7

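I faced the same issue while running code on a Linux platform. I changed the swap memory size, which was previously set to 0, to 1 GB, and the issue was fixed.

For further details, you can consult this link.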

https://linuxize.com/post/create-a-linux-swap-file/
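
Before creating a swap file, you may want to check whether swap is currently configured. The sketch below parses /proc/meminfo-style text; read_meminfo_kb is a hypothetical helper and the sample text is illustrative, not taken from the answer.

```python
def read_meminfo_kb(text):
    """Parse /proc/meminfo-style text into a {field: value} dictionary."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


# Illustrative sample; on Linux you could pass open("/proc/meminfo").read().
SAMPLE = "MemTotal:  4039452 kB\nSwapTotal:       0 kB\nSwapFree:        0 kB\n"
info = read_meminfo_kb(SAMPLE)
if info.get("SwapTotal", 0) == 0:
    print("No swap configured")
```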

Answer 8

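I had a similar problem, but reducing the batch size didn't solve it for me (my code was actually running fine with a batch size of 16). When the memory issue appeared, I started looking into the underlying model, which involved scanning pictures. The model had a Dense layer, from the Keras package, whose size I had initially set to 4096. I divided that value by 2 and the whole code stopped showing the memory error. Reducing the image resolution could also fix the problem, but my photos were already at low resolution (100X100).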
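
To illustrate why halving the Dense size matters, the sketch below estimates the memory taken by the layer's weights alone. The number of inputs feeding the layer is not given in the answer, so FLATTENED_INPUTS is a hypothetical value, and dense_weight_bytes is a hypothetical helper assuming float32 weights.

```python
def dense_weight_bytes(inputs, units, bytes_per_value=4):
    """Bytes for a Dense layer's kernel (inputs x units) plus its bias (units)."""
    return (inputs * units + units) * bytes_per_value


FLATTENED_INPUTS = 10_000  # hypothetical; the answer does not give the input size
for units in (4096, 2048):
    print(units, dense_weight_bytes(FLATTENED_INPUTS, units))
```

With these assumed numbers the weights drop from 163856384 bytes to 81928192 bytes, exactly half. Gradients and optimizer state kept during training would shrink in the same proportion.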


python

memory

tensorflow

resnet

keras-layer