What to do with batch size when TensorFlow says "there could be performance gains if more memory is available"

I get the following warning when I try to increase the batch_size of my CNN model on an Nvidia K80:

2017-08-07 20:33:38.573318: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.04GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

I would like to know which option is fastest in this situation:

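1. Do nothing and let the model keep training despite the warning.
2. Scale down the batch size and the learning rate by some factor, and run more iterations by the same factor, so that the smaller batch size with more iterations gives the same results (following the linear scaling rule discussed in this paper).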
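The scaling described in the second option can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `scale_for_memory` and the starting values are hypothetical, and whether the scaled run really matches the original depends on the model.

```python
def scale_for_memory(batch_size, learning_rate, iterations, factor):
    """Shrink batch size and learning rate by `factor`, grow iterations by it.

    Hypothetical helper: it only does the bookkeeping of the linear scaling
    rule, it does not touch TensorFlow or GPU memory.
    """
    if factor <= 0 or batch_size % factor != 0:
        raise ValueError("factor must be a positive divisor of batch_size")
    return batch_size // factor, learning_rate / factor, iterations * factor


# Assumed starting point: batch 256, learning rate 0.1, 10,000 iterations.
new_batch, new_lr, new_iters = scale_for_memory(256, 0.1, 10_000, factor=2)
print(new_batch, new_lr, new_iters)  # 128 0.05 20000
```

The number of examples processed (batch size times iterations) is the same before and after scaling; only the step size and the number of steps change.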

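If you keep training (the first option above), you will lose some efficiency, because the OS swaps your data (thrashing). Option 2 is the correct one, in my experience: reduce the batch size so that it makes efficient use of the memory you have, and increase the number of iterations to compensate.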

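The quantity that matters for training is the number of epochs, not the number of iterations. If you cut the batch size by a factor of 2, the number of iterations goes up by a factor of 2, and you will get almost the same results.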
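To make the epoch versus iteration relationship concrete, here is a minimal sketch. The dataset size, epoch count, and function names are assumptions chosen for illustration, not values from the question.

```python
import math


def iterations_per_epoch(num_examples, batch_size):
    # One epoch is one full pass over the data; the last batch may be partial.
    return math.ceil(num_examples / batch_size)


def total_iterations(num_examples, batch_size, epochs):
    return epochs * iterations_per_epoch(num_examples, batch_size)


num_examples = 51_200  # hypothetical dataset size
for batch_size in (256, 128):
    print(batch_size, total_iterations(num_examples, batch_size, epochs=10))
# 256 2000
# 128 4000
```

Both runs cover 10 epochs; halving the batch size only doubles the number of iterations needed to get there.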

Tags: python, performance, gpu, tensorflow, tensorflow-gpu