tf.keras OOM 即使在批量大小为 1 的小型 LSTM 模型上也是如此

Question

我在从 TensorFlow 1.5 迁移到 TensorFlow 2.0 时遇到了这个错误。我想特别声明此模型 在 1.5 上正确运行。唯一改变的是从生成器（顺便说一句，批量大小为 8）迁移到 tf.Dataset，同时喂养 .fit() .

我在 Stack Overflow 上研究了很多关于 GPU 上的 OOM 问题的线程，然而，其中大部分是关于真正巨大的张量的问题，而我的是小 [256,128] 或大批量尺码。

这是我的模型：

def build_model(self):
    self.g_Model = Sequential()
    self.g_Model.add(Embedding(input_dim=self.g_Max_features, output_dim=256, name='X'))
    self.g_Model.add(LSTM(128))
    self.g_Model.add(Dropout(0.5))
    self.g_Model.add(Dense(1, activation='sigmoid'))
    self.g_Model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

总结：

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
X (Embedding)                (None, None, 256)         256000    
_________________________________________________________________
lstm (LSTM)                  (None, 128)               197120    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 129       
=================================================================
Total params: 453,249
Trainable params: 453,249
Non-trainable params: 0

这是我的训练函数：

def train_model(self):
    if self.g_Model is None:
        self.build_model()

    dataset = self.prepare_the_data()
    self.g_Model.fit(dataset, epochs=2)

以及数据本身的准备：

@staticmethod
def prepare_the_data():
    lstm_feature_description = {
        'X_input': tf.io.FixedLenFeature(CONFIG.g_keras_lstm_max_document_length, tf.float32),
        'y': tf.io.FixedLenFeature((), tf.int64),
    }

    def _parse_lstm_function(example_proto):
        # Parse the input tf.Example proto using the dictionary above.
        parsed = tf.io.parse_single_example(serialized=example_proto, features=lstm_feature_description)
        return parsed["X_input"], parsed["y"]

    # Start Preparing The Data
    dataset = tf.data.TFRecordDataset(CONFIG.g_record_file_lstm)
    dataset = dataset.shuffle(buffer_size=5000)
    dataset = dataset.map(map_func=_parse_lstm_function)
    dataset = dataset.batch(batch_size=1)

    for next_element in dataset:
        tf.print(next_element)

    return dataset

数据集包含 40 个元素。这是其中之一的样子：

([[0 0 0 ... 1 10 3]], [0])

X_input 是 tensorflow.python.framework.ops.EagerTensor 大小为 24000 且 y 类型相同，但大小为 1（只是一个标签）。

因此，当运行 .fit() 时，我收到以下 OOM 错误（第 1 部分）：

2019-11-02 18:42:52.426444: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 128.0KiB (rounded to 131072).  Current allocation summary follows.
2019-11-02 18:42:52.428463: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256):   Total Chunks: 2753, Chunks in use: 2753. 688.3KiB allocated for chunks. 688.3KiB in use in bin. 10.8KiB client-requested in use in bin.
2019-11-02 18:42:52.428723: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512):   Total Chunks: 78217, Chunks in use: 78217. 38.19MiB allocated for chunks. 38.19MiB in use in bin. 38.19MiB client-requested in use in bin.
2019-11-02 18:42:52.428982: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024):  Total Chunks: 24001, Chunks in use: 24001. 23.44MiB allocated for chunks. 23.44MiB in use in bin. 23.44MiB client-requested in use in bin.
2019-11-02 18:42:52.429247: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048):  Total Chunks: 3, Chunks in use: 3. 6.0KiB allocated for chunks. 6.0KiB in use in bin. 6.0KiB client-requested in use in bin.
2019-11-02 18:42:52.429481: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.429704: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.429920: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.430138: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.430359: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536):     Total Chunks: 10892, Chunks in use: 10892. 680.75MiB allocated for chunks. 680.75MiB in use in bin. 680.75MiB client-requested in use in bin.
2019-11-02 18:42:52.430613: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072):    Total Chunks: 10894, Chunks in use: 10894. 1.33GiB allocated for chunks. 1.33GiB in use in bin. 1.33GiB client-requested in use in bin.
2019-11-02 18:42:52.430855: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144):    Total Chunks: 3, Chunks in use: 3. 1022.8KiB allocated for chunks. 1022.8KiB in use in bin. 768.0KiB client-requested in use in bin.
2019-11-02 18:42:52.431091: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288):    Total Chunks: 3, Chunks in use: 3. 2.00MiB allocated for chunks. 2.00MiB in use in bin. 1.50MiB client-requested in use in bin.
2019-11-02 18:42:52.431323: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431539: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431755: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431970: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.432193: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.432419: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.442986: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443324: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443543: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443767: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 128.0KiB was 128.0KiB, Chunk State: 
2019-11-02 18:42:52.443895: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 1048576
2019-11-02 18:42:52.444010: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600000 next 1 of size 1280
2019-11-02 18:42:52.444139: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600500 next 9 of size 256
2019-11-02 18:42:52.444267: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600600 next 13 of size 256
...

第 2 部分：

2019-11-02 18:44:43.211483: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 525056 totalling 512.8KiB
2019-11-02 18:44:43.211607: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1047808 totalling 1023.3KiB
2019-11-02 18:44:43.211731: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 2.06GiB
2019-11-02 18:44:43.211851: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 2210712576 memory_limit_: 2210712780 available bytes: 204 curr_region_allocation_bytes_: 4294967296
2019-11-02 18:44:43.212060: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: 
Limit:                  2210712780
InUse:                  2210712576
MaxInUse:               2210712576
NumAllocs:                  137751
MaxAllocSize:             33554432

2019-11-02 18:44:43.216115: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ****************************************************************************************************
2019-11-02 18:44:43.216331: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at split_op.cc:311 : Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2019-11-02 18:44:43.216642: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node sequential/lstm/while/body/_1/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Reshape_12/_28]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

2019-11-02 18:44:43.223629: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node sequential/lstm/while/body/_1/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

我已经尝试过但没有成功的方法：

我设置了 set_memory_growth=True
移动了 train 函数中的所有代码，除了构建模型和 .fit() 本身
将批量大小降低到 1。

我真的不明白这是怎么回事，因为我的模型很小，批量大小只有 1。我使用的是 GTX1060 3GB。因此，非常感谢任何帮助。谢谢！

Answer 1

你不会相信我的错误是多么愚蠢。在@OverLordGoldDragon 发布的不同问答后，我只能幸运地识别它。

在导入阶段，我使用了以下语句：

from tensorflow_core.python.keras.layers import Dense, Dropout, LSTM, Embedding
from tensorflow_core.python.keras.models import Sequential, load_model
from tensorflow_core.python.keras.preprocessing import sequence

相反，我应该使用这些：

from tensorflow.keras.layers import Dense, Dropout, LSTM, Embedding
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.preprocessing import sequence

顺便说一句，最新的 PyCharm Professional 不为 tf.keras 语句提供自动完成功能，这首先让我失望了。出乎意料的是，tf.python.keras 自动完成功能正常工作。

可以在此处找到更多信息：

tf.keras OOM 即使在批量大小为 1 的小型 LSTM 模型上也是如此

tf.keras OOM even on a small LSTM model with a batch size of 1

python

keras

tensorflow

tensorflow-datasets

tensorflow2.0