Out of memory training CNN-LSTM with GPU in Jupyter notebook

Currently I am trying to compile my hybrid CNN-LSTM model for sentiment analysis, but I get the following error:

OOM when allocating tensor with shape[9051,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]

This is my GPU list; I want to use the RTX:

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 13057500645716466504,
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 44957696
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 6095838710984840352
 physical_device_desc: "device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:05:00.0, compute capability: 8.6",
 name: "/device:GPU:1"
 device_type: "GPU"
 memory_limit: 10648354816
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 10826802477734196135
 physical_device_desc: "device: 1, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1"]
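
(A listing in this format can be printed with the snippet below; presumably this is how the output above was produced.)

from tensorflow.python.client import device_lib

# Print every device TensorFlow can see, including the memory_limit and
# physical_device_desc fields shown above.
print(device_lib.list_local_devices())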

Here is my code:

import tensorflow as tf
from tensorflow.keras.layers import (Input, Embedding, Conv1D, MaxPooling1D,
                                     Dropout, LSTM, BatchNormalization, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Build hybrid CNN-LSTM model
def build_cnn_lstm_model(num_words, embedding_vector_size, embedding_matrix, max_sequence_length):
    # Input layer
    input_layer = Input(shape=(max_sequence_length,))

    # Word embedding
    embedding_layer = Embedding(input_dim=num_words,
                              output_dim=embedding_vector_size,
                              weights=[embedding_matrix],
                              input_length=max_sequence_length)(input_layer)

    # CNN model
    # Bigrams extraction
    bigrams_convolution_layer = Conv1D(filters=256,
                                     kernel_size=2,
                                     strides=1,
                                     padding='valid',
                                     activation='relu')(embedding_layer)
    bigrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                           strides=1,
                                           padding='valid')(bigrams_convolution_layer)

    # Trigrams extraction
    trigrams_convolution_layer = Conv1D(filters=256,
                                     kernel_size=3,
                                     strides=1,
                                     padding='valid',
                                     activation='relu')(bigrams_max_pooling_layer)
    trigrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                           strides=1,
                                           padding='valid')(trigrams_convolution_layer)

    # Fourgrams extraction
    fourgrams_convolution_layer = Conv1D(filters=256,
                                      kernel_size=4,
                                      strides=1,
                                      padding='valid',
                                      activation='relu')(trigrams_max_pooling_layer)
    fourgrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                            strides=1,
                                            padding='valid')(fourgrams_convolution_layer)

    # Fivegrams extraction
    fivegrams_convolution_layer = Conv1D(filters=256,
                                      kernel_size=5,
                                      strides=1,
                                      padding='valid',
                                      activation='relu')(fourgrams_max_pooling_layer)
    fivegrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                            strides=1,
                                            padding='valid')(fivegrams_convolution_layer)

    # Dropout layer
    dropout_layer = Dropout(rate=0.5)(fivegrams_max_pooling_layer)

    # LSTM model
    lstm_layer = LSTM(units=128,
                      activation='tanh',
                      return_sequences=False,
                      dropout=0.3,
                      return_state=False)(dropout_layer)

    # Batch normalization layer
    batch_norm_layer = BatchNormalization()(lstm_layer)

    # Classifier model
    dense_layer = Dense(units=10, activation='relu')(batch_norm_layer)
    output_layer = Dense(units=3, activation='softmax')(dense_layer)

    cnn_lstm_model = Model(inputs=input_layer, outputs=output_layer)

    return cnn_lstm_model

with tf.device('/device:GPU:0'):
    sinovac_cnn_lstm_model = build_cnn_lstm_model(SINOVAC_NUM_WORDS, 
                                                  SINOVAC_EMBEDDING_VECTOR_SIZE,
                                                  SINOVAC_EMBEDDING_MATRIX,
                                                  SINOVAC_MAX_SEQUENCE)
    sinovac_cnn_lstm_model.summary()

    sinovac_cnn_lstm_model.compile(loss='categorical_crossentropy',
                                   optimizer=Adam(learning_rate=0.001),
                                   metrics=['accuracy'])

The strange thing is that when I use GPU:1, which is the GTX, it works fine. The GTX 1080 Ti has less VRAM than the RTX A6000, so why does compiling and training with the RTX A6000 produce an Out Of Memory error? Is there any way to fix this?

Despite the physical_device_desc calling it device: 0, it is the name in the name: "/device:GPU:1" entry that gets used. So even though the 1080Ti calls itself device: 1 in its physical_device_desc field, it is actually `"/device:GPU:0"`.

In other words: use with tf.device('/device:GPU:0'): to run on the 1080Ti, and with tf.device('/device:GPU:1'): to run on the A6000.
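
Under that mapping, a minimal sketch of building on the A6000 (everything else in your code unchanged):

# '/device:GPU:1' is the A6000 under the mapping described above
with tf.device('/device:GPU:1'):
    sinovac_cnn_lstm_model = build_cnn_lstm_model(SINOVAC_NUM_WORDS,
                                                  SINOVAC_EMBEDDING_VECTOR_SIZE,
                                                  SINOVAC_EMBEDDING_MATRIX,
                                                  SINOVAC_MAX_SEQUENCE)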

That may sound fragile, but I just skimmed the TensorFlow docs and there doesn't seem to be a built-in function for identifying a GPU by its model name. So you need to run through the device list and match on that physical device name (or simply pick the device with the most memory) to get the "GPU:nnn" name you need.
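
A minimal sketch of that lookup, using the same device_lib.list_local_devices() call that produced the listing above. The helper names find_gpu_by_model and find_largest_gpu are mine, not a TensorFlow API:

from tensorflow.python.client import device_lib

def find_gpu_by_model(model_name):
    # Scan the local device list and return the '/device:GPU:n' name of the
    # first GPU whose physical_device_desc mentions the given model name.
    for device in device_lib.list_local_devices():
        if device.device_type == 'GPU' and model_name in device.physical_device_desc:
            return device.name
    return None

def find_largest_gpu():
    # Alternatively, just pick the GPU reporting the largest memory_limit.
    gpus = [d for d in device_lib.list_local_devices() if d.device_type == 'GPU']
    return max(gpus, key=lambda d: d.memory_limit).name if gpus else None

# Usage: pin the model to whichever '/device:GPU:n' name comes back, e.g.
# with tf.device(find_gpu_by_model('RTX A6000')):
#     ...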