Out of memory training CNN-LSTM with GPU in Jupyter notebook
Currently I'm trying to compile my hybrid CNN-LSTM model for sentiment analysis, but I get the following error:
OOM when allocating tensor with shape[9051,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]
Here is my GPU list; I want to use the RTX:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13057500645716466504,
name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 44957696
locality {
bus_id: 1
links {
}
}
incarnation: 6095838710984840352
physical_device_desc: "device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:05:00.0, compute capability: 8.6",
name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 10648354816
locality {
bus_id: 1
links {
}
}
incarnation: 10826802477734196135
physical_device_desc: "device: 1, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1"]
Here is my code:
import tensorflow as tf
from tensorflow.keras.layers import (Input, Embedding, Conv1D, MaxPooling1D,
                                     Dropout, LSTM, BatchNormalization, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Build hybrid CNN-LSTM model
def build_cnn_lstm_model(num_words, embedding_vector_size, embedding_matrix, max_sequence_length):
    # Input layer
    input_layer = Input(shape=(max_sequence_length,))
    # Word embedding
    embedding_layer = Embedding(input_dim=num_words,
                                output_dim=embedding_vector_size,
                                weights=[embedding_matrix],
                                input_length=max_sequence_length)(input_layer)
    # CNN model
    # Bigrams extraction
    bigrams_convolution_layer = Conv1D(filters=256,
                                       kernel_size=2,
                                       strides=1,
                                       padding='valid',
                                       activation='relu')(embedding_layer)
    bigrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                             strides=1,
                                             padding='valid')(bigrams_convolution_layer)
    # Trigrams extraction
    trigrams_convolution_layer = Conv1D(filters=256,
                                        kernel_size=3,
                                        strides=1,
                                        padding='valid',
                                        activation='relu')(bigrams_max_pooling_layer)
    trigrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                              strides=1,
                                              padding='valid')(trigrams_convolution_layer)
    # Fourgrams extraction
    fourgrams_convolution_layer = Conv1D(filters=256,
                                         kernel_size=4,
                                         strides=1,
                                         padding='valid',
                                         activation='relu')(trigrams_max_pooling_layer)
    fourgrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                               strides=1,
                                               padding='valid')(fourgrams_convolution_layer)
    # Fivegrams extraction
    fivegrams_convolution_layer = Conv1D(filters=256,
                                         kernel_size=5,
                                         strides=1,
                                         padding='valid',
                                         activation='relu')(fourgrams_max_pooling_layer)
    fivegrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                               strides=1,
                                               padding='valid')(fivegrams_convolution_layer)
    # Dropout layer
    dropout_layer = Dropout(rate=0.5)(bigrams_max_pooling_layer)
    # LSTM model
    lstm_layer = LSTM(units=128,
                      activation='tanh',
                      return_sequences=False,
                      dropout=0.3,
                      return_state=False)(dropout_layer)
    # Batch normalization layer
    batch_norm_layer = BatchNormalization()(lstm_layer)
    # Classifier model
    dense_layer = Dense(units=10, activation='relu')(lstm_layer)
    output_layer = Dense(units=3, activation='softmax')(dense_layer)
    cnn_lstm_model = Model(inputs=input_layer, outputs=output_layer)
    return cnn_lstm_model

with tf.device('/device:GPU:0'):
    sinovac_cnn_lstm_model = build_cnn_lstm_model(SINOVAC_NUM_WORDS,
                                                  SINOVAC_EMBEDDING_VECTOR_SIZE,
                                                  SINOVAC_EMBEDDING_MATRIX,
                                                  SINOVAC_MAX_SEQUENCE)
    sinovac_cnn_lstm_model.summary()
    sinovac_cnn_lstm_model.compile(loss='categorical_crossentropy',
                                   optimizer=Adam(lr=0.001),
                                   metrics=['accuracy'])
The strange thing is that when I use GPU:1, the GTX, it works fine.
The GTX 1080 Ti has less memory than the RTX A6000, so why do I get an Out Of Memory error when compiling and training with the RTX A6000?
Is there any solution?
The name that actually gets used is the one under the name: "/device:GPU:1" entry, even though the physical_device_desc calls it device: 0. So even though the 1080Ti calls itself device: 1 in the physical_device_desc field, it is actually "/device:GPU:0".
In other words, with tf.device('/device:GPU:0'): selects the 1080Ti, and with tf.device('/device:GPU:1'): selects the A6000.
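For example, under that mapping, targeting the A6000 for the build and compile step in the question would just mean changing the device string (a minimal sketch reusing the question's own function and constants):

# Under the mapping described above, '/device:GPU:1' is the RTX A6000.
with tf.device('/device:GPU:1'):
    sinovac_cnn_lstm_model = build_cnn_lstm_model(SINOVAC_NUM_WORDS,
                                                  SINOVAC_EMBEDDING_VECTOR_SIZE,
                                                  SINOVAC_EMBEDDING_MATRIX,
                                                  SINOVAC_MAX_SEQUENCE)
    sinovac_cnn_lstm_model.compile(loss='categorical_crossentropy',
                                   optimizer=Adam(lr=0.001),
                                   metrics=['accuracy'])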
This may sound fragile, but I've just looked through the TensorFlow documentation and there doesn't seem to be a built-in function for identifying a GPU by its model name. So you need to run through the device list and match on that physical device name (or simply pick the one with the most memory) to get the "GPU:nnn" name you need.
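As a rough illustration of that approach, here is a minimal sketch that walks device_lib.list_local_devices() and matches on physical_device_desc or memory_limit; the helper names pick_gpu_by_model and pick_gpu_by_memory are made up for this example:

import tensorflow as tf
from tensorflow.python.client import device_lib

def pick_gpu_by_model(model_substring):
    # Return the "/device:GPU:n" name whose physical_device_desc mentions
    # the given model string (e.g. "A6000"), or None if nothing matches.
    for dev in device_lib.list_local_devices():
        if dev.device_type == "GPU" and model_substring in dev.physical_device_desc:
            return dev.name
    return None

def pick_gpu_by_memory():
    # Alternative: return the GPU entry that reports the largest memory_limit.
    gpus = [d for d in device_lib.list_local_devices() if d.device_type == "GPU"]
    return max(gpus, key=lambda d: d.memory_limit).name if gpus else None

# Fall back to '/device:GPU:1' (the A6000 under the mapping above) if no match.
with tf.device(pick_gpu_by_model("A6000") or "/device:GPU:1"):
    ...  # build and compile the model as shown earlier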