Keras: ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[26671,32,32,64]
I am training my network with Keras (version 2.1) on the TensorFlow backend. I have tried many of the suggestions available on the internet, but none of them solved the problem.
My training set and labels: 26721 images (each of size (32, 32, 1)), labels of shape (26721, 10)
Validation set and labels: 6680 images (each of size (32, 32, 1)), labels of shape (6680, 10)
This is my model so far; I am using Python 3.
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Convolution2D,
                          Dense, Flatten, MaxPooling2D)
from keras.callbacks import CSVLogger, EarlyStopping, ModelCheckpoint

def CNN(input_, num_classes):
    model = Sequential()
    model.add(Convolution2D(16, kernel_size=(7, 7), padding='same',
                            input_shape=input_))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='same'))
    model.add(Convolution2D(64, (3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1), padding='same'))
    model.add(Flatten())
    model.add(Dense(96))
    model.add(Activation('relu'))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))
    return model
model = CNN(image_size, num_classes)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.01),
              metrics=['accuracy'])
print(model.summary())
csv_logger = CSVLogger('training.log')
early_stop = EarlyStopping('val_acc', patience=200, verbose=1)
model_checkpoint = ModelCheckpoint(model_save_path,
                                   'val_acc', verbose=0,
                                   save_best_only=True)
model_callbacks = [early_stop, model_checkpoint, csv_logger]

# print "len(train_dataset) ", len(train_dataset)
print("int(len(train_dataset)/batch_size) ", int(len(train_dataset)/batch_size))
K.get_session().run(tf.global_variables_initializer())
model.fit_generator(train,
                    steps_per_epoch=np.ceil(len(train_dataset)/batch_size),
                    epochs=num_epochs,
                    verbose=1,
                    validation_data=valid,
                    validation_steps=batch_size,
                    callbacks=model_callbacks)
Model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 16)        800
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 16)        64
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 16)        0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 16)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 64)        9280
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 64)        0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 32, 32, 64)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 65536)             0
_________________________________________________________________
dense_1 (Dense)              (None, 96)                6291552
_________________________________________________________________
activation_3 (Activation)    (None, 96)                0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                970
_________________________________________________________________
activation_4 (Activation)    (None, 10)                0
=================================================================
Total params: 6,302,922
Trainable params: 6,302,762
Non-trainable params: 160
I am feeding the images according to the batch size. This is my generator function:
# Generate images according to batch size
def gen(dataset, labels, batch_size):
    images = []
    digits = []
    i = 0
    while True:
        images.append(dataset[i])
        digits.append(labels[i])
        i += 1
        if i == batch_size:
            yield (np.array(images), np.array(digits))
            images = []
            digits = []
        # Generate remaining images also
        if i == len(dataset):
            yield (np.array(images), np.array(digits))
            images, digits = [], []
            i = 0

train = gen(train_data, train_labels, batch_size)
valid = gen(valid_data, valid_lables, batch_size)
Terminal log of the error:
Please check this link for the complete error: Terminal Output
Can anyone help me understand what I am doing wrong here?
Thanks in advance.
You are effectively training your network on (almost) the whole training set at once, which is too big for memory and too big for your GPU. The OOM tensor shape [26671, 32, 32, 64] is the output of your second conv layer for 26671 images at once: after the first yield, the condition i == batch_size in your generator never fires again, so images keep accumulating until i == len(dataset), and the second yield contains len(dataset) - batch_size images (26721 - 50 = 26671, so apparently your batch_size is 50).
The standard practice in machine learning is to create mini-batches of data and train on those. The batch size is usually 16, 32, 64, or some other power of two, but it can be any value; you typically have to find a suitable batch size via cross-validation.
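For reference, here is a minimal sketch of a generator that always yields fixed-size mini-batches (reusing the train_data, train_labels and batch_size names from the question); the key difference from the gen above is that the index wraps around the dataset, so a batch of exactly batch_size items is emitted on every pass:

import numpy as np

def batch_gen(dataset, labels, batch_size):
    # Yield fixed-size mini-batches forever, wrapping around the dataset.
    i = 0
    while True:
        images, digits = [], []
        for _ in range(batch_size):
            images.append(dataset[i])
            digits.append(labels[i])
            i = (i + 1) % len(dataset)  # wrap instead of letting the batch grow
        yield (np.array(images), np.array(digits))

train = batch_gen(train_data, train_labels, batch_size)
valid = batch_gen(valid_data, valid_lables, batch_size)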
From the log you can see that the memory was already full before allocating edge_1094_loss. Check the values Limit and InUse.
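It can also help to let TensorFlow grab GPU memory on demand rather than reserving it all upfront, so the log reflects what the model actually uses. A minimal sketch, assuming the TensorFlow 1.x backend that ships with Keras 2.1:

import tensorflow as tf
from keras import backend as K

# Allocate GPU memory incrementally instead of reserving it all at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))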
This can happen because old models are still occupying memory. A quick hack to solve it is simply to kill the process; that frees all the memory consumed by old models that somehow were not garbage collected.
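If you would rather not kill the process itself, Keras also exposes keras.backend.clear_session(), which destroys the current TensorFlow graph and releases the memory held by previously built models. A minimal sketch:

from keras import backend as K

# Drop the old graph (and any stale models) before building a new one.
K.clear_session()
model = CNN(image_size, num_classes)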