Training an image classifier with over 300k classes
Is it possible to train an image classifier network with a very large number of classes (say, 300k classes), where each class has at least 10 images split across train/test/validation (i.e. >3 million 250x250x3 images)?
I tried training the dataset with a ResNet50 model and dropped the batch size down to 1, but still ran into OOM issues (2080 Ti). I found the OOM was caused by having too many parameters, so I tried training on an extremely basic 10-layer model with a batch size of 1. It ran, but the speed/accuracy was unsurprisingly terrible.
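For scale (this arithmetic is not in the original post, and assumes a ResNet50-style 2048-dim pooled feature feeding a 300k-way softmax), the classification head alone already dwarfs the card's memory:

# Rough parameter/memory estimate for a 300k-way softmax head on top of
# a ResNet50-style 2048-dim global-average-pooled feature vector.
features = 2048
num_classes = 300_000

weights = features * num_classes   # kernel of the final Dense layer
biases = num_classes
total_params = weights + biases    # ~614.7M parameters in the head alone

bytes_fp32 = total_params * 4      # float32 weights
print(f"head params: {total_params:,}")            # head params: 614,700,000
print(f"fp32 weights: {bytes_fp32 / 1e9:.2f} GB")  # ~2.46 GB before gradients
# Gradients plus optimizer state (e.g. Adam's two moment buffers) multiply
# this several times over, which on its own exhausts an 11 GB 2080 Ti.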
Could I instead split the training set into smaller chunks of classes, so that:
1st .h5 = classes 1 ~ 20,000
2nd .h5 = classes 20,001 ~ 40,000
3rd .h5 = classes 40,001 ~ 60,000, etc.
and then merge them into a single h5 file that can be loaded to recognize all 300k different classes?
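A minimal sketch of that per-chunk training loop, assuming a tf.keras ResNet50 backbone; load_chunk is a hypothetical stand-in for the actual data pipeline, and the chunk size, epoch count, and file names just mirror the plan above:

import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

CHUNK_SIZE = 20_000  # classes per sub-model, as in the plan above

def build_chunk_model(num_classes):
    """ResNet50 backbone with a softmax head sized for one chunk of classes."""
    backbone = ResNet50(include_top=False, input_shape=(250, 250, 3))
    x = GlobalAveragePooling2D()(backbone.output)
    out = Dense(num_classes, activation='softmax', name='probs')(x)
    return Model(backbone.input, out)

for i in range(3):  # chunk 0: classes 1..20,000; chunk 1: 20,001..40,000; ...
    model = build_chunk_model(CHUNK_SIZE)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    # load_chunk() is a placeholder for whatever pipeline yields (image, label)
    # pairs with labels re-indexed to 0..CHUNK_SIZE-1 within this chunk.
    train_ds = load_chunk(i)
    model.fit(train_ds, epochs=10)
    model.save(f'{i + 1:03d}.h5')  # 001.h5, 002.h5, 003.h5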
Edit, following Ashish's suggestion:
I have (I think) successfully merged 2 models into one, but the merged model has twice as many layers...
Source code:
from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Dense, concatenate

model1 = load_model('001.h5')
model2 = load_model('002.h5')

# Rename every layer and freeze both sub-models.
for layer in model1.layers:
    layer._name = layer._name + "_1"  # avoid duplicate layer names, which would otherwise throw an error
    layer.trainable = False
for layer in model2.layers:
    layer._name = layer._name + "_2"
    layer.trainable = False

# Add a trainable Dense layer on top of each sub-model's softmax output.
x1 = model1.layers[-1].output
classes = x1.shape[1]
x1 = Dense(classes, activation='relu', name='out1')(x1)

x2 = model2.layers[-1].output
x2 = Dense(x2.shape[1], activation='relu', name='out2')(x2)
classes += x2.shape[1]

# Concatenate both branches and map them to a single softmax over all classes.
x = concatenate([x1, x2])
output_layer = Dense(classes, activation='softmax', name='combined_layer')(x)

# model.inputs is already a list, so join the two lists instead of nesting them.
new_model = Model(inputs=model1.inputs + model2.inputs, outputs=output_layer)
new_model.summary()
new_model.save('new_model.h5', overwrite=True)
The resulting model looks like this:
Model: "model"
_________________________________________________________________________
Layer (type) Output Shape Param # Connected to
=========================================================================
input_1_1 (InputLayer) [(None, 224, 224, 3) 0
_________________________________________________________________________
input_1_2 (InputLayer) [(None, 224, 224, 3) 0
_________________________________________________________________________
conv1_pad_1 (ZeroPadding2D) (None, 230, 230, 3) 0 input_1_1[0][0]
_________________________________________________________________________
conv1_pad_2 (ZeroPadding2D) (None, 230, 230, 3) 0 input_1_2[0][0]
_________________________________________________________________________
conv1_conv_1 (Conv2D) (None, 112, 112, 64) 9472 conv1_pad_1[0][0]
_________________________________________________________________________
conv1_conv_2 (Conv2D) (None, 112, 112, 64) 9472 conv1_pad_2[0][0]
...
...
conv5_block3_out_1 (Activation) (None, 7, 7, 2048) 0 conv5_block3_add_1[0][0]
_________________________________________________________________________
conv5_block3_out_2 (Activation) (None, 7, 7, 2048) 0 conv5_block3_add_2[0][0]
_________________________________________________________________________
avg_pool_1 (GlobalAveragePoolin (None, 2048) 0 conv5_block3_out_1[0][0]
_________________________________________________________________________
avg_pool_2 (GlobalAveragePoolin (None, 2048) 0 conv5_block3_out_2[0][0]
_________________________________________________________________________
probs_1 (Dense) (None, 953) 1952697 avg_pool_1[0][0]
_________________________________________________________________________
probs_2 (Dense) (None, 3891) 7972659 avg_pool_2[0][0]
_________________________________________________________________________
out1 (Dense) (None, 953) 909162 probs_1[0][0]
_________________________________________________________________________
out2 (Dense) (None, 3891) 15143772 probs_2[0][0]
_________________________________________________________________________
concatenate (Concatenate) (None, 4844) 0 out1[0][0]
out2[0][0]
_________________________________________________________________________
combined_layer (Dense) (None, 4844) 23469180 concatenate[0][0]
=========================================================================
Total params: 96,622,894
Trainable params: 39,522,114
Non-trainable params: 57,100,780
As you can see, every layer has been duplicated because of Model(inputs=[input1, input2]). That causes problems when I want to use this model to predict an image. Is there any way to do this without doubling all the preceding layers, adding only the trailing dense layers? At this rate I'll be overloaded by the parameter count even faster than before...
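One possible workaround (a sketch, not something from the original post) is to call both loaded models as layers on a single shared Input. The backbone weights are still stored twice, but the graph has one entry point, so prediction takes a single image instead of the same image twice:

from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Input, Dense, concatenate

model1 = load_model('001.h5')
model2 = load_model('002.h5')
model1._name, model2._name = 'chunk1', 'chunk2'  # avoid duplicate model names
model1.trainable = False
model2.trainable = False

# One shared input; each frozen sub-model is applied to the same tensor.
inp = Input(shape=(224, 224, 3))
p1 = model1(inp)  # (None, 953) softmax over chunk 1's classes
p2 = model2(inp)  # (None, 3891) softmax over chunk 2's classes

x = concatenate([p1, p2])
out = Dense(p1.shape[1] + p2.shape[1], activation='softmax',
            name='combined_layer')(x)

merged = Model(inputs=inp, outputs=out)
merged.summary()  # the sub-models appear as two nested layers, one input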
Technically this is possible. Since you have 3 classifiers (1.h5, 2.h5, 3.h5), what you can do is load these models with their weights, then use the functional API in tensorflow (https://www.tensorflow.org/guide/keras/functional): the concatenate() API will merge the outputs of the 3 classifiers into a single vector, and then a few dense layers with activation functions can make the final prediction.
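A minimal sketch of that suggestion, assuming all three chunk models accept the same input shape; the file names follow the answer, and the hidden-layer width of 1024 is an arbitrary choice:

from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Input, Dense, concatenate

# Load the three chunk classifiers with their trained weights and freeze them.
models = [load_model(f) for f in ('1.h5', '2.h5', '3.h5')]
for i, m in enumerate(models):
    m._name = f'chunk_{i + 1}'  # avoid duplicate model names when nesting
    m.trainable = False

# Functional API: apply every frozen classifier to one shared input, then
# merge their output vectors with concatenate().
inp = Input(shape=(224, 224, 3))
merged = concatenate([m(inp) for m in models])

# A small trainable dense head on top of the combined vector makes the
# final prediction over the union of all classes.
total_classes = sum(m.output_shape[-1] for m in models)
x = Dense(1024, activation='relu')(merged)  # hidden width: arbitrary choice
out = Dense(total_classes, activation='softmax')(x)

ensemble = Model(inp, out)
ensemble.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# Only the two trailing Dense layers are trained; the backbones stay frozen.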