Training an image classifier with over 300k classes

Question

Is it possible to train an image classifier network with a very large number of classes (say 300k classes), where each class has at least 10 images split between train/test/validation (i.e. >3 million 250x250x3 images)?

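I tried training on the dataset with a ResNet50 model and reduced the batch size to 1, but I still ran into OOM problems (2080 Ti). I found that the OOM was caused by having too many parameters, so I tried training an extremely basic 10-layer model with a batch size of 1. It runs, but the speed/accuracy is unsurprisingly terrible.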
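As a rough check on the "too many parameters" explanation, the arithmetic below estimates the size of the final Dense layer alone for 300k classes. The feature width of 2048 comes from the ResNet50 avg_pool output in the summary further down; float32 weights and an Adam-style optimizer (weights, gradients and two moment buffers) are assumptions, since the post does not say which optimizer was used.

```python
def dense_params(in_features, out_features):
    """Parameter count of a fully connected layer: kernel plus bias."""
    return in_features * out_features + out_features


num_classes = 300_000
feature_dim = 2048          # ResNet50 pooled feature width (see avg_pool in the summary)
bytes_per_float32 = 4       # assumption: float32 weights

head_params = dense_params(feature_dim, num_classes)
weights_gib = head_params * bytes_per_float32 / 2**30

# Assumption: Adam-style training keeps 4 copies of each parameter
# (weights, gradients, first moment, second moment).
training_gib = 4 * weights_gib

print(f"classification head parameters: {head_params:,}")   # 614,700,000
print(f"weights only: {weights_gib:.2f} GiB")                # about 2.29 GiB
print(f"weights + grads + Adam state: {training_gib:.2f} GiB")  # about 9.16 GiB
```

Under these assumptions the classification head alone takes most of the 11 GB on a 2080 Ti before any activations or backbone weights are counted, which would explain why lowering the batch size does not help.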

Could I split the training set into smaller groups of classes, like this:

1st .h5 = classes 1 ~ 20,000

2nd .h5 = classes 20,001 ~ 40,000

3rd .h5 = classes 40,001 ~ 60,000, etc.

and then merge them into a single h5 file that can be loaded to recognize all 300k different classes?
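To make the proposed split concrete, here is a minimal sketch of the bookkeeping it implies: the class range covered by each shard, and the mapping from a shard-local output index back to a global class number. The helper names shard_ranges and to_global are hypothetical and are not part of Keras or of the original code.

```python
def shard_ranges(num_classes, shard_size):
    """Return 1-based inclusive (first, last) class ranges, one per shard."""
    return [
        (start, min(start + shard_size - 1, num_classes))
        for start in range(1, num_classes + 1, shard_size)
    ]


def to_global(shard_index, local_index, shard_size):
    """Map a 0-based shard number and 0-based output index to a 1-based class."""
    return shard_index * shard_size + local_index + 1


ranges = shard_ranges(300_000, 20_000)
print(len(ranges))   # 15 shards
print(ranges[:3])    # [(1, 20000), (20001, 40000), (40001, 60000)]
print(to_global(shard_index=1, local_index=0, shard_size=20_000))  # 20001
```

With 20,000 classes per file, 300k classes need 15 separate models, so any merged network has to keep this index mapping consistent across all of them.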

Edit, following Ashish's suggestion:

I have (I think) successfully merged 2 models into one, but the merged model has twice as many layers...

Source code:

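from tensorflow.keras.layers import Dense, concatenate
from tensorflow.keras.models import Model, load_model

# Load the two classifiers that were trained on separate class subsets
model1 = load_model('001.h5')
model2 = load_model('002.h5')

# Freeze both models and rename their layers
for layer in model1.layers:
    layer._name = layer._name + "_1" # avoid duplicate layer names, which would otherwise throw an error
    layer.trainable = False

for layer in model2.layers:
    layer._name = layer._name + "_2"
    layer.trainable = False

# Add a new trainable Dense layer on top of each model's output
x1 = model1.layers[-1].output
classes = x1.shape[1]
x1 = Dense(classes, activation='relu', name='out1')(x1)

x2 = model2.layers[-1].output
x2 = Dense(x2.shape[1], activation='relu', name='out2')(x2)
classes += x2.shape[1]

# Concatenate both branches and classify over the combined class count
x = concatenate([x1, x2])
output_layer = Dense(classes, activation='softmax', name='combined_layer')(x)
# Note: the merged model takes two inputs, one per original model
new_model = Model(inputs=[model1.inputs, model2.inputs], outputs=output_layer)
new_model.summary()
new_model.save('new_model.h5', overwrite=True)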

The resulting model looks like this:

Model: "model"
_________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
=========================================================================
input_1_1 (InputLayer)          [(None, 224, 224, 3) 0                                            
_________________________________________________________________________
input_1_2 (InputLayer)          [(None, 224, 224, 3) 0                                            
_________________________________________________________________________
conv1_pad_1 (ZeroPadding2D)     (None, 230, 230, 3)  0           input_1_1[0][0]                  
_________________________________________________________________________
conv1_pad_2 (ZeroPadding2D)     (None, 230, 230, 3)  0           input_1_2[0][0]                  
_________________________________________________________________________
conv1_conv_1 (Conv2D)           (None, 112, 112, 64) 9472        conv1_pad_1[0][0]                
_________________________________________________________________________
conv1_conv_2 (Conv2D)           (None, 112, 112, 64) 9472        conv1_pad_2[0][0]                

...

...

conv5_block3_out_1 (Activation) (None, 7, 7, 2048)   0           conv5_block3_add_1[0][0]         
_________________________________________________________________________
conv5_block3_out_2 (Activation) (None, 7, 7, 2048)   0           conv5_block3_add_2[0][0]         
_________________________________________________________________________
avg_pool_1 (GlobalAveragePoolin (None, 2048)         0           conv5_block3_out_1[0][0]         
_________________________________________________________________________
avg_pool_2 (GlobalAveragePoolin (None, 2048)         0           conv5_block3_out_2[0][0]         
_________________________________________________________________________
probs_1 (Dense)                 (None, 953)          1952697     avg_pool_1[0][0]                 
_________________________________________________________________________
probs_2 (Dense)                 (None, 3891)         7972659     avg_pool_2[0][0]                 
_________________________________________________________________________
out1 (Dense)                    (None, 953)          909162      probs_1[0][0]                    
_________________________________________________________________________
out2 (Dense)                    (None, 3891)         15143772    probs_2[0][0]                    
_________________________________________________________________________
concatenate (Concatenate)       (None, 4844)         0           out1[0][0]                       
                                                                 out2[0][0]                       
_________________________________________________________________________
combined_layer (Dense)          (None, 4844)         23469180    concatenate[0][0]                
=========================================================================
Total params: 96,622,894
Trainable params: 39,522,114
Non-trainable params: 57,100,780

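As you can see, because of Model(inputs=[input1, input2]), every layer is duplicated. This causes problems when I want to use the model to predict on an image. Is there any way to do this without duplicating all the preceding layers, adding only the trailing dense layers? At this rate I will be overwhelmed by the parameter count even sooner than before...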

Answer 1

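Technically this is possible. Since you have 3 classifiers (1.h5, 2.h5, 3.h5), you can load these models with their weights and then use the functional API in TensorFlow (https://www.tensorflow.org/guide/keras/functional): concatenate() merges the outputs of the 3 classifiers into a single vector, and a few dense layers with activation functions then produce the final prediction.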
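The suggestion above could look roughly like the sketch below, which feeds one shared image input into each frozen classifier instead of giving every classifier its own input. It uses two small toy models in place of the real .h5 files so that it runs on its own; build_toy_classifier, the 32x32 input size and the class counts 5 and 7 are all made up for illustration. This is one possible arrangement, not necessarily what the answer had in mind.

```python
import numpy as np
from tensorflow.keras import layers, Model


def build_toy_classifier(num_classes, name):
    """Hypothetical stand-in for a classifier loaded with load_model()."""
    inputs = layers.Input(shape=(32, 32, 3))
    x = layers.Conv2D(4, 3, activation="relu")(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs, outputs, name=name)


# In practice these would come from load_model('1.h5'), load_model('2.h5'), ...
model1 = build_toy_classifier(5, "shard_1")
model2 = build_toy_classifier(7, "shard_2")
for m in (model1, model2):
    m.trainable = False  # freeze the pretrained classifiers

# One shared input is passed through every classifier (each model is
# called like a layer), so prediction needs a single image tensor.
shared_input = layers.Input(shape=(32, 32, 3), name="image")
merged = layers.concatenate([model1(shared_input), model2(shared_input)])
output = layers.Dense(5 + 7, activation="softmax", name="combined_layer")(merged)
combined = Model(inputs=shared_input, outputs=output)

probs = combined.predict(np.zeros((2, 32, 32, 3), dtype="float32"), verbose=0)
print(probs.shape)  # (2, 12)
```

Note that this only removes the duplicated input. Each classifier still carries its own backbone weights, so the total parameter count is not reduced, and the new combined_layer still has to be trained on data covering all classes.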

python

machine-learning

deep-learning

tensorflow

resnet
