Caffe model fails to learn
I implemented the following convolutional model in Keras, and after training for 100,000 epochs it shows excellent performance with high accuracy.
# Keras 1.x API (Convolution2D / border_mode), as in the original post.
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

img_rows, img_cols = 24, 15
input_shape = (img_rows, img_cols, 1)
nb_filters = 32
pool_size = (2, 2)
kernel_size = (3, 3)
nb_classes = 11  # matches num_output of the fc2 layer in the prototxt below

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
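For reference, here is the shape arithmetic this model implies for the 24x15 input (a quick sketch in plain Python, assuming Keras defaults: 'valid' convolutions and a pooling stride equal to pool_size):

h, w = 24, 15
h, w = h - 2, w - 2    # conv1, 3x3 'valid': 22 x 13
h, w = h - 2, w - 2    # conv2, 3x3 'valid': 20 x 11
h, w = h // 2, w // 2  # 2x2 max pool, stride 2 (Keras default): 10 x 5
print(h, w, h * w * 32)  # 10 5 1600 features feeding Dense(128)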
However, after trying to implement the same model in Caffe, it fails to train: the loss stays almost fixed, between 2.1 and 2.6.
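(For what it's worth, a classifier stuck at uniform guessing over the 11 classes would sit at a cross-entropy of ln(11), right inside that band:)

import math
print(math.log(11))  # ~2.398, squarely within the observed 2.1-2.6 range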
Here is my Caffe prototxt implementation:
name: "FneishNet"
layer {
name: "inlayer1"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
data_param {
source: "examples/fneishnet_numbers/fneishnet_numbers_train_lmdb"
batch_size: 128
backend: LMDB
}
}
layer {
name: "inlayer1"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
data_param {
source: "examples/fneishnet_numbers/fneishnet_numbers_val_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 32
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv2"
type: "Convolution"
bottom: "conv1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 32
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv2"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 1
}
}
layer {
name: "drop1"
type: "Dropout"
bottom: "pool1"
top: "pool1"
dropout_param {
dropout_ratio: 0.25
}
}
layer {
name: "flatten1"
type: "Flatten"
bottom: "pool1"
top: "flatten1"
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "flatten1"
top: "fc1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 128
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "drop2"
type: "Dropout"
bottom: "fc1"
top: "fc1"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 11
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc2"
bottom: "label"
top: "loss"
}
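To sanity-check the converted graph against the Keras model, I can load it with pycaffe and print each blob's shape (a sketch; it assumes pycaffe is built and that the LMDBs above exist, since Data layers open them when the net is constructed):

import caffe
caffe.set_mode_cpu()
net = caffe.Net('models/fneishnet_numbers/train_val.prototxt', caffe.TEST)
for name, blob in net.blobs.items():
    print(name, blob.data.shape)  # compare against the shape walk-through above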
And here is my model solver (hyperparameters):
net: "models/fneishnet_numbers/train_val.prototxt"
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.01
gamma: 0.1
lr_policy: "poly"
power: 0.5
max_iter: 3000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 100000
snapshot_prefix: "models/fneishnet_numbers/fneishnet_numbers_quick"
solver_mode: CPU
I believe that if the model were converted to Caffe correctly, it should perform the same as it does in Keras, so I think I'm missing something. Any help would be greatly appreciated.
From the comment in Caffe's caffe.proto describing the lr_policy options:

  poly: the effective learning rate follows a polynomial decay, to be zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)

So basically, are you sure you want power set to 0.5 in base_lr (1 - iter/max_iter) ^ (power)? I think this may be the problem, since you are gradually decaying the learning rate toward zero; have you tried 2?
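To see what that means with the solver values above (base_lr 0.01, max_iter 3,000,000), here is a small sketch of the poly schedule for power 0.5 versus 2:

base_lr, max_iter = 0.01, 3000000

def poly_lr(it, power):
    # Caffe "poly" policy: base_lr * (1 - iter/max_iter) ^ power
    return base_lr * (1.0 - float(it) / max_iter) ** power

for it in (0, 1000000, 2000000, 2900000):
    print(it, poly_lr(it, 0.5), poly_lr(it, 2))
# power 0.5 keeps the rate high for most of training; power 2 decays it much faster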