How do I specify num_test_nets in restart?

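I trained a GoogleNet model for a while, and now I want to restart from a checkpoint and add a test phase. The test layer is already in my train_val.prototxt file, and I added the appropriate parameters to my solver.prototxt, but I get an error on the restart: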

I0712 15:53:02.615947 47646 net.cpp:278] This network produces output loss2/loss1
I0712 15:53:02.615964 47646 net.cpp:278] This network produces output loss3/loss3
I0712 15:53:02.616109 47646 net.cpp:292] Network initialization done.
F0712 15:53:02.616665 47646 solver.cpp:128] Check failed: param_.test_iter_size() == num_test_nets (1 vs. 0) test_iter must be specified for each test network.
*** Check failure stack trace: ***
    @     0x7f550cf70e6d  (unknown)
    @     0x7f550cf72ced  (unknown)
    @     0x7f550cf70a5c  (unknown)
    @     0x7f550cf7363e  (unknown)
    @     0x7f550d3b605b  caffe::Solver<>::InitTestNets()
    @     0x7f550d3b63ed  caffe::Solver<>::Init()
    @     0x7f550d3b6738  caffe::Solver<>::Solver()
    @     0x7f550d4fa633  caffe::Creator_SGDSolver<>()
    @     0x7f550da5bb76  caffe::SolverRegistry<>::CreateSolver()
    @     0x7f550da548f4  train()
    @     0x7f550da52316  main
    @     0x7f5508f43b15  __libc_start_main
    @     0x7f550da52d3d  (unknown)

solver.prototxt

train_net: "<my_path>/train_val.prototxt"
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.01
lr_policy: "step"
stepsize: 320000
gamma: 0.96
max_iter: 10000000
momentum: 0.9
weight_decay: 0.0002
snapshot: 40000 
snapshot_prefix: "models/<my_path>"
solver_mode: CPU

train_val.prototxt train and test layers:

name: "GoogleNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "/<blah>/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "/<blah>/ilsvrc12_val_lmdb"
    batch_size: 32
    backend: LMDB
  }
}

You should change one line in your solver.prototxt from

train_net: "<my_path>/train_val.prototxt"

to

net: "<my_path>/train_val.prototxt"

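The Solver does not use the value of "train_net" to initialize a test net, so the test phase you added was never created by the solver. That is why the check reports "(1 vs. 0)": one test_iter entry is specified, but there are zero test nets.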

In fact, the parameters "train_net" and "test_net" initialize only the train net and only the test net, respectively, while "net" is used for both.
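
As a rough sketch of the two ways to satisfy the check, the relevant solver.prototxt lines might look like the fragment below. It assumes the same <my_path> placeholder and test_iter/test_interval values as in the question. Option B, which points "train_net" and "test_net" at the same file, is an assumption that should behave equivalently, since layers are filtered by their include { phase: ... } rules; it has not been tested here.

```
# Option A: one definition for both phases (the fix described above).
# "net" builds the TRAIN net and one TEST net from the same file.
net: "<my_path>/train_val.prototxt"
test_iter: 1000        # one test_iter entry for the one test net
test_interval: 4000

# Option B (assumed equivalent): name the train and test nets separately.
# Use this INSTEAD of "net", not together with it.
# train_net: "<my_path>/train_val.prototxt"
# test_net: "<my_path>/train_val.prototxt"
# test_iter: 1000      # still exactly one entry per test net
# test_interval: 4000
```

Either way, the number of test_iter entries has to equal the number of test nets the solver creates. With "train_net" alone, that number is 0.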