After adding Xavier and bias_filler, the loss values start becoming negative. Why?
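After I added xavier initialization to every convolutional layer, the loss started becoming negative. Can anyone give a suggestion or a reason?
I added the following lines to all the convolutional layers: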
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
I0305 14:31:53.356343 11179 solver.cpp:219] Iteration 0 (-4.02766e+28 iter/s, 0.528933s/100 iters), loss = 2.05371
I0305 14:31:53.356374 11179 solver.cpp:238] Train net output #0: accuracy = 0.11937
I0305 14:31:53.356384 11179 solver.cpp:238] Train net output #1: loss = 2.05371 (* 1 = 2.05371 loss)
I0305 14:31:53.356395 11179 sgd_solver.cpp:105] Iteration 0, lr = 0.0001
I0305 14:32:28.728870 11179 solver.cpp:219] Iteration 100 (2.82699 iter/s, 35.3733s/100 iters), loss = 0.0270034
I0305 14:32:28.729014 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:32:28.729028 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:32:28.729034 11179 sgd_solver.cpp:105] Iteration 100, lr = 0.0001
I0305 14:33:03.729997 11179 solver.cpp:219] Iteration 200 (2.85701 iter/s, 35.0017s/100 iters), loss = -8.27284e-09
I0305 14:33:03.730154 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:33:03.730167 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:33:03.730172 11179 sgd_solver.cpp:105] Iteration 200, lr = 0.0001
I0305 14:33:38.885211 11179 solver.cpp:219] Iteration 300 (2.84449 iter/s, 35.1557s/100 iters), loss = -8.27284e-09
I0305 14:33:38.885368 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:33:38.885383 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:33:38.885387 11179 sgd_solver.cpp:105] Iteration 300, lr = 0.0001
I0305 14:34:14.174548 11179 solver.cpp:219] Iteration 400 (2.83368 iter/s, 35.2898s/100 iters), loss = -8.27284e-09
I0305 14:34:14.174702 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:34:14.174720 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:34:14.174724 11179 sgd_solver.cpp:105] Iteration 400, lr = 0.0001
I0305 14:34:49.578112 11179 solver.cpp:219] Iteration 500 (2.82453 iter/s, 35.4041s/100 iters), loss = -8.27284e-09
I0305 14:34:49.578254 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:34:49.578269 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:34:49.578272 11179 sgd_solver.cpp:105] Iteration 500, lr = 0.0001
I0305 14:35:25.042238 11179 solver.cpp:219] Iteration 600 (2.81971 iter/s, 35.4646s/100 iters), loss = -8.27284e-09
I0305 14:35:25.042421 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:35:25.042438 11179 solver.cpp:238] Train net output #1: loss = 0 (* 1 = 0 loss)
I0305 14:35:25.042443 11179 sgd_solver.cpp:105] Iteration 600, lr = 0.0001
I0305 14:36:00.540053 11179 solver.cpp:219] Iteration 700 (2.81704 iter/s, 35.4983s/100 iters), loss = -8.27284e-09
I0305 14:36:00.540194 11179 solver.cpp:238] Train net output #0: accuracy = 1
I0305 14:36:00.540207 11179 solver.cpp:238] Train net output #1: loss =
My other question is that in some networks a Gaussian filler is added instead, like:
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
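Why are these parameters added to the convolutional layers? Is it because we are training the network from scratch?
How are the specific values for std and/or the bias_filler value chosen?
Thank you very much for your help.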
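Your loss of -8.27284e-09 is effectively zero, not negative (caffe uses single-precision floats rather than double precision, so a value this small is rounding noise).
Which loss layer are you using? "SoftmaxWithLoss"?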
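To make the "effectively zero" point concrete, here is a minimal sketch in plain Python. It assumes the loss is a softmax cross-entropy, which the thread does not confirm, and the helper names `cross_entropy` and `is_rounding_noise` are hypothetical, not part of caffe. It shows that the per-sample loss cannot go below zero mathematically, and that the logged value is smaller than the resolution of a single-precision float.

```python
import math

# Machine epsilon of IEEE 754 single precision:
# the gap between 1.0 and the next representable float32.
FLOAT32_EPS = 2.0 ** -23  # about 1.19e-07


def cross_entropy(prob_of_true_class):
    """Cross-entropy for one sample: -log(p), which is >= 0 for 0 < p <= 1."""
    return -math.log(prob_of_true_class)


def is_rounding_noise(value):
    """True if |value| is below float32 resolution around 1.0 (illustrative threshold)."""
    return abs(value) < FLOAT32_EPS


reported_loss = -8.27284e-09  # value copied from the training log

print(cross_entropy(1.0) == 0.0)         # a perfect prediction gives a loss of exactly zero
print(cross_entropy(0.5) > 0.0)          # any imperfect prediction gives a positive loss
print(is_rounding_noise(reported_loss))  # the logged value is below float32 resolution
```

The threshold used here is only a rule of thumb; the point is that the logged magnitude is more than an order of magnitude smaller than single-precision epsilon.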
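The bias_filler and weight_filler parameters are added when we want caffe to initialize a layer's weights randomly, which is usually the case when training from scratch. If you start training from an existing model (i.e. fine-tuning), these parameters have no effect on the layers whose weights are loaded from that model.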
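The std value is computed from the fan-in and fan-out (i.e. the number of in-channels and out-channels, each scaled by the kernel size) so that the statistics of the blob values stay roughly zero-mean and unit-variance.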
You can find an analysis of these parameters in Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (arXiv 2015).
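As a rough illustration of how such values follow from fan-in and fan-out, the sketch below computes the scale for a single convolutional layer. It assumes the commonly described rules: a uniform range of sqrt(3 / fan_in) for "xavier", and a Gaussian std of sqrt(2 / fan_in) for the initialization proposed in the paper above. The layer shape (64 outputs, 3 input channels, 3x3 kernel) and the function names are made up for the example.

```python
import math


def fan_in_out(num_output, in_channels, kernel_h, kernel_w):
    """Fan-in and fan-out of a conv layer with weights of shape
    (num_output, in_channels, kernel_h, kernel_w)."""
    fan_in = in_channels * kernel_h * kernel_w
    fan_out = num_output * kernel_h * kernel_w
    return fan_in, fan_out


def xavier_uniform_bound(fan_in):
    """Half-width a of a uniform [-a, a] distribution with variance 1 / fan_in."""
    return math.sqrt(3.0 / fan_in)


def he_gaussian_std(fan_in):
    """Std of a zero-mean Gaussian with variance 2 / fan_in (He et al.)."""
    return math.sqrt(2.0 / fan_in)


# Hypothetical first conv layer: 64 filters over an RGB input with 3x3 kernels.
fan_in, fan_out = fan_in_out(num_output=64, in_channels=3, kernel_h=3, kernel_w=3)
bound = xavier_uniform_bound(fan_in)

print(fan_in, fan_out)          # fan-in and fan-out of the example layer
print(bound)                    # weights would be drawn from [-bound, bound]
print(bound ** 2 / 3.0)         # variance of that uniform distribution, equal to 1 / fan_in
print(he_gaussian_std(fan_in))  # std for the He et al. variant
```

A hand-picked value such as std: 0.005 is a fixed alternative to these formulas; the formulas simply adapt the scale to each layer's size so that activations neither shrink nor blow up as depth grows.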