Getting uncalibrated probability outputs with Vowpal Wabbit, ad-conversion prediction
I am trying to use Vowpal Wabbit to predict the conversion rate of ad impressions, and I am getting non-intuitive probability outputs even though the positive class makes up less than 1% of the data.
The positive/negative imbalance in my dataset is 1/100 (I have already undersampled the negative class), so I use an importance weight of 100 on the positive examples.
Negative examples are labeled -1 and positive examples are labeled 1. I shuffled the positive and negative examples together with shuf so that online learning works properly.
Example lines from the vw file:
1 100 'c4ac3440|i search_delay_log:3.58351893846 click_count_log:3.58351893846 banner_impression_count_log:3.98898404656 |c es i_type_2 xvertical_1_61 vertical_1 creat_size_728x90 retargeting
-1 1 'a4d25cf1|i search_delay_log:11.2825684591 click_count_log:11.2825684591 banner_impression_count_log:4.48863636973 |c br i_type_1 xvertical_1_960 vertical_1 creat_size_300x600 retargeting
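For reference, each line follows VW's plain-text input format: label, optional importance weight, optional tag prefixed with an apostrophe, then |-separated namespaces. A minimal sketch of emitting a line in this shape (the helper and the field values below are just for illustration):

def to_vw_line(label, weight, tag, numeric, categorical):
    # VW input format: "label weight 'tag|ns1 f:v ... |ns2 f ..."
    # `numeric` feeds the |i namespace, `categorical` the |c namespace.
    num = " ".join(f"{k}:{v}" for k, v in numeric.items())
    cat = " ".join(categorical)
    return f"{label} {weight} '{tag}|i {num} |c {cat}"

# Reproduces the shape of the sample lines above:
print(to_vw_line(1, 100, "c4ac3440",
                 {"search_delay_log": 3.5835, "click_count_log": 3.5835},
                 ["es", "i_type_2", "retargeting"]))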
Now I create a model from the training set with:
vw -d impressions_rand.aa --loss_function logistic -c -k --passes 12 -f model.vw
Output:
final_regressor = model.vw
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
creating cache_file = impressions_rand.aa.cache
Reading datafile = impressions_rand.aa
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.693147 0.693147 1 1.0 -1.0000 0.0000 11
0.510760 0.328374 2 2.0 -1.0000 -0.9449 11
0.387521 0.264282 4 4.0 -1.0000 -1.1825 11
1.765374 1.818883 8 107.0 1.0000 -1.7020 11
2.152669 2.444504 51 249.0 1.0000 -3.2953 11
1.289870 0.427071 201 498.0 -1.0000 -3.5498 11
0.878843 0.528943 588 1083.0 1.0000 -1.3394 9
0.852358 0.825872 1176 2166.0 -1.0000 -6.7918 11
0.871977 0.891597 2451 4332.0 -1.0000 -2.7031 11
0.689428 0.506878 4110 8664.0 -1.0000 -2.7525 11
0.638008 0.586589 8517 17328.0 -1.0000 -5.8017 11
0.580220 0.522713 17515 34741.0 1.0000 2.1519 11
0.526281 0.472343 35525 69482.0 -1.0000 -6.2931 9
0.497601 0.468921 71050 138964.0 -1.0000 -7.6245 9
0.479305 0.461008 143585 277928.0 -1.0000 -0.8296 11
0.443734 0.443734 288655 555856.0 -1.0000 -2.5795 11 h
0.438806 0.433925 578181 1111791.0 1.0000 0.8503 11 h
finished run
number of examples per pass = 216000
passes used = 5
weighted example sum = 2072475.000000
weighted label sum = -67475.000000
average loss = 0.432676 h
best constant = -0.065138
best constant's loss = 0.692617
total feature number = 11548690
Now I make predictions on the test set. --link logistic should transform the vw output into probabilities in the [0, 1] range.
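(--link logistic applies the standard sigmoid to VW's raw score. A minimal sketch of the same mapping, using raw scores that appear in the training progress log above:)

import math

def sigmoid(score):
    # --link logistic maps a raw score s to the probability 1 / (1 + exp(-s))
    return 1.0 / (1.0 + math.exp(-score))

print(sigmoid(-6.7918))  # ~0.0011: a confidently negative example
print(sigmoid(0.8503))   # ~0.70: a mildly positive raw score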
vw -d impressions_rand.ab --link logistic -i model.vw -p preds_ab.txt
Output:
predictions = preds_ab.txt
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = impressions_rand.ab
num sources = 1
average since example example current current current
loss last counter weight label predict features
68.282379 68.282379 1 1.0 -1.0000 0.0001 9
38.748867 9.215355 2 2.0 -1.0000 0.0174 11
21.256140 3.763414 4 4.0 -1.0000 0.8345 11
11.685329 2.114518 8 8.0 -1.0000 0.3508 11
9.457854 7.230378 16 16.0 -1.0000 0.0069 11
7.371087 5.284320 32 32.0 -1.0000 0.3561 11
7.061980 6.752873 64 64.0 -1.0000 0.6549 11
5.423309 3.784638 128 128.0 -1.0000 0.2597 11
3.252394 1.725597 211 310.0 1.0000 0.7686 11
2.140099 1.052366 330 627.0 1.0000 0.7143 11
1.671550 1.203000 660 1254.0 -1.0000 0.8054 11
1.788466 1.905383 1320 2508.0 -1.0000 0.0676 9
1.508163 1.234410 2502 5076.0 1.0000 0.3921 11
1.282862 1.060063 5061 10209.0 1.0000 0.4258 9
1.119420 0.955977 11013 20418.0 -1.0000 0.6892 11
1.017911 0.916403 22323 40836.0 -1.0000 0.5301 9
0.888435 0.758960 42171 81672.0 -1.0000 0.3500 11
0.787709 0.686983 84243 163344.0 -1.0000 0.2360 9
0.703270 0.618831 170268 326688.0 -1.0000 0.5707 11
finished run
number of examples per pass = 207361
passes used = 1
weighted example sum = 397936.000000
weighted label sum = -12936.000000
average loss = 0.684043
best constant = -0.032508
best constant's loss = 0.998943
total feature number = 2216941
This outputs a prediction file, preds_ab.txt, for example:
0.000095 7c14ae23
0.017367 3e9558bd
0.139393 6a1cd72f
0.834518 dfe76f6e
0.089810 2b88b547
If I compute the ROC-AUC score of these predictions I get 0.85, which is close to the value I get with scikit-learn (0.90). However, the probability outputs are not calibrated at all: they are far higher than I would expect (something close to 1%). Here is the histogram.
Here is the reliability curve:
And here is the plot of mean predicted probability versus observed positive frequency, with the examples binned by predicted probability:
Clearly, the output probabilities are far higher than what a well-calibrated classifier would produce.
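(For completeness, the AUC and the reliability curve above can be recomputed from the prediction file and the test labels along these lines; a sketch assuming the file names used above and that predictions stay in the same order as the test file:)

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

# preds_ab.txt has one "<probability> <tag>" pair per line (see sample above)
probs = np.array([float(line.split()[0]) for line in open("preds_ab.txt")])
# impressions_rand.ab is in VW format; the first token of each line is the label
labels = np.array([int(line.split()[0]) for line in open("impressions_rand.ab")])
y = (labels == 1).astype(int)  # map {-1, 1} to {0, 1} for scikit-learn

print("ROC-AUC:", roc_auc_score(y, probs))
# Reliability curve: observed positive rate per bin of predicted probability
frac_pos, mean_pred = calibration_curve(y, probs, n_bins=10)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"mean predicted {mp:.3f} -> observed positive rate {fp:.3f}")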
What am I doing wrong here? What should I investigate?
Update
If I do not use the weight of 100 on the positive class examples, I get similarly non-intuitive results. The mean probability output is 0.27 (still nowhere near 1%), the reliability plot looks even worse, and the ROC-AUC is 0.76.
I can confirm that I have 237805 negative examples and 2195 positive examples (a ratio of roughly 108:1).
Training output:
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
creating cache_file = impressions_rand.aa.cache
Reading datafile = impressions_rand.aa
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.693147 0.693147 1 1.0 -1.0000 0.0000 11
0.546724 0.400300 2 2.0 -1.0000 -0.7087 11
0.398553 0.250382 4 4.0 -1.0000 -1.3963 11
0.284506 0.170460 8 8.0 -1.0000 -2.2595 11
0.181406 0.078306 16 16.0 -1.0000 -2.8225 11
0.108136 0.034865 32 32.0 -1.0000 -4.2696 11
0.063156 0.018176 64 64.0 -1.0000 -4.7412 11
0.036415 0.009675 128 128.0 -1.0000 -4.2940 11
0.020325 0.004235 256 256.0 -1.0000 -5.9903 11
0.043248 0.066171 512 512.0 -1.0000 -5.5540 11
0.045276 0.047304 1024 1024.0 -1.0000 -4.7065 11
0.044606 0.043935 2048 2048.0 -1.0000 -6.6253 11
0.048938 0.053270 4096 4096.0 -1.0000 -5.9119 11
0.048711 0.048485 8192 8192.0 -1.0000 -2.3949 11
0.048157 0.047603 16384 16384.0 -1.0000 -9.6219 11
0.044306 0.040454 32768 32768.0 -1.0000 -8.8800 11
0.044029 0.043752 65536 65536.0 -1.0000 -5.9218 9
0.042739 0.041450 131072 131072.0 -1.0000 -3.8306 11
0.042986 0.042986 262144 262144.0 -1.0000 -6.0941 11 h
0.042321 0.041655 524288 524288.0 -1.0000 -4.0276 11 h
0.042654 0.042988 1048576 1048576.0 -1.0000 -9.9169 11 h
finished run
number of examples per pass = 216000
passes used = 7
weighted example sum = 1512000.000000
weighted label sum = -1484504.000000
average loss = 0.042763 h
best constant = -4.691161
best constant's loss = 0.051789
total feature number = 16166472
The test output is below. I have read that an average loss greater than the best constant's loss indicates that something is wrong with how my model is learning.
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = impressions_rand.ab
num sources = 1
average since example example current current current
loss last counter weight label predict features
78.141266 78.141266 1 1.0 -1.0000 0.0001 11
54.228148 30.315029 2 2.0 -1.0000 0.0015 11
33.279501 12.330854 4 4.0 1.0000 0.0472 11
20.358767 7.438034 8 8.0 -1.0000 0.0527 11
15.780043 11.201319 16 16.0 -1.0000 0.1657 11
13.783271 11.786498 32 32.0 -1.0000 0.0012 9
9.318714 4.854158 64 64.0 -1.0000 0.7268 11
6.797651 4.276587 128 128.0 -1.0000 0.1404 9
4.674237 2.550824 256 256.0 -1.0000 0.0516 11
3.269198 1.864159 512 512.0 -1.0000 0.4092 11
2.153033 1.036868 1024 1024.0 -1.0000 0.0425 11
1.481920 0.810807 2048 2048.0 -1.0000 0.2792 11
1.005869 0.529817 4096 4096.0 -1.0000 0.2422 11
0.676574 0.347279 8192 8192.0 -1.0000 0.3003 11
0.452924 0.229274 16384 16384.0 -1.0000 0.2579 11
0.295262 0.137600 32768 32768.0 -1.0000 0.2833 11
0.191513 0.087763 65536 65536.0 -1.0000 0.2616 9
0.126758 0.062003 131072 131072.0 -1.0000 0.2670 11
finished run
number of examples per pass = 207361
passes used = 1
weighted example sum = 207361.000000
weighted label sum = -203423.000000
average loss = 0.099565
best constant = -0.981009
best constant's loss = 0.037621
total feature number = 2217159
You say that in the training set there is, on average, one positive example per 100 negatives. However, you give the positive examples a weight of 100, which is (almost) equivalent to repeating each positive example 100 times in the training set. That way, the mean predicted probability should be around 50%, so you should not be surprised that it is not around 1%.
According to the vw output you provided, it seems there are more than 100 negative examples per positive in the training set impressions_rand.aa, which is why the "weighted label sum" is negative (otherwise it should be around 0). Because of this, the mean predicted probability is not 50% but about 36%.
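(As a side note: if one wanted probabilities on the original ~1% scale while keeping the 100x weight, a standard prior correction can be applied after the fact. Weighting the positives by w multiplies the odds by w, so a probability q learned on the weighted data maps back to p = q / (q + w(1 - q)). A minimal sketch, assuming w = 100:)

def unweight_probability(q, w=100.0):
    # Undo a w-times importance weight on the positive class:
    # the weighted model's odds are w times the true odds, so divide them by w.
    return q / (q + w * (1.0 - q))

print(unweight_probability(0.50))  # ~0.0099: 50% under the weighting is ~1% without it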
Thanks to the comments from Martin Popel and arielf, I solved the problem. :)
- I forgot to use -t when generating predictions.
- I did not specify --loss_function logistic when generating predictions.
Because of this, the model was being updated at test time with the default loss function instead of the logistic loss, which wrecked the model and produced the wrong results.
Takeaways:
- Use --loss_function logistic during testing as well, so that the reported loss is correct.
- Remember to use -t if you do not want the model to be updated while predicting.
This is what the test-time output looks like now (without example weighting):
$ vw -d impressions_rand.ab --link logistic --loss_function logistic -i model.vw -t -p preds.txt
only testing
predictions = preds.txt
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = impressions_rand.ab
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000053 0.000053 1 1.0 -1.0000 0.0001 11
0.000370 0.000687 2 2.0 -1.0000 0.0007 11
1.252868 2.505366 4 4.0 1.0000 0.0067 11
0.638249 0.023630 8 8.0 -1.0000 0.0036 11
0.322060 0.005872 16 16.0 -1.0000 0.0031 11
0.164750 0.007439 32 32.0 -1.0000 0.0000 9
0.084911 0.005072 64 64.0 -1.0000 0.0081 11
0.076905 0.068899 128 128.0 -1.0000 0.0004 9
0.055126 0.033347 256 256.0 -1.0000 0.0000 11
0.052986 0.050847 512 512.0 -1.0000 0.0133 11
0.038351 0.023715 1024 1024.0 -1.0000 0.0000 11
0.037059 0.035767 2048 2048.0 -1.0000 0.0167 11
0.038848 0.040637 4096 4096.0 -1.0000 0.0112 11
0.038903 0.038957 8192 8192.0 -1.0000 0.0281 11
0.041625 0.044348 16384 16384.0 -1.0000 0.0001 11
0.042526 0.043426 32768 32768.0 -1.0000 0.0218 11
0.042538 0.042551 65536 65536.0 -1.0000 0.0000 9
0.042150 0.041763 131072 131072.0 -1.0000 0.0019 11
finished run
number of examples per pass = 207361
passes used = 1
weighted example sum = 207361.000000
weighted label sum = -203423.000000
average loss = 0.042438
best constant = -4.647395
best constant's loss = 0.053670
total feature number = 2217159
You can see that the reported average loss is now smaller than the best constant's loss, and the progressive average loss stays within the expected range as well.
Moreover, the output probabilities now make perfect sense.
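(A quick sanity check on the new predictions: the mean predicted probability should now sit close to the empirical positive rate of roughly 0.9% (2195 positives out of 240000 examples). A sketch using the file names from above:)

import numpy as np

probs = np.array([float(line.split()[0]) for line in open("preds.txt")])
labels = np.array([int(line.split()[0]) for line in open("impressions_rand.ab")])

print("mean predicted probability:", probs.mean())
print("empirical positive rate:   ", (labels == 1).mean())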