How to return predictions in the [0, 1] interval for SVMs in vowpal wabbit

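Apologies if this has been asked before. Instead of raw predictions (-r), I would like to return predictions in the [0, 1] interval from an SVM trained in vowpal wabbit with --loss_function hinge. This is what I am currently trying, but it does not give me what I want. Any ideas?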

vw -d vw_train_rand.vw -c -f svm_rand.vw --passes 10 --loss_function hinge -q cn;

vw -d vw_test_rand.vw -t -i svm_rand.vw -p preds_rand_svm.txt

Cheers

Aaron

Edit:

1) Sample data:

-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:124500 LenderRank0612.0614:1939 ZipSquareMiles:53.1 MailDateMonth:5 ZipPerForeignBorn:11.4 ZipPerHighSchoolPlusDegree:57.2 ZipPerCollegePlusDegree:15.2 ZipPerVeterans:13.4 ZipPopPerSquareMile:798.1 ZipPerUnemployement:8.5 ZipSexRatio:96.7 ZipHousingUnitsPerSquareMile:315.1 ZipMedianHouseholdIncome:36238 ZipPerCapitaIncome:19085 MonthsDeedDatetoMailDate:2
-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:232000 LenderRank0612.0614:391 ZipSquareMiles:99.1 MailDateMonth:5 ZipPerForeignBorn:11.8 ZipPerHighSchoolPlusDegree:73.3 ZipPerCollegePlusDegree:39.3 ZipPerVeterans:9.1 ZipPopPerSquareMile:485.5 ZipPerUnemployement:5.9 ZipSexRatio:98.5 ZipHousingUnitsPerSquareMile:169.6 ZipMedianHouseholdIncome:78465 ZipPerCapitaIncome:31908 MonthsDeedDatetoMailDate:3
-1 |c Loan.TypeConventional:1 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:0 |n Loan.Size:90000 LenderRank0612.0614:130 ZipSquareMiles:32.6 MailDateMonth:5 ZipPerForeignBorn:51.5 ZipPerHighSchoolPlusDegree:60.7 ZipPerCollegePlusDegree:17.3 ZipPerVeterans:9.3 ZipPopPerSquareMile:783.2 ZipPerUnemployement:4.8 ZipSexRatio:97.2 ZipHousingUnitsPerSquareMile:274.2 ZipMedianHouseholdIncome:64668 ZipPerCapitaIncome:25632 MonthsDeedDatetoMailDate:3
-1 |c Loan.TypeConventional:0 Loan.TypeFHA:0 Loan.TypeUnknown:0 Loan.TypeVA:1 |n Loan.Size:121301 LenderRank0612.0614:23 ZipSquareMiles:6.8 MailDateMonth:5 ZipPerForeignBorn:14.9 ZipPerHighSchoolPlusDegree:63.9 ZipPerCollegePlusDegree:24.2 ZipPerVeterans:10 ZipPopPerSquareMile:5245.1 ZipPerUnemployement:7.1 ZipSexRatio:93.3 ZipHousingUnitsPerSquareMile:2001.6 ZipMedianHouseholdIncome:56398 ZipPerCapitaIncome:25815 MonthsDeedDatetoMailDate:2

2) What I currently get:

-1.001968
-1.000737
-1.000441
-1.001823

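3) What I would like to see: predictions in the continuous [0, 1] interval, so that each entry can be interpreted as the predicted probability of the event, e.g.: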

0.012
0.009
0.010
0.0085

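If you want to predict probabilities, you should train with --loss_function=logistic and test with --link=logistic. Hinge loss (as used in SVMs) yields a max-margin classifier, which is not suited to predicting probabilities.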
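As an illustration of what the logistic link does to a score, here is a minimal Python sketch, assuming VW's `--link=logistic` maps a raw prediction s to 1 / (1 + e^(-s)). The helper name `logistic_link` is hypothetical and not part of VW. Applying it to the hinge-loss raw predictions from the question puts every entry near 0.27, which shows why squashing hinge scores is no substitute for training with logistic loss: the hinge model's scores sit right at the -1 margin instead of encoding how likely the event is.

```python
import math


def logistic_link(raw: float) -> float:
    """Map a raw linear score to (0, 1) with the logistic (sigmoid) function.

    Assumption: this mirrors what VW's --link=logistic applies to raw predictions.
    """
    # Numerically stable form: never call exp() on a large positive argument.
    if raw >= 0:
        return 1.0 / (1.0 + math.exp(-raw))
    z = math.exp(raw)
    return z / (1.0 + z)


# Raw predictions copied from the question (model trained with hinge loss).
hinge_raw = [-1.001968, -1.000737, -1.000441, -1.001823]

for r in hinge_raw:
    # Every value lands close to sigmoid(-1), i.e. roughly 0.27.
    print(f"{r:+.6f} -> {logistic_link(r):.4f}")
```

With a model trained using --loss_function=logistic, the raw scores are fitted as log-odds, so the same transformation yields values that can be read as probabilities.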

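Note that using --loss_function=hinge alone does not turn VW into a kernel SVM: there is no kernel, so it simply trains a linear model with hinge loss. If you want a support vector machine with a radial basis function kernel trained in an online fashion, use --ksvm --kernel=rbf (see vw --ksvm -h | grep -A9 KSVM for more parameters).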
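To make the linear-versus-kernel distinction concrete, here is a minimal Python sketch of the two kernels and a kernel decision value. It uses the textbook Gaussian RBF form exp(-gamma * ||x - y||^2); `gamma` and all helper names are hypothetical, and VW's own bandwidth parameterization for `--ksvm` may differ.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, y: Vector) -> float:
    # Plain dot product: the only similarity a hinge-loss linear model uses.
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    # Textbook Gaussian RBF kernel: exp(-gamma * ||x - y||^2).
    # Assumption: VW's --ksvm bandwidth may be parameterized differently.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_score(
    x: Vector,
    support: Sequence[Vector],
    alphas: Sequence[float],
    kernel: Callable[[Vector, Vector], float],
) -> float:
    # Kernel SVM decision value: sum_i alpha_i * k(s_i, x).
    # Labels are folded into alpha_i and the bias term is omitted for brevity.
    return sum(a * kernel(s, x) for s, a in zip(support, alphas))


# Toy example: one positive and one negative support vector.
support = [[0.0], [1.0]]
alphas = [1.0, -1.0]
print(kernel_score([0.0], support, alphas, rbf_kernel))
```

A linear hinge-loss model scores x with a single weight vector, w·x, whereas a kernel SVM compares x against stored support vectors through k, which is what `--ksvm` adds.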