vowpal_wabbit training data
I am trying to understand the data format of vowpal_wabbit training and test data, but I can't seem to figure it out.
I have some training data like this:
feature 1:0
feature 2:1
feature 3:10
feature 4:5
Class label:A

feature 1:0
feature 2:2
feature 3:30
feature 4:8
Class label:C

feature 1:2
feature 2:10
feature 3:9
feature 4:7
Class label:B
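As a sketch of how these records could be written in VW's text format (the same `label | name:value ...` layout used in the validation data below), the following Python turns each record into one line. The mapping A→1, B→2, C→3 is a hypothetical choice, assuming a multiclass mode such as `--oaa 3`, which expects integer labels from 1 to k. The names `feature1` to `feature4` are also illustrative: VW separates features with whitespace, so a name like "feature 1" cannot contain a space.

```python
# Hypothetical mapping from letter classes to 1-based integer labels
# (assumption: a multiclass mode such as --oaa 3 expects labels 1..k).
CLASS_TO_LABEL = {"A": 1, "B": 2, "C": 3}


def to_vw_line(features, class_label):
    """Build one VW text-format line: '<label> | name:value name:value ...'."""
    label = CLASS_TO_LABEL[class_label]
    # Features are whitespace-separated, so names must not contain spaces.
    feats = " ".join(f"{name}:{value}" for name, value in features.items())
    return f"{label} | {feats}"


# The three records from the question, with illustrative feature names.
records = [
    ({"feature1": 0, "feature2": 1, "feature3": 10, "feature4": 5}, "A"),
    ({"feature1": 0, "feature2": 2, "feature3": 30, "feature4": 8}, "C"),
    ({"feature1": 2, "feature2": 10, "feature3": 9, "feature4": 7}, "B"),
]

for features, cls in records:
    print(to_vw_line(features, cls))
```

Saved to a file (say, a hypothetical `train.vw`), these lines could then be passed to something like `vw --oaa 3 -d train.vw`.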
I have experimented with some training data examples using this site:
http://hunch.net/~vw/validate.html
My validation data:
1 | haha:1 hehe:2 hoho:3
1 | haha:2 hehe:2 hoho:3
3 | haha:3 hehe:2 hoho:3
1 | haha:4 hehe:2 hoho:3
2 | haha:5 hehe:2 hoho:3
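However, I don't understand why it says that some examples have 4 features and the last one has 5, when each line contains only 3.
Validation feedback: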
Total of 5 examples pasted.
(example #1) Example “1 | haha:1 hehe:2 hoho:3”.
(example #1) Found “[label] |…” prefix format.
(example #1) Example label / response / class is “1”.
(example #1) Example has default “1.0” importance weight.
(example #1) Example has default “0” base.
(example #1, namespace #1) Using default namespace.
(example #1, namespace #1) Found 3 feature(s).
(example #1, namespace #1, feature #1) Label “haha”.
(example #1, namespace #1, feature #1) Value “1”.
(example #1, namespace #1, feature #2) Label “hehe”.
(example #1, namespace #1, feature #2) Value “2”.
(example #1, namespace #1, feature #3) Label “hoho”.
(example #1, namespace #1, feature #3) Value “3”.
(example #2) Example “1 | haha:2 hehe:2 hoho:3 ”.
(example #2) Found “[label] |…” prefix format.
(example #2) Example label / response / class is “1”.
(example #2) Example has default “1.0” importance weight.
(example #2) Example has default “0” base.
(example #2, namespace #1) Using default namespace.
(example #2, namespace #1) Found 4 feature(s).
(example #2, namespace #1, feature #1) Label “haha”.
(example #2, namespace #1, feature #1) Value “2”.
(example #2, namespace #1, feature #2) Label “hehe”.
(example #2, namespace #1, feature #2) Value “2”.
(example #2, namespace #1, feature #3) Label “hoho”.
(example #2, namespace #1, feature #3) Value “3”.
(example #2, namespace #1, feature #4) Label “”.
(example #2, namespace #1, feature #4) Using default value of “1” for feature.
(example #3) Example “3 | haha:3 hehe:2 hoho:3 ”.
(example #3) Found “[label] |…” prefix format.
(example #3) Example label / response / class is “3”.
(example #3) Example has default “1.0” importance weight.
(example #3) Example has default “0” base.
(example #3, namespace #1) Using default namespace.
(example #3, namespace #1) Found 4 feature(s).
(example #3, namespace #1, feature #1) Label “haha”.
(example #3, namespace #1, feature #1) Value “3”.
(example #3, namespace #1, feature #2) Label “hehe”.
(example #3, namespace #1, feature #2) Value “2”.
(example #3, namespace #1, feature #3) Label “hoho”.
(example #3, namespace #1, feature #3) Value “3”.
(example #3, namespace #1, feature #4) Label “”.
(example #3, namespace #1, feature #4) Using default value of “1” for feature.
(example #4) Example “1 | haha:4 hehe:2 hoho:3 ”.
(example #4) Found “[label] |…” prefix format.
(example #4) Example label / response / class is “1”.
(example #4) Example has default “1.0” importance weight.
(example #4) Example has default “0” base.
(example #4, namespace #1) Using default namespace.
(example #4, namespace #1) Found 4 feature(s).
(example #4, namespace #1, feature #1) Label “haha”.
(example #4, namespace #1, feature #1) Value “4”.
(example #4, namespace #1, feature #2) Label “hehe”.
(example #4, namespace #1, feature #2) Value “2”.
(example #4, namespace #1, feature #3) Label “hoho”.
(example #4, namespace #1, feature #3) Value “3”.
(example #4, namespace #1, feature #4) Label “”.
(example #4, namespace #1, feature #4) Using default value of “1” for feature.
(example #5) Example “2 | haha:5 hehe:2 hoho:3 ”.
(example #5) Found “[label] |…” prefix format.
(example #5) Example label / response / class is “2”.
(example #5) Example has default “1.0” importance weight.
(example #5) Example has default “0” base.
(example #5, namespace #1) Using default namespace.
(example #5, namespace #1) Found 5 feature(s).
(example #5, namespace #1, feature #1) Label “haha”.
(example #5, namespace #1, feature #1) Value “5”.
(example #5, namespace #1, feature #2) Label “hehe”.
(example #5, namespace #1, feature #2) Value “2”.
(example #5, namespace #1, feature #3) Label “hoho”.
(example #5, namespace #1, feature #3) Value “3”.
(example #5, namespace #1, feature #4) Label “”.
(example #5, namespace #1, feature #4) Using default value of “1” for feature.
(example #5, namespace #1, feature #5) Label “”.
(example #5, namespace #1, feature #5) Using default value of “1” for feature.
Why does it claim that I have 4 and 5 features, respectively?
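Extra space characters at the end of a line are interpreted by http://hunch.net/~vw/validate.html as extra features. (Yes, the last line in your example has two trailing spaces, and lines 2 to 4 have one each.) Note that validate.html reports an empty name for each extra feature: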
(example #4, namespace #1, feature #4) Label “”.
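The difference between the two parsers can be illustrated with a small Python sketch. `validator_style_tokens` is a hypothetical approximation of the behaviour shown in the validator output (every trailing space yields an extra, empty-named feature), not validate.html's actual JavaScript. `vw_style_tokens` splits on runs of whitespace, which mirrors VW ignoring trailing spaces. Both assume a single default namespace, i.e. `| ` followed directly by the features.

```python
def validator_style_tokens(line):
    """Split the feature part on every single space, keeping empty strings.

    Hypothetical approximation of validate.html's behaviour: each trailing
    space produces an extra, empty-named feature. Not its real code.
    """
    _, rest = line.split("|", 1)
    return rest.lstrip(" ").split(" ")


def vw_style_tokens(line):
    """Split on runs of whitespace and drop empty tokens (trailing spaces ignored)."""
    _, rest = line.split("|", 1)
    return rest.split()


examples = [
    "1 | haha:1 hehe:2 hoho:3",    # no trailing space
    "1 | haha:2 hehe:2 hoho:3 ",   # one trailing space
    "2 | haha:5 hehe:2 hoho:3  ",  # two trailing spaces
]
for ex in examples:
    # prints "3 3", then "4 3", then "5 3"
    print(len(validator_style_tokens(ex)), len(vw_style_tokens(ex)))
```

Stripping trailing whitespace (for example with `line.rstrip()`) before pasting the data into the validator should make its counts match the explicit features.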
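Note that validate.html is implemented in JavaScript and is completely independent of VW's own implementation (in C++). VW ignores trailing spaces. You can check this by saving the five examples above as sample.data and running: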
$ vw -P 1 < sample.data
...
average   since     example  example  current  current  current
loss      last      counter  weight   label    predict  features
1.000000  1.000000  1        1.0      1.0000   0.0000   4
0.522042  0.044084  2        2.0      1.0000   0.7900   4
1.838150  4.470366  3        3.0      3.0000   0.8857   4
1.488676  0.440255  4        4.0      1.0000   1.6635   4
1.270585  0.398217  5        5.0      2.0000   1.3690   4
So all five examples are reported as having 4 features (see the last column).
Why four? VW automatically adds an extra constant (intercept) feature. If you don't want it, you can use vw --noconstant.
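The `current features` column can be reproduced for these examples with a short Python sketch: count the whitespace-separated features and add one for the constant, unless `--noconstant` is in effect. This assumes a single default namespace, non-zero feature values, and no interaction or n-gram options. The function name `vw_reported_feature_count` is hypothetical, and this is an illustration, not VW's code.

```python
def vw_reported_feature_count(line, noconstant=False):
    """Approximate VW's 'current features' value for one example.

    Assumptions: a single default namespace ("| " then features), non-zero
    values, and no interactions or n-grams. Not VW's actual implementation.
    """
    _, rest = line.split("|", 1)
    # Whitespace splitting drops empty tokens, so trailing spaces do not count.
    explicit = len(rest.split())
    # One constant (intercept) feature is added unless --noconstant is used.
    return explicit if noconstant else explicit + 1


for ex in ["1 | haha:1 hehe:2 hoho:3", "2 | haha:5 hehe:2 hoho:3  "]:
    # prints "4 3" for both lines
    print(vw_reported_feature_count(ex), vw_reported_feature_count(ex, noconstant=True))
```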