Weka decision tree node count too high
I'm trying to interpret the string representation of a Weka RandomTree. The training set has 1000 records (instances). Looking at the string, the instance counts in the leaves seem to add up to 1030. How is that possible? Am I misreading the string somehow?
See the full run description below.
Note the following:
Total Number of Instances 1000
And collecting all the counts from the leaves:
(10/0),(1/0),(354/0),(18/1),(37/0),(11/0),(9/4),(5/0),(7/3),(5/0),(20/0),(1/0),(2/0),(168/0),(1/0),(145/0),(61/3),(3/1),(5/0),(44/13),(8/0),(10/2),(63/0),(8/3),(4/0)
which sum to 1030.
Thanks!
Here is the full run description:
=== Run information ===
Scheme: weka.classifiers.trees.RandomTree -K 0 -M 1.0 -V 0.001 -S 1 -depth 5
Relation: test-data
Instances: 1000
Attributes: 5
feature1
feature2
feature3
feature4
class
Test mode: evaluate on training data
=== Classifier model (full training set) ===
RandomTree
==========
feature2 < -0.27
| feature2 < -0.61
| | feature3 < 1.09
| | | feature2 < -2.41
| | | | feature2 < -2.45 : 0 (10/0)
| | | | feature2 >= -2.45 : 1 (1/0)
| | | feature2 >= -2.41
| | | | feature2 < -0.7 : 0 (354/0)
| | | | feature2 >= -0.7 : 0 (18/1)
| | feature3 >= 1.09
| | | feature2 < -0.94 : 0 (37/0)
| | | feature2 >= -0.94
| | | | feature1 < -0.02 : 0 (11/0)
| | | | feature1 >= -0.02 : 0 (9/4)
| feature2 >= -0.61
| | feature3 < -0.34
| | | feature1 < 1.19 : 1 (5/0)
| | | feature1 >= 1.19
| | | | feature2 < -0.39 : 0 (7/3)
| | | | feature2 >= -0.39 : 0 (5/0)
| | feature3 >= -0.34
| | | feature2 < -0.32 : 0 (20/0)
| | | feature2 >= -0.32
| | | | feature2 < -0.3 : 1 (1/0)
| | | | feature2 >= -0.3 : 0 (2/0)
feature2 >= -0.27
| feature1 < 1.19
| | feature3 < -0.11 : 1 (168/0)
| | feature3 >= -0.11
| | | feature3 < -0.1 : 0 (1/0)
| | | feature3 >= -0.1
| | | | feature4 < 0.59 : 1 (145/0)
| | | | feature4 >= 0.59 : 1 (61/3)
| feature1 >= 1.19
| | feature2 < 0.82
| | | feature2 < -0.18
| | | | feature2 < -0.21 : 0 (3/1)
| | | | feature2 >= -0.21 : 0 (5/0)
| | | feature2 >= -0.18
| | | | feature1 < 2.28 : 1 (44/13)
| | | | feature1 >= 2.28 : 0 (8/0)
| | feature2 >= 0.82
| | | feature1 < 2.67
| | | | feature1 < 1.33 : 1 (10/2)
| | | | feature1 >= 1.33 : 1 (63/0)
| | | feature1 >= 2.67
| | | | feature1 < 2.97 : 0 (8/3)
| | | | feature1 >= 2.97 : 1 (4/0)
Size of the tree : 49
Max depth of tree: 5
Time taken to build model: 0.05 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.03 seconds
=== Summary ===
Correctly Classified Instances 970 97 %
Incorrectly Classified Instances 30 3 %
Kappa statistic 0.94
Mean absolute error 0.0421
Root mean squared error 0.145
Relative absolute error 8.4142 %
Root relative squared error 29.0073 %
Total Number of Instances 1000
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.964 0.024 0.976 0.964 0.970 0.940 0.997 0.996 0
0.976 0.036 0.964 0.976 0.970 0.940 0.997 0.995 1
Weighted Avg. 0.970 0.030 0.970 0.970 0.970 0.940 0.997 0.996
=== Confusion Matrix ===
a b <-- classified as
486 18 | a = 0
12 484 | b = 1
You've misinterpreted the meaning of the numbers in parentheses. I believe you read them at each node as (correct instances / incorrect instances), but they actually mean (total instances / incorrect instances).
At every leaf node there is a pair of numbers in parentheses. For example, the seventh leaf says:
feature1 >= -0.02 : 0 (9/4)
This means that 9 of the original instances reached this leaf, and the 4 means that of those 9 instances, 4 were misclassified. If you add up all the first numbers in parentheses, they sum to 1000, and the second numbers sum to 30. That matches the error count given later in the output:
Correctly Classified Instances 970 97 %
Incorrectly Classified Instances 30 3 %
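To make the arithmetic concrete, here is a small Python sketch that sums the leaf pairs both ways (the pairs are copied from the tree output above). It also reproduces the 1030 figure, which is what you get if you read the pairs as (correct / incorrect) and add both numbers:

```python
# Leaf annotations (total, misclassified) copied from the RandomTree output.
leaves = [(10, 0), (1, 0), (354, 0), (18, 1), (37, 0), (11, 0), (9, 4),
          (5, 0), (7, 3), (5, 0), (20, 0), (1, 0), (2, 0), (168, 0),
          (1, 0), (145, 0), (61, 3), (3, 1), (5, 0), (44, 13), (8, 0),
          (10, 2), (63, 0), (8, 3), (4, 0)]

total = sum(t for t, _ in leaves)   # first number: instances reaching the leaf
errors = sum(e for _, e in leaves)  # second number: misclassified among them

print(total)           # 1000 -- matches "Total Number of Instances"
print(errors)          # 30   -- matches "Incorrectly Classified Instances"
print(total + errors)  # 1030 -- the sum you get under the (correct/incorrect) reading
```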
Note that the error counts only match up this way because you used

=== Evaluation on training set ===

as you did here. Under cross-validation the numbers will differ.