Only one attribute ranked, but two selected? InfoGain Ranker in Weka

Question

I ran an InfoGain evaluation on my dataset, using a Ranker with a threshold of 0.1.

The output I get from the GUI shows:

Search Method:
    Attribute ranking.
    Threshold for discarding attributes:   0.1   

Attribute Evaluator (supervised, Class (nominal): 23 class):
    Information Gain Ranking Filter

Ranked attributes:
 0.141    2 nr_visits

Selected attributes: 2 : 1

In my Java implementation, I did the same thing:

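import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;

// Rank attributes and discard those scoring below 0.1
Ranker ranker = new Ranker();
ranker.setGenerateRanking(true);
ranker.setThreshold(0.1);

AttributeSelection attsel = new AttributeSelection();
InfoGainAttributeEval eval = new InfoGainAttributeEval();

attsel.setEvaluator(eval);
attsel.setSearch(ranker);

// instances is the loaded dataset, with its class attribute set
attsel.SelectAttributes(instances);

int[] ranked_attr = attsel.selectedAttributes();   // zero-based attribute indices
double[][] rawscores = attsel.rankedAttributes();  // rows of {attribute index, score}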

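This gives me similar output:

My ranked_attr is [1, 21] (1 is the nr_visits feature, 21 is something else).
My rawscores double array does not contain any entry for 21. It has 1, followed by another feature whose score is below my threshold.

What gives? Is there one selected feature or two? Is this a bug in Weka 3.8.4?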

Answer 1

Thanks to Eibe on the mailing list:

AFAIK, the set of indices returned by selectedAttributes() includes the index of the class attribute. I assume that attribute 22 in your data is the class attribute. There is no score for the class attribute because it is the attribute that we are trying to predict.

And yes, 21 is indeed my class index. Indices are zero-based in the code and one-based in the GUI, which is why I did not notice it right away.
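For anyone who runs into the same thing, here is a minimal sketch of how the class index could be stripped from the selected indices and the remainder converted to the one-based numbering the GUI prints. ClassIndexFilter and its two methods are hypothetical names of my own, not part of Weka; the sketch uses only the standard library and assumes the input array holds zero-based indices, as selectedAttributes() returned in the question.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Weka: post-processes an index array
// shaped like the one returned by AttributeSelection.selectedAttributes().
class ClassIndexFilter {

    // Returns a copy of `selected` with every occurrence of `classIndex` removed.
    // Both the array entries and `classIndex` are assumed to be zero-based.
    static int[] withoutClass(int[] selected, int classIndex) {
        return Arrays.stream(selected)
                .filter(index -> index != classIndex)
                .toArray();
    }

    // Converts zero-based indices (as used in code) to one-based numbers (as shown in the GUI).
    static int[] toOneBased(int[] zeroBased) {
        return Arrays.stream(zeroBased)
                .map(index -> index + 1)
                .toArray();
    }
}
```

With the values from the question, withoutClass(new int[]{1, 21}, 21) leaves only index 1, and toOneBased maps that to 2, which matches the "2 nr_visits" line in the GUI output. In a real Weka program the arguments would presumably be attsel.selectedAttributes() and instances.classIndex().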

weka