Decision tree repeating class names

Question

I have a very simple data/labels example, and the problem is that the generated decision tree (PDF) repeats the class name:

from sklearn import tree
from io import StringIO  # sklearn.externals.six has been removed from recent scikit-learn releases
import pydotplus

features_names = ['weight', 'texture']
features = [[140, 1], [130, 1], [150, 0], [110, 0]]
labels = ['apple', 'apple', 'orange', 'orange']

clf = tree.DecisionTreeClassifier()
clf.fit(features, labels)

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data, 
                         feature_names=features_names,  
                         class_names=labels,  
                         filled=True, rounded=True,  
                         special_characters=True,
                         impurity=False)

graph = pydotplus.graph_from_dot_data(dot_data.getvalue()) 
graph.write_pdf("apples_oranges.pdf")

The resulting PDF looks like this:

So the problem is obvious: both outcomes are labelled apple. What am I doing wrong?

From the docs:

list of strings, bool or None, optional (default=None)
Names of each of the target classes in ascending numerical order. Only relevant for classification and not supported for multi-output. If True, shows a symbolic representation of the class name.

The phrase "...ascending numerical order" doesn't mean much to me, and if I change the kwarg to:

class_names=sorted(labels)

the result is the same (which is to be expected in this case).

Answer 1

The class names are the names of the classes, not the label of each sample.

So one class is 'apple' and the other is 'orange', which means you only need to pass in ['apple', 'orange'].
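As a minimal sketch of that fix (not necessarily the only way to do it), the snippet below reuses the data from the question and takes the class names from the fitted classifier's `classes_` attribute, which holds the distinct labels in sorted order. It asks `export_graphviz` to return the DOT source as a string (`out_file=None`) so that no PDF tooling is needed; `random_state=0` is an addition here, used only to make the run repeatable.

```python
from sklearn import tree

feature_names = ['weight', 'texture']
features = [[140, 1], [130, 1], [150, 0], [110, 0]]
labels = ['apple', 'apple', 'orange', 'orange']

# random_state is an assumption added for repeatability, not part of the question.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(features, labels)

# One name per class (not per sample), in the classifier's own sorted order.
class_names = clf.classes_.tolist()

# out_file=None makes export_graphviz return the DOT source as a string.
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=feature_names,
                                class_names=class_names,
                                filled=True, rounded=True,
                                impurity=False)
print(class_names)
```

The resulting `dot_data` string can be handed to `pydotplus.graph_from_dot_data` exactly as in the question.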

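As for the order, to get it right consistently you can use a LabelEncoder to convert the targets to integers, int_labels = labelEncoder.fit_transform(labels), fit your decision tree on int_labels, and then pass the labelEncoder.classes_ attribute to the graph export as class_names.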
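The LabelEncoder approach might look roughly like the sketch below. It assumes the same data as the question; the variable name `encoder` and the use of `out_file=None` and `random_state=0` are illustrative choices rather than part of the original answer.

```python
from sklearn import tree
from sklearn.preprocessing import LabelEncoder

feature_names = ['weight', 'texture']
features = [[140, 1], [130, 1], [150, 0], [110, 0]]
labels = ['apple', 'apple', 'orange', 'orange']

# Map each distinct label to an integer; classes_ keeps the sorted label names,
# so the integer i corresponds to encoder.classes_[i].
encoder = LabelEncoder()
int_labels = encoder.fit_transform(labels)

# random_state is an assumption added for repeatability.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(features, int_labels)

# The encoder's class list is already in ascending numerical order of the codes.
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=feature_names,
                                class_names=encoder.classes_.tolist(),
                                filled=True, rounded=True,
                                impurity=False)
```

Because the tree is trained on the integer codes, "ascending numerical order" in the docs lines up directly with the order of `encoder.classes_`.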

