How to get class labels from cross_val_predict used with predict_proba in scikit-learn

Question

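I need to train a Random Forest classifier using 3-fold cross-validation. For each sample, I need to retrieve the predicted probabilities from the fold in which that sample is in the test set.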

I am using scikit-learn version 0.18.dev0.

This new version adds the ability to call cross_val_predict() with an additional parameter, method, which defines the kind of prediction required from the estimator.

In my case I want to use the predict_proba() method, which returns the probability of each class in a multiclass scenario.

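However, when I run the method, I get a matrix of predicted probabilities in which each row represents a sample and each column represents the predicted probability of a specific class.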

The problem is that the method does not indicate which class each column corresponds to.

The value I need is the same as the one returned in the classes_ attribute (in my case, of RandomForestClassifier), which is defined as:

classes_ : array of shape = [n_classes] or a list of such arrays
    The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

It is needed to interpret the output of predict_proba(), because its documentation states:

The order of the classes corresponds to that in the attribute classes_.

A minimal example follows:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

clf = RandomForestClassifier()

X = np.random.randn(10, 10)
y = np.array([1] * 4 + [0] * 3 + [2] * 3)

# how to get classes from here?
# cv=3 makes the 3-fold split explicit (the default number of folds depends on the version)
proba = cross_val_predict(estimator=clf, X=X, y=y, cv=3, method="predict_proba")

# using the classifier without cross-validation
# it is possible to get the classes in this way:
clf.fit(X, y)
proba = clf.predict_proba(X)
classes = clf.classes_

Answer 1

Yes, the columns will be in sorted order. This is because DecisionTreeClassifier (the default base_estimator of RandomForestClassifier) uses np.unique to construct the classes_ attribute, and np.unique returns the sorted unique values of the input array.
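As a rough illustration of the point above, the following sketch labels the columns returned by cross_val_predict with np.unique(y) and checks that this matches classes_ on a classifier fitted to the full data. The fixed seeds, n_estimators=10, and the variable names are choices made here for reproducibility, not part of the question. It also assumes that every class appears in every training fold, which holds for this data with a stratified 3-fold split; if a class were missing from some training fold, the column layout would need to be checked separately.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Same toy data as in the question, with a fixed seed for reproducibility.
rng = np.random.RandomState(0)
X = rng.randn(10, 10)
y = np.array([1] * 4 + [0] * 3 + [2] * 3)

clf = RandomForestClassifier(n_estimators=10, random_state=0)

# Out-of-fold class probabilities: one row per sample, one column per class.
proba = cross_val_predict(clf, X, y, cv=3, method="predict_proba")

# Assumption: every class is present in every training fold, so each fold's
# classes_ equals the sorted unique labels of the full y.
classes = np.unique(y)

# Sanity check against a classifier fitted on all of the data.
clf.fit(X, y)
assert np.array_equal(classes, clf.classes_)

# Map each row's most probable column back to its class label.
predicted_labels = classes[np.argmax(proba, axis=1)]
```

Here classes[j] is the label for column j of proba, so proba[i, j] can be read as the out-of-fold probability that sample i belongs to class classes[j].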
