Predicted values of each fold in K-Fold Cross Validation in sklearn
I have used Python's sklearn to perform 10-fold cross-validation on a dataset I have:
result = cross_val_score(best_svr, X, y, cv=10, scoring='r2')
print(result.mean())
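I was able to get the mean of the r2 scores as the final result. I would like to know whether there is a way to print out the predicted values for each fold (in this case, 10 sets of values).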
I believe you are looking for the cross_val_predict function.
To print the predictions for each number of folds k (here 2 through 9):
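from sklearn.model_selection import cross_val_predict, cross_val_score

for k in range(2, 10):
    # Mean R^2 across the k folds
    result = cross_val_score(best_svr, X, y, cv=k, scoring='r2')
    print(k, result.mean())
    # Out-of-fold prediction for every sample, in the original row order
    y_pred = cross_val_predict(best_svr, X, y, cv=k)
    print(y_pred)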
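The difference between the two functions can be seen in the shapes they return. The following is a minimal sketch, not the asker's actual setup: the synthetic data from make_regression and the SVR(kernel="linear") estimator are stand-ins, since the original X, y and best_svr are not shown.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.svm import SVR

# Stand-in data (assumption: the original X and y are not shown in the question).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Stand-in estimator (assumption: the tuned best_svr is not shown in the question).
best_svr = SVR(kernel="linear")

# cross_val_score returns one score per fold.
scores = cross_val_score(best_svr, X, y, cv=10, scoring="r2")

# cross_val_predict returns one out-of-fold prediction per sample,
# in the same order as the rows of X.
y_pred = cross_val_predict(best_svr, X, y, cv=10)

print(scores.shape)
print(y_pred.shape)
```

Each entry of y_pred comes from the model that was trained without that sample, so the array covers the whole dataset rather than a single fold.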
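A late answer, just to add to @jh314's: cross_val_predict does return all the predictions, but we do not know which fold each prediction belongs to. To find out, we need to provide the folds themselves (a splitter object) instead of an integer: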
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, StratifiedKFold
iris = sns.load_dataset('iris')
X=iris.iloc[:,:4]
y=(iris['species'] == "versicolor").astype('int')
rfc = RandomForestClassifier()
skf = StratifiedKFold(n_splits=10,random_state=111,shuffle=True)
pred = cross_val_predict(rfc, X, y, cv=skf)
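Now we iterate over the StratifiedKFold object and extract the predictions corresponding to each fold. This works because shuffle=True with a fixed random_state yields the same splits on every call to split: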
fold_pred = [pred[j] for i, j in skf.split(X,y)]
fold_pred
[array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0]),
array([0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]),
array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1]),
array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]),
array([0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]),
array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]),
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]),
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]),
array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]),
array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0])]
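If it is more convenient to keep a fold label per sample than a list of arrays, the same idea can be sketched as below. This is only an illustration under a few assumptions: it loads iris through sklearn.datasets.load_iris instead of seaborn (so class index 1 stands for versicolor), it fixes random_state on the classifier for repeatability, and the names fold_id and test_idx are hypothetical helpers, not part of the answer above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

iris = load_iris()
X = iris.data
y = (iris.target == 1).astype(int)  # in load_iris, class 1 is versicolor

rfc = RandomForestClassifier(random_state=0)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=111)
pred = cross_val_predict(rfc, X, y, cv=skf)

# Record which fold each sample was held out in (hypothetical helper array).
# Every sample appears in exactly one test split, so all entries get filled.
fold_id = np.empty(len(y), dtype=int)
for fold, (_, test_idx) in enumerate(skf.split(X, y)):
    fold_id[test_idx] = fold

# Print predictions next to the true labels, fold by fold.
for fold in range(skf.get_n_splits()):
    mask = fold_id == fold
    print(fold, pred[mask], y[mask])
```

Keeping fold_id alongside pred makes it easy to compute a per-fold metric later, for example by grouping on the fold label.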