Output metrics TP, FP, TN, FN values for a leave-one-out random forest model in Python

Question

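I am running a grid search with leave-one-out cross-validation for a random forest model. I use the F1 score to select the best estimator and score. From here, how can I get the precision and recall scores so that I can plot a precision-recall curve? X is the sample dataset and y is the target.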

from sklearn.ensemble import  RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import LeaveOneOut

RF = RandomForestClassifier()
param_grid = { 
          'n_estimators': [10,20,30,50],
          'criterion': ['gini', 'entropy'],
          'max_depth': [10, 20, 30, None]}

grid_search = GridSearchCV(RF, 
                       param_grid=param_grid, 
                       cv=LeaveOneOut(),
                       scoring='f1')

grid_search.fit(X, y)

Answer 1

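You can collect the model's predictions into an array and use it to compute the data for a precision-recall curve (or any other performance metric you need):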

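from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt

# The code you provided would go here
# Hold out part of the data for testing (the 25% split is an arbitrary choice)
Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Use the train partition to train the model
grid_search.fit(Xtrain, ytrain)

# Use the test partition to test the model with unseen data.
# Scores (probability of the positive class, assumed to be the second class)
# give a full curve; hard labels from predict() would give a single threshold.
yscore = grid_search.predict_proba(Xtest)[:, 1]
precision, recall, thresholds = precision_recall_curve(ytest, yscore)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()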
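If you would rather stay with leave-one-out than use a single hold-out split, one option is to pool the out-of-fold predictions: each sample is scored by a model trained on all the other samples, and the curve is computed once over the pooled scores. The following is only a sketch under some assumptions: the data is a synthetic stand-in for your X and y, the target is binary with labels 0 and 1, and the hyperparameters are placeholders for whatever `grid_search.best_params_` gives you.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for the asker's X and y (assumption: binary target)
X, y = make_classification(n_samples=40, n_features=8, random_state=0)

# Placeholder hyperparameters; in practice reuse grid_search.best_params_
model = RandomForestClassifier(n_estimators=20, random_state=0)

# One row per sample, predicted by a model that never saw that sample
proba = cross_val_predict(model, X, y, cv=LeaveOneOut(), method="predict_proba")
y_score = proba[:, 1]  # probability of the positive class (label 1)

# Precision and recall at every threshold over the pooled scores
precision, recall, thresholds = precision_recall_curve(y, y_score)
```

`plt.plot(recall, precision)` then draws the curve exactly as in the snippet above.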

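I strongly recommend splitting your dataset: use most of it to train the model and set some data aside purely for testing performance. That way you can check how well the model generalizes to data it has not seen.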
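For the TP, FP, TN and FN counts mentioned in the title, the same idea of pooling predictions applies. With leave-one-out each test fold holds a single sample, so a per-fold F1, precision or recall is not very informative; computing the confusion matrix once over all out-of-fold predictions is usually more useful. A minimal sketch, assuming a binary target with labels 0 and 1 (the synthetic data and hyperparameters below are placeholders, not part of the original question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for the asker's X and y (assumption: binary target)
X, y = make_classification(n_samples=40, n_features=8, random_state=0)

# Placeholder hyperparameters; in practice reuse grid_search.best_params_
model = RandomForestClassifier(n_estimators=20, random_state=0)

# Hard label for each sample, predicted by a model trained on the other samples
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())

# For binary labels [0, 1], ravel() returns the counts in this order
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()

# Precision = TP / (TP + FP), recall = TP / (TP + FN)
precision = precision_score(y, y_pred, zero_division=0)
recall = recall_score(y, y_pred, zero_division=0)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"precision={precision:.3f} recall={recall:.3f}")
```

These are single numbers at the default decision threshold; for a full curve you need scores rather than hard labels.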

python

metrics

random-forest

cross-validation

leave-one-out