Output metrics (TP, FP, TN, FN values) for a leave-one-out random forest model in Python
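I am running a leave-one-out grid search for a random forest model. I use the F1 score to get the best estimator and score. From here, how can I get precision and recall scores so that I can plot a precision-recall curve? X is the sample dataset and y is the target.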
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import LeaveOneOut
RF = RandomForestClassifier()
# Hyperparameter grid to search over
param_grid = {
    'n_estimators': [10, 20, 30, 50],
    'criterion': ['gini', 'entropy'],
    'max_depth': [10, 20, 30, None]}
# GridSearchCV takes the metric through `scoring`; 'f1' is the built-in F1 scorer
grid_search = GridSearchCV(RF,
                           param_grid=param_grid,
                           cv=LeaveOneOut(),
                           scoring='f1')
grid_search.fit(X, y)
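You can collect the model's predictions into an array and use it to compute the data for the precision-recall curve (or any other performance metric you need):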
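from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
# The code you provided would go here
# Hold out part of the data for testing (the 25% test size is an arbitrary choice)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25, stratify=y)
# Use the train partition to train the model
grid_search.fit(Xtrain, ytrain)
# Use the test partition to test the model with unseen data
ypred = grid_search.predict(Xtest)
# For the curve, score with the positive-class probability rather than hard labels,
# so that more than one threshold is evaluated (assumes a binary target)
yscore = grid_search.predict_proba(Xtest)[:, 1]
precision, recall, thresholds = precision_recall_curve(ytest, yscore)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()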
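For the TP, FP, TN and FN counts asked about in the title, one option is `confusion_matrix` from `sklearn.metrics` applied to hard class predictions such as those returned by `grid_search.predict(Xtest)`. The sketch below is only an illustration: the arrays `y_true` and `y_pred` are made-up stand-ins for the real test labels and predictions, and a binary target with labels 0 and 1 is assumed.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and hard predictions, standing in for the real test data.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

# With labels=[0, 1], the flattened matrix is ordered tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Precision and recall at this single operating point.
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

These counts describe one decision threshold only; the precision-recall curve repeats the same calculation over many thresholds.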
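It is strongly recommended that you split the dataset, use most of it to train the model, and set some data aside solely for testing performance. That way you can check how well the model generalizes to unseen data.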
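If the dataset is too small to spare a test partition, a possible alternative is to pool the leave-one-out predictions themselves: `cross_val_predict` returns, for each sample, the prediction of a model trained on all the other samples. This is a minimal sketch rather than the answer's method. It assumes a binary target, uses synthetic data from `make_classification` in place of `X` and `y`, and fixes placeholder hyperparameters where `grid_search.best_params_` might be used instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_recall_curve
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for X and y (assumption: binary target).
X, y = make_classification(n_samples=40, n_features=8, random_state=0)

# Placeholder hyperparameters, chosen only for illustration.
model = RandomForestClassifier(n_estimators=20, random_state=0)

# One held-out probability per sample, each from a model that never saw that sample.
proba = cross_val_predict(
    model, X, y, cv=LeaveOneOut(), method="predict_proba"
)[:, 1]

# Curve data computed over the pooled out-of-fold scores.
precision, recall, thresholds = precision_recall_curve(y, proba)

# Counts at a 0.5 threshold (the threshold is an arbitrary choice).
y_pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```

The resulting `recall` and `precision` arrays can be passed to `plt.plot(recall, precision)` as before. Note that if the hyperparameters were tuned on the same samples, these estimates are likely to be somewhat optimistic.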