Hyperparameter tuning a Random Forest classifier with GridSearchCV based on probability
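I'm just starting to tune hyperparameters for a random forest binary classifier, and I was wondering if anyone knew or could advise how to set the scoring so that it is based on predicted probabilities rather than predicted classes. Ideally I would like roc_auc to be computed from the probabilities (e.g. [0.2, 0.6, 0.7, 0.1, 0.0]) rather than from the classifications (e.g. [0, 1, 1, 0, 0]).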
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier as rfc
# GridSearchCV lives in sklearn.model_selection; the old sklearn.grid_search module has been removed
from sklearn.model_selection import GridSearchCV
# 'sqrt' is what max_features='auto' meant for classifiers; the grid below overrides it anyway
rfbase = rfc(n_jobs=3, max_features='sqrt', n_estimators=100, bootstrap=False)
param_grid = {
    'n_estimators': [200, 500],
    'max_features': [.5, .7],
    'bootstrap': [False, True],
    'max_depth': [3, 6]
}
rf_fit = GridSearchCV(estimator=rfbase, param_grid=param_grid,
                      scoring='roc_auc')
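I think roc_auc is currently being computed from the hard class predictions. Before I start writing a custom scoring function, I wanted to check whether there is a more efficient way. Thanks in advance for your help!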
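To make the distinction in the question concrete, here is a minimal sketch of how ROC AUC differs when it is fed probabilities versus thresholded labels. The two prediction vectors are the ones from the question; `y_true` is a made-up set of ground-truth labels, assumed purely for illustration.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth (not from the original post).
y_true = [1, 1, 1, 0, 0]

# Predicted probabilities of the positive class, as in the question.
y_proba = [0.2, 0.6, 0.7, 0.1, 0.0]

# The same predictions after thresholding at 0.5, as in the question.
y_label = [0, 1, 1, 0, 0]

# AUC on probabilities uses the full ranking of the samples.
auc_from_proba = roc_auc_score(y_true, y_proba)

# AUC on hard labels only sees two distinct score values, so ties lose information.
auc_from_labels = roc_auc_score(y_true, y_label)

print("AUC from probabilities:", auc_from_proba)
print("AUC from hard labels:  ", auc_from_labels)
```

With these made-up labels, every positive sample is ranked above every negative one by probability, so the probability-based AUC is higher than the label-based one, where the first sample is tied with the negatives.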
I ended up solving it using the reference that Jarad provided:
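from sklearn.metrics import roc_auc_score, make_scorer
from sklearn.ensemble import RandomForestClassifier as rfc
# GridSearchCV lives in sklearn.model_selection; the old sklearn.grid_search module has been removed
from sklearn.model_selection import GridSearchCV
# 'sqrt' is what max_features='auto' meant for classifiers; the grid below overrides it anyway
rfbase = rfc(n_jobs=3, max_features='sqrt', n_estimators=100, bootstrap=False)
param_grid = {
    'n_estimators': [200, 500],
    'max_features': [.5, .7],
    'bootstrap': [False, True],
    'max_depth': [3, 6]
}
def roc_auc_scorer(y_true, y_pred):
    # Depending on the scikit-learn version, y_pred is either the full
    # (n_samples, 2) predict_proba output or only the positive-class column
    y_score = y_pred[:, 1] if y_pred.ndim == 2 else y_pred
    return roc_auc_score(y_true, y_score)
# needs_proba=True asks the scorer to call predict_proba instead of predict;
# newer scikit-learn releases spell this response_method="predict_proba"
scorer = make_scorer(roc_auc_scorer, needs_proba=True)
rf_fit = GridSearchCV(estimator=rfbase, param_grid=param_grid,
                      scoring=scorer)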
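For anyone who wants to check what the grid search is actually scoring on, here is a small self-contained sketch. It is not the code from the post: the data is synthetic, the grid is deliberately tiny, and `proba_auc` and `run_search` are hypothetical helper names. Instead of `make_scorer`, it passes a plain callable with the `(estimator, X, y)` signature, which `GridSearchCV` also accepts, and compares the result against the built-in `'roc_auc'` string. As far as I can tell, the built-in scorer already works from continuous outputs (here `predict_proba`, since random forests have no `decision_function`), so the two should agree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

# Synthetic binary data, purely for illustration (not from the original post).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def proba_auc(estimator, X, y):
    """Score a fitted estimator by ROC AUC on its positive-class probabilities."""
    # Column 1 of predict_proba is the probability of class 1 for 0/1 labels.
    return roc_auc_score(y, estimator.predict_proba(X)[:, 1])


# Deliberately small grid so the sketch runs in a few seconds.
param_grid = {"n_estimators": [20, 40], "max_depth": [3, 6]}


def run_search(scoring):
    """Run the same grid search with a given scoring argument."""
    # A fixed random_state makes both searches fit identical forests.
    rf = RandomForestClassifier(random_state=0)
    search = GridSearchCV(rf, param_grid=param_grid, scoring=scoring, cv=3)
    return search.fit(X, y)


custom = run_search(proba_auc)   # explicit probability-based scorer
builtin = run_search("roc_auc")  # built-in scorer string

# Compare the mean cross-validated AUC for every parameter combination.
scores_match = np.allclose(
    custom.cv_results_["mean_test_score"],
    builtin.cv_results_["mean_test_score"],
)
print("Best params (custom scorer):", custom.best_params_)
print("Custom and built-in scores match:", scores_match)
```

If the two sets of scores match on your installation, the plain `scoring='roc_auc'` string is already probability-based and a custom scorer is only needed for metrics that have no built-in equivalent.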