Grid search: finding parameters for the best AUC
I am trying to find the parameters for my SVM that give me the best AUC, but I can't find any scoring function for AUC in sklearn. Does anyone have an idea? Here is my code:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

parameters = {"C": [0.1, 1, 10, 100, 1000], "gamma": [0.1, 0.01, 0.001, 0.0001, 0.00001]}
clf = SVC(kernel="rbf")
clf = GridSearchCV(clf, parameters, scoring=???)
clf.fit(features_train, labels_train)
print(clf.best_params_)
So what can I pass as scoring to get the parameters that give the highest AUC?
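I haven't tried this, but I believe you want sklearn.metrics.roc_auc_score.
The catch is that it is a metric, not a model scorer, so you need to build a scorer around it.
Something like: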
from sklearn.metrics import roc_auc_score

def score_auc(estimator, X, y):
    # Use positive-class probabilities rather than binary predictions; they give
    # a more realistic AUC. SVC only has predict_proba when probability=True.
    y_score = estimator.predict_proba(X)[:, 1]
    return roc_auc_score(y, y_score)
Then pass this function as the scoring argument of GridSearchCV.
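For the SVC in the question, a callable scorer can avoid predict_proba altogether. The sketch below is only an illustration under assumptions: the data comes from make_classification rather than the asker's features_train, the grid is shortened, and decision_function is swapped in for predict_proba because AUC only needs a ranking of the samples.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def score_auc(estimator, X, y):
    # Signed distances to the decision boundary rank the samples,
    # which is all roc_auc_score needs for a binary problem.
    return roc_auc_score(y, estimator.decision_function(X))


# Synthetic binary data, used only for illustration (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.1, 0.01, 0.001]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring=score_auc, cv=3)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Any callable with the signature (estimator, X, y) that returns a single number can be passed as scoring this way.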
You can simply use the built-in scorer name:
clf = GridSearchCV(clf, parameters, scoring='roc_auc')
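As a rough end-to-end sketch of this one-liner, the snippet below runs the search with scoring="roc_auc" and then checks the AUC on held-out data. The synthetic dataset, the train/test split, and the shortened grid are assumptions for illustration; the variable names follow the question.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic binary data standing in for the asker's features and labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
features_train, features_test, labels_train, labels_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

parameters = {"C": [0.1, 1, 10], "gamma": [0.1, 0.01, 0.001]}
clf = GridSearchCV(SVC(kernel="rbf"), parameters, scoring="roc_auc", cv=3)
clf.fit(features_train, labels_train)

# After fitting, the search object delegates to the refit best estimator.
test_auc = roc_auc_score(labels_test, clf.decision_function(features_test))
print(clf.best_params_)
print(test_auc)
```

The "roc_auc" scorer works with SVC as-is, because it can fall back on decision_function when predict_proba is not available.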
You can also build any scorer yourself:
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import auc, make_scorer, roc_curve
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline
import xgboost as xgb

# Define the scoring function.
def custom_auc(ground_truth, predictions):
    # Only the positive-class column ("1") is needed. Some scikit-learn releases
    # pass both predict_proba columns, and giving both to roc_curve raises an error.
    if predictions.ndim == 2:
        predictions = predictions[:, 1]
    fpr, tpr, _ = roc_curve(ground_truth, predictions, pos_label=1)
    return auc(fpr, tpr)

# Wrap it as a standard sklearn scorer. Recent releases replace
# needs_proba=True with response_method="predict_proba".
my_auc = make_scorer(custom_auc, greater_is_better=True, needs_proba=True)

pipeline = Pipeline(
    [("transformer", TruncatedSVD(n_components=70)),
     ("classifier", xgb.XGBClassifier(scale_pos_weight=1.0, learning_rate=0.1,
                                      max_depth=5, n_estimators=50, min_child_weight=5))])

parameters_grid = {"transformer__n_components": [60, 40, 20]}

grid_cv = GridSearchCV(pipeline, parameters_grid, scoring=my_auc, n_jobs=-1,
                       cv=StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=0))
# X and y are your feature matrix and binary labels.
grid_cv.fit(X, y)
For more information, see the sklearn make_scorer documentation.
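A smaller variant of the make_scorer approach that does not need xgboost might look like the sketch below. Several things here are assumptions rather than part of the answer: LogisticRegression and the synthetic data are stand-ins, and the signature check exists only because the keyword for requesting probabilities differs between scikit-learn releases (needs_proba in older ones, response_method in newer ones).

```python
import inspect

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, make_scorer, roc_curve
from sklearn.model_selection import GridSearchCV


def custom_auc(ground_truth, predictions):
    predictions = np.asarray(predictions)
    # Accept either the full predict_proba output or just the positive column.
    if predictions.ndim == 2:
        predictions = predictions[:, 1]
    fpr, tpr, _ = roc_curve(ground_truth, predictions, pos_label=1)
    return auc(fpr, tpr)


# Pick whichever keyword this scikit-learn release understands.
if "response_method" in inspect.signature(make_scorer).parameters:
    proba_kwargs = {"response_method": "predict_proba"}
else:
    proba_kwargs = {"needs_proba": True}
my_auc = make_scorer(custom_auc, greater_is_better=True, **proba_kwargs)

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

grid_cv = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}, scoring=my_auc, cv=3
)
grid_cv.fit(X, y)
print(grid_cv.best_params_, grid_cv.best_score_)
```

For plain AUC this gives the same kind of score as scoring="roc_auc"; a custom scorer mainly pays off when the metric needs extra arguments or non-standard handling.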
Use the code below; it lists every scoring name you can pass to GridSearchCV:
import sklearn.metrics
sklearn.metrics.SCORERS.keys()
# In newer scikit-learn releases, use sklearn.metrics.get_scorer_names() instead.
Select the scoring name you want to use.
In your case, the code below will work:
clf = GridSearchCV(clf, parameters, scoring = 'roc_auc')
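To check that a name such as 'roc_auc' is actually available in the installed release, something like the following sketch could be used. The hasattr fallback is an assumption made so the snippet runs on both older releases (which expose SCORERS) and newer ones (which expose get_scorer_names).

```python
import sklearn.metrics as metrics

# Newer releases provide get_scorer_names(); older ones expose the SCORERS dict.
if hasattr(metrics, "get_scorer_names"):
    names = sorted(metrics.get_scorer_names())
else:
    names = sorted(metrics.SCORERS.keys())

print("roc_auc" in names)
print([name for name in names if "roc_auc" in name])
```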