How to set own scoring with GridSearchCV from sklearn for regression?
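I used to use GridSearchCV(...scoring="accuracy"...) for classification models. Now I am about to use GridSearchCV for a regression model and set the scoring with my own error function.
Example code: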
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def rmse(predict, actual):
    predict = np.array(predict)
    actual = np.array(actual)
    distance = predict - actual
    square_distance = distance ** 2
    mean_square_distance = square_distance.mean()
    score = np.sqrt(mean_square_distance)
    return score

rmse_score = make_scorer(rmse)
gsSVR = GridSearchCV(...scoring=rmse_score...)
gsSVR.fit(X_train,Y_train)
SVR_best = gsSVR.best_estimator_
print(gsSVR.best_score_)
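However, I found that with this setup the search returns the parameter set with the highest error. As a result, I get the worst parameter set and score. In this case, how can I get the best estimator and score?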
Summary:
classification -> GridSearchCV(scoring="accuracy") -> best_estimator...best
regression -> GridSearchCV(scoring=rmse_score) -> best_estimator...worst
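Technically, RMSE is a loss, where lower is better. By default GridSearchCV treats the scorer's output as a score to maximize, which is why it picked the worst parameters. You can turn on the loss option in make_scorer: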
greater_is_better : boolean, default=True Whether score_func is a
score function (default), meaning high is good, or a loss function,
meaning low is good. In the latter case, the scorer object will
sign-flip the outcome of the score_func.
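To make the sign flip concrete, here is a minimal sketch that compares the raw RMSE with what the scorer object returns. The toy data and the choice of LinearRegression are assumptions made only for illustration; note that the metric takes its arguments as (actual, predict), which is the order make_scorer uses when calling it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer


def rmse(actual, predict):
    # Root mean squared error; argument order matches make_scorer: (y_true, y_pred).
    actual = np.asarray(actual, dtype=float)
    predict = np.asarray(predict, dtype=float)
    return float(np.sqrt(np.mean((predict - actual) ** 2)))


# Tiny illustrative data set (hypothetical values).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
model = LinearRegression().fit(X, y)

# Plain loss value: positive, lower is better.
raw_rmse = rmse(y, model.predict(X))

# Scorer built as a loss: it returns the negated RMSE, so "higher is better" holds again.
loss_scorer = make_scorer(rmse, greater_is_better=False)
flipped = loss_scorer(model, X, y)

print(raw_rmse, flipped)
```

Because the scorer negates the loss, GridSearchCV can keep maximizing as usual and still end up with the lowest RMSE.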
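You also need to change the order of the inputs from rmse(predict, actual) to rmse(actual, predict), because that is the order in which GridSearchCV passes them. For RMSE the result is the same either way since the difference is squared, but an asymmetric metric would silently be wrong. So the final scorer will look like this: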
def rmse(actual, predict):
    ...
    ...
    return score

rmse_score = make_scorer(rmse, greater_is_better=False)
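Putting it together, the following is a sketch of a full search with the corrected scorer. The synthetic sine data, the parameter grid, and cv=5 are assumptions chosen for illustration and are not from the original question. Since the scorer negates the loss, best_score_ comes back negative and has to be negated to read it as an RMSE.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR


def rmse(actual, predict):
    # Root mean squared error; (y_true, y_pred) order as passed by the scorer.
    actual = np.asarray(actual, dtype=float)
    predict = np.asarray(predict, dtype=float)
    return float(np.sqrt(np.mean((predict - actual) ** 2)))


# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(80, 1))
Y_train = np.sin(X_train).ravel() + rng.normal(scale=0.1, size=80)

# Declare RMSE as a loss so the search minimizes it.
rmse_score = make_scorer(rmse, greater_is_better=False)

# Hypothetical grid; adjust to the real problem.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
gsSVR = GridSearchCV(SVR(kernel="rbf"), param_grid, scoring=rmse_score, cv=5)
gsSVR.fit(X_train, Y_train)

SVR_best = gsSVR.best_estimator_
best_rmse = -gsSVR.best_score_  # undo the sign flip to get the RMSE back
mean_rmse_per_candidate = -gsSVR.cv_results_["mean_test_score"]

print(gsSVR.best_params_, best_rmse)
```

Recent scikit-learn versions also accept the built-in string scoring="neg_root_mean_squared_error", which follows the same negated-loss convention and may be enough if no custom metric is needed.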