Post-process cross-validated predictions before scoring

Question

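I have a regression problem where I cross-validate the results and evaluate performance. I know beforehand that the ground truth can never be less than zero, so I want to intercept the predictions before they are fed to the scoring metric and clip them at zero. I thought the make_scorer function might be useful for this. Is it possible to somehow post-process the predictions after cross-validation, but before the evaluation metric is applied to them?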

from sklearn.metrics import mean_squared_error, r2_score, make_scorer
from sklearn.model_selection import cross_validate

# X = Stacked feature vectors
# y = ground truth vector
# regr = some regression estimator

#### How to indicate that the predictions need post-processing 
#### before applying the score function???
scoring = {'r2': make_scorer(r2_score),
           'neg_mse': make_scorer(mean_squared_error)}

scores = cross_validate(regr, X, y, scoring=scoring, cv=10)

PS: I know there are constrained estimators, but I want to see how a heuristic like this performs.

Answer 1

One thing you can do is, as you suggested, use make_scorer() with custom scorer functions that wrap the metrics you want to use (r2_score, mean_squared_error).

Take a look at some of the examples in this part of the sklearn documentation and … In particular, your functions could do something like this:

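import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def clipped_r2(y_true, y_pred):
    # Negative predictions are impossible here, so clip them to zero first
    y_pred_clipped = np.clip(y_pred, 0, None)
    return r2_score(y_true, y_pred_clipped)

def clipped_mse(y_true, y_pred):
    y_pred_clipped = np.clip(y_pred, 0, None)
    return mean_squared_error(y_true, y_pred_clipped)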
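As a quick sanity check, here are the two functions applied to a tiny hand-made example (the arrays are made up purely for illustration). The single negative prediction is pulled up to 0, which in this toy case happens to equal the true value, so both metrics become perfect.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def clipped_r2(y_true, y_pred):
    # Clip negative predictions to zero before computing R^2
    return r2_score(y_true, np.clip(y_pred, 0, None))

def clipped_mse(y_true, y_pred):
    # Clip negative predictions to zero before computing the MSE
    return mean_squared_error(y_true, np.clip(y_pred, 0, None))

y_true = np.array([0.0, 1.0, 2.0])
y_pred = np.array([-1.0, 1.0, 2.0])  # one impossible (negative) prediction

print(mean_squared_error(y_true, y_pred))  # 0.333... (error of 1 on one of 3 points)
print(clipped_mse(y_true, y_pred))         # 0.0
print(r2_score(y_true, y_pred))            # 0.5
print(clipped_r2(y_true, y_pred))          # 1.0
```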

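This lets you do the post-processing inside the scorer, before the scoring function (here r2_score or mean_squared_error) is called. Then use the functions with make_scorer just as you did above, setting greater_is_better according to whether each one is a score function (like r2, where greater is better) or a loss function (like mean_squared_error, where lower is better and 0 is best). With greater_is_better=False, make_scorer negates the loss, which is why the MSE entry is named neg_mse: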

scoring = {'r2': make_scorer(clipped_r2, greater_is_better=True),
           'neg_mse': make_scorer(clipped_mse, greater_is_better=False)}
scores = cross_validate(regr, X, y, scoring=scoring, cv=10)
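For a self-contained version of the above, here is a minimal sketch. The synthetic data and the use of LinearRegression as a stand-in for regr are assumptions for illustration, not part of the question. It also adds an ordinary, unclipped MSE scorer under the hypothetical key neg_mse_raw so the effect of clipping can be compared on the same folds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error, r2_score
from sklearn.model_selection import cross_validate

def clipped_r2(y_true, y_pred):
    return r2_score(y_true, np.clip(y_pred, 0, None))

def clipped_mse(y_true, y_pred):
    return mean_squared_error(y_true, np.clip(y_pred, 0, None))

# Hypothetical synthetic data: a linear signal truncated at zero, so the
# ground truth is never negative but a linear model can still predict < 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.clip(X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200), 0, None)

regr = LinearRegression()  # stand-in for "some regression estimator"

scoring = {
    'r2': make_scorer(clipped_r2, greater_is_better=True),
    'neg_mse': make_scorer(clipped_mse, greater_is_better=False),
    # Unclipped MSE, for comparison (hypothetical extra key)
    'neg_mse_raw': make_scorer(mean_squared_error, greater_is_better=False),
}
scores = cross_validate(regr, X, y, scoring=scoring, cv=10)

# Each metric comes back as an array with one value per fold under "test_<key>"
print(scores['test_r2'].mean())
print(-scores['test_neg_mse'].mean(), -scores['test_neg_mse_raw'].mean())
```

Because every true target is non-negative, raising a negative prediction to 0 can only move it closer to the target, so on each fold the clipped MSE should never exceed the raw MSE.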

Tags: python, scikit-learn, cross-validation