Interpretation of neg_mean_squared_error

Question

I have a doubt about neg_mean_squared_error from sklearn.metrics. I am using a Ridge regression model with cross-validation:

cross_val_score(estimator, X_train, y_train, cv=5, scoring='neg_mean_squared_error')

I use different values of alpha to select the best model:

alphas = (0.01, 0.05, 0.1, 0.3, 0.8, 1, 5, 10, 15, 30, 50)

For each alpha I take the mean of the 5 values returned by cross_val_score and plot the results (mean score on the y axis, alpha on the x axis).

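Doing some research, I found that with neg_mean_squared_error we should look for 'the smaller the better'. Does that mean I have to look "literally" for the smallest value, which would be the first value in my plot, or does it mean the smallest in the sense of 'closest to 0'?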

In my case all the values are negative, which is why I am unsure how to interpret them.

Thank you very much.

Answer 1

From the docs:

All scorer objects follow the convention that higher return values are better than lower return values. Thus metrics which measure the distance between the model and the data, like metrics.mean_squared_error, are available as neg_mean_squared_error which return the negated value of the metric.

So what you want is the maximum of your values, i.e. the one closest to 0.
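As a minimal sketch of that selection rule: the snippet below reuses the alphas and the cross_val_score call from the question, but the data comes from make_regression, which is only an assumed stand-in for the asker's X_train and y_train. It picks the alpha whose mean score is highest (closest to 0) and flips the sign to report the ordinary MSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Assumption: synthetic data standing in for the asker's X_train / y_train.
X_train, y_train = make_regression(
    n_samples=200, n_features=10, noise=10.0, random_state=0
)

alphas = (0.01, 0.05, 0.1, 0.3, 0.8, 1, 5, 10, 15, 30, 50)

# Mean of the 5 fold scores for each alpha; every entry is <= 0.
mean_scores = np.array([
    cross_val_score(
        Ridge(alpha=alpha), X_train, y_train,
        cv=5, scoring='neg_mean_squared_error',
    ).mean()
    for alpha in alphas
])

# Higher is better, so take the maximum (the value closest to 0).
best_idx = int(np.argmax(mean_scores))
best_alpha = alphas[best_idx]

# Flip the sign to get back the usual, non-negative MSE.
best_mse = -mean_scores[best_idx]

print("best alpha:", best_alpha)
print("cross-validated MSE at best alpha:", best_mse)
```

Taking np.argmax of the negated scores is the same as taking np.argmin of the plain MSE, so either view picks the same alpha.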

Answer 2

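By convention, scikit-learn treats scores as following the rule 'higher values are better than lower values'. A small MSE means your predictions are close to the data, so MSE follows the opposite rule. That is why sklearn uses the negated MSE as the score. A larger neg_mean_squared_error is therefore better than a smaller one. This is also consistent with your plot, since extreme values of a parameter usually degrade the model's performance.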
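To make the sign convention concrete, here is a small sketch comparing the scorer with the plain metric. The synthetic data and the alpha=1.0 setting are assumptions for illustration only; get_scorer and mean_squared_error are imported from sklearn.metrics.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import get_scorer, mean_squared_error

# Assumption: synthetic data and alpha=1.0, chosen only for illustration.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
model = Ridge(alpha=1.0).fit(X, y)

# Plain metric: non-negative, lower is better.
mse = mean_squared_error(y, model.predict(X))

# Scorer: the same quantity with the sign flipped, so higher is better.
scorer = get_scorer('neg_mean_squared_error')
neg_mse = scorer(model, X, y)

# Negate the score before taking a square root to get the RMSE.
rmse = np.sqrt(-neg_mse)

print("MSE:    ", mse)
print("neg MSE:", neg_mse)
print("RMSE:   ", rmse)
```

The two printed values differ only in sign, which is why a plot of mean scores sits entirely below zero and the best model is at its highest point.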

The scikit-learn website states exactly this (shown as a screenshot in the original answer).


Tags: python, regression, scikit-learn