How much difference between training and test error is considered suitable?

I am working on a regression problem; I use AdaBoost with decision trees for regression, and R^2 as the evaluation metric. I would like to know how much difference between the training R^2 and the test R^2 is considered suitable. My training R^2 is 0.9438 and my test R^2 is 0.877. Is it overfitting, or is it good? I just want to know exactly how much difference between training and testing is acceptable or suitable.

There are several issues with your question.

First, R^2 is arguably not a recommended performance metric for predictive problems; quoting from my own answer in another thread:

the whole R-squared concept comes in fact directly from the world of statistics, where the emphasis is on interpretative models, and it has little use in machine learning contexts, where the emphasis is clearly on predictive models; at least AFAIK, and beyond some very introductory courses, I have never (I mean never...) seen a predictive modeling problem where the R-squared is used for any kind of performance assessment; neither it's an accident that popular machine learning introductions, such as Andrew Ng's Machine Learning at Coursera, do not even bother to mention it. And, as noted in the Github thread above (emphasis added):

In particular when using a test set, it's a bit unclear to me what the R^2 means.

I certainly agree.

Second:

I have training r^2 is 0.9438 and testing r^2 is 0.877. Is it over-fitting or good?

A difference between the training and test scores does not in itself indicate overfitting. It is simply the generalization gap, i.e. the expected difference in performance between the training and validation sets; quoting from a recent blog post by Google AI:

An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution.
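To make the gap concrete, here is a minimal sketch of measuring it in the asker's setting; the synthetic dataset and hyperparameters below are illustrative assumptions, not the asker's actual setup:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumed, for illustration only)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# AdaBoost with decision trees, as in the question
model = AdaBoostRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))

# The generalization gap: performance on training data minus
# performance on unseen data from the same distribution
gap = r2_train - r2_test
print(f"train R^2 = {r2_train:.3f}, test R^2 = {r2_test:.3f}, gap = {gap:.3f}")
```

The gap itself is expected to be positive in general; its size alone, as argued above, does not tell you whether the model is overfitting.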

The signature of overfitting is when your validation loss starts to increase while your training loss keeps decreasing, i.e.:

(Image adapted from the Wikipedia entry on overfitting - different quantities may be plotted on the horizontal axis, e.g. here the number of boosted trees)
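In a boosting setup like this one, the two curves in the figure can be traced directly on a single fitted model: scikit-learn's `staged_predict` yields predictions after each boosting round. A sketch, again on assumed synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumed, for illustration only)
X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

model = AdaBoostRegressor(n_estimators=200, random_state=1).fit(X_tr, y_tr)

# staged_predict yields predictions after each boosting iteration,
# so both error curves can be computed without refitting the model
train_err = [mean_squared_error(y_tr, p) for p in model.staged_predict(X_tr)]
test_err = [mean_squared_error(y_te, p) for p in model.staged_predict(X_te)]

# If test error bottoms out and then rises while training error keeps
# falling, that is the overfitting pattern from the figure
best_iter = min(range(len(test_err)), key=test_err.__getitem__)
print(f"test error minimized at iteration {best_iter + 1} of {len(test_err)}")
```

Plotting `train_err` and `test_err` against the iteration index reproduces the figure above for your own data.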

I just want to know exactly how much difference between training and testing is acceptable or suitable?

There is no single answer to this question; everything depends on the specifics of your data and on the business problem you are trying to solve.