80-20 or 80-10-10 for training machine learning models?
I have a very basic question.
1) When is it recommended to hold out part of the data for validation, and when is it unnecessary? For example, when can we say that an 80% train / 10% validation / 10% test split is better, and when can we say that a simple 80% train / 20% test split is sufficient?
2) Also, is K-fold cross-validation used together with a simple train-test split?
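The two splitting schemes asked about above can be sketched in plain NumPy. This is an illustrative sketch, not any particular library's API; the function names `train_val_test_split` and `kfold_indices` are made up for this example (in practice you might use scikit-learn's `train_test_split` and `KFold` instead):

```python
import numpy as np

def train_val_test_split(data, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle, then split into train/validation/test (80-10-10 by default)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_val = int(len(data) * val_frac)
    n_test = int(len(data) * test_frac)
    val = data[idx[:n_val]]
    test = data[idx[n_val:n_val + n_test]]
    train = data[idx[n_val + n_test:]]
    return train, val, test

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation:
    each of the k folds serves as the validation set exactly once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

data = np.arange(100)
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))  # 80 10 10
```

Note that K-fold cross-validation replaces the single train/validation split (every point gets used for validation once), while the held-out test set is still kept separate in both schemes.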
I find a train and validation set more valuable when my dataset is of limited size. A validation set is essentially a test set anyway. The reason for this is that you want your model to extrapolate from high accuracy on the data it was trained on to high accuracy on data it has never seen before. The validation set lets you determine whether that is the case. I usually take at least 10% of the dataset and set it aside as a validation set. It is important to select the validation data randomly so that its probability distribution matches that of the training set. Next I monitor the validation loss and save the model with the lowest validation loss. I also use an adjustable learning rate. Keras provides two useful callbacks for this purpose, ModelCheckpoint and ReduceLROnPlateau. The documentation is here. With a validation set you can monitor the validation loss during training and ascertain whether your model is training properly (training accuracy) and whether it is extrapolating properly (validation loss). The validation loss should, on average, decrease as the model accuracy increases. If the validation loss starts to increase while training accuracy is high, your model is overfitting, and you can take remedial action such as adding dropout layers or regularizers, or reducing your model complexity. Documentation for that is here and here. To see why I use an adjustable learning rate, see the answer to a Stack Overflow question here.
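The procedure described above (keep the weights with the lowest validation loss, and lower the learning rate when validation loss stops improving) is what ModelCheckpoint and ReduceLROnPlateau do inside Keras. As a framework-free sketch of that logic, here is a minimal training loop in plain Python; `model`, `train_step`, and `val_loss_fn` are placeholders you would supply, not part of any real library:

```python
import copy

def fit_with_monitoring(model, train_step, val_loss_fn, epochs=20,
                        patience=3, lr=1e-3, factor=0.5):
    """Mimics ModelCheckpoint (remember the model with the lowest validation
    loss) and ReduceLROnPlateau (multiply the learning rate by `factor` when
    the validation loss has not improved for `patience` epochs)."""
    best_loss = float("inf")
    best_model = copy.deepcopy(model)
    epochs_without_improvement = 0
    for _ in range(epochs):
        model = train_step(model, lr)          # one epoch of training
        val_loss = val_loss_fn(model)          # evaluate on the validation set
        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)  # "checkpoint" the best model
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                lr *= factor                   # reduce LR on plateau
                epochs_without_improvement = 0
    return best_model, best_loss, lr

# Toy demo: a scripted validation-loss curve that plateaus, then improves.
seq = [5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 2.0]
best_model, best_loss, final_lr = fit_with_monitoring(
    0,
    train_step=lambda m, lr: m + 1,
    val_loss_fn=lambda m: seq[min(m, len(seq) - 1)],
    epochs=7, patience=3, lr=1e-3)
print(best_loss, final_lr)  # 2.0 0.0005
```

The point of returning `best_model` rather than the final model is the same as ModelCheckpoint's `save_best_only=True`: if validation loss rises near the end of training (overfitting), you still keep the best-generalizing weights.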