Train, Validation and Test Sets
See this article for the most accessible explanation of these three concepts.
Training Dataset
The sample of data used to fit the model.
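- The actual dataset used to train the model (for a neural network, to fit its weights and biases). The model sees this data and learns from it.
Validation Dataset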
The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
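- The validation set is used to evaluate a given model. In machine learning we use it to tune the model's hyperparameters, so the model sees this data occasionally but never learns from it directly. Because the validation results guide how the hyperparameters are updated, the validation set still influences the model indirectly.
Test Dataset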
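The indirect influence of the validation set can be illustrated with a minimal sketch. It assumes a one-dimensional ridge regression without an intercept, where the weight `w` is fit on the training data only and the regularization strength `lam` is the hyperparameter chosen by validation error. The function names and the toy numbers are made up for illustration and do not come from the article.

```python
def fit_ridge_1d(xs, ys, lam):
    """Fit y ~ w * x on the training data; lam is the hyperparameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given samples."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data, invented for this example.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
val_x, val_y = [5.0, 6.0], [10.1, 11.9]

candidates = [0.0, 0.1, 1.0, 10.0]
val_errors = {}
for lam in candidates:
    w = fit_ridge_1d(train_x, train_y, lam)   # learns from training data only
    val_errors[lam] = mse(w, val_x, val_y)    # validation data only scores it

# The validation set never changes w directly; it only decides which lam wins.
best_lam = min(val_errors, key=val_errors.get)
```

Because `best_lam` is picked to minimize the validation error, that error is no longer an unbiased estimate of performance, which is why a separate test set is kept aside.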
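The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
- The test dataset provides the gold standard for evaluating the model. It is used only once, after the model has been fully trained using the training and validation sets.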
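As a rough sketch of how the three sets are usually produced, the samples can be shuffled once and cut into three disjoint pieces. The helper name `train_val_test_split` and the 60/20/20 proportions below are assumptions chosen for illustration; the article does not prescribe particular ratios.

```python
import random


def train_val_test_split(samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the samples and cut them into three disjoint lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]                 # held out until the very end
    val = shuffled[n_test:n_test + n_val]    # used for hyperparameter tuning
    train = shuffled[n_test + n_val:]        # used to fit the model
    return train, val, test


train, val, test = train_val_test_split(range(100))
```

The test portion is set aside first and touched only for the final evaluation, so no tuning decision can leak information from it.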