Validation Set Approach
Validation Set
A validation set (or holdout set) is a subset of samples held out from the original training data and used to evaluate the model rather than to fit it.
How is this different from the test set?
The difference lies in purpose and in when each set is used.
The test set provides the final evaluation on data the model has never seen, whereas the validation set provides an intermediate evaluation during the model development process.
We use the validation-set error to estimate the test error:
- Mean squared error (MSE) for a quantitative response
- Misclassification rate for a qualitative response
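As a minimal sketch of the two error measures above, assuming NumPy is available: the function names `mse` and `misclassification_rate` are illustrative choices, not part of any particular library.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences (quantitative response)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label (qualitative response)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# Squared errors are 1, 0, and 4, so the MSE is their average, 5/3.
print(mse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))

# One of the four labels is wrong, so the rate is 0.25.
print(misclassification_rate(["a", "b", "a", "b"], ["a", "a", "a", "b"]))
```

Both functions are applied to the validation set only: the predictions come from a model fit on the training set.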
Validation Process
- Shuffle the data.
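- Randomly split the data into two parts:
  - Training set
  - Validation set
- Fit the model on the training set.
- Use the fitted model to predict the responses in the validation set and compute the validation-set error.
The split ratio is left to the practitioner's discretion.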
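The shuffle-and-split step can be sketched as follows, assuming NumPy. The helper name `train_validation_split`, the 30% validation fraction, and the seed are all illustrative assumptions, since the notes leave the ratio open.

```python
import numpy as np

def train_validation_split(X, y, validation_fraction=0.3, seed=0):
    """Shuffle the row indices, then split them into training and validation parts."""
    X = np.asarray(X)
    y = np.asarray(y)
    n = len(y)

    # Shuffling the indices (rather than the data) keeps X and y aligned.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n)

    n_val = int(round(validation_fraction * n))
    val_idx = indices[:n_val]
    train_idx = indices[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

# Toy data: 10 observations with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_val, y_train, y_val = train_validation_split(
    X, y, validation_fraction=0.3, seed=42
)
print(X_train.shape, X_val.shape)
```

Every observation ends up in exactly one of the two parts, so no validation sample is ever seen during fitting.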
Drawbacks
High Variability in Estimation
Depending on which observations end up in the validation set, the validation-set error can vary significantly, leading to an unstable estimate of the test error.
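A small simulation can illustrate this variability. The sketch below assumes NumPy and uses synthetic data with a straight-line fit. The data-generating process, the 50/50 split, and the helper name `validation_mse` are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic data: a noisy linear relationship (assumed for illustration).
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-3, 3, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=2.0, size=n)

def validation_mse(x, y, seed, validation_fraction=0.5):
    """Split once using the given seed, fit a line on the training part,
    and return the MSE on the validation part."""
    split_rng = np.random.default_rng(seed)
    idx = split_rng.permutation(len(y))
    n_val = int(round(validation_fraction * len(y)))
    val, train = idx[:n_val], idx[n_val:]

    coeffs = np.polyfit(x[train], y[train], deg=1)  # least-squares line
    preds = np.polyval(coeffs, x[val])
    return float(np.mean((y[val] - preds) ** 2))

# Same data, same model, ten different random splits.
errors = [validation_mse(x, y, seed) for seed in range(10)]
print("min:", min(errors), "max:", max(errors))
```

The only thing that changes between runs is the random split, so the spread between the smallest and largest value reflects the variability of the estimate itself.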
We Lose Data for Training
Part of the data that could have been used for training is set aside for validation. A model fit on fewer observations tends to perform worse, so the validation-set error tends to overestimate the test error of a model fit on the entire data set.
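The effect of a smaller training set can be checked with a rough simulation, again assuming NumPy. The sine-shaped data, the cubic fit, and the sample sizes of 50 and 100 (mimicking a 50/50 split of 100 observations) are illustrative assumptions, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Draw n observations from an assumed noisy sine relationship."""
    x = rng.uniform(-2, 2, size=n)
    y = np.sin(x) + rng.normal(scale=0.5, size=n)
    return x, y

# A large independent sample stands in for the true test error.
x_test, y_test = simulate(5000)

def average_test_mse(n_train, repetitions=200, degree=3):
    """Average test MSE of a polynomial fit over many training sets of size n_train."""
    errors = []
    for _ in range(repetitions):
        x_train, y_train = simulate(n_train)
        coeffs = np.polyfit(x_train, y_train, deg=degree)
        preds = np.polyval(coeffs, x_test)
        errors.append(np.mean((y_test - preds) ** 2))
    return float(np.mean(errors))

mse_half = average_test_mse(n_train=50)    # model fit on half of the data
mse_full = average_test_mse(n_train=100)   # model fit on all of the data
print("trained on 50:", mse_half)
print("trained on 100:", mse_full)
```

On average, the model trained on 50 observations has the higher test error, which is the sense in which the validation-set error is pessimistic about a model later refit on all of the data.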