Validation Set Approach

Validation Set

A validation set, also called a holdout set, is a subset of samples held out from the original training data and used to evaluate the model rather than to fit it.

How is this different from the test set?

The difference lies in the purpose of each set and in when it is used.

A test set is reserved for the final evaluation on data the model has never seen, while a validation set serves as an intermediate evaluation during the model development process.

We use the validation-set error to estimate the test error:

  • Mean squared error (MSE) for a quantitative response
  • Misclassification rate for a qualitative response
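Both error measures are simple averages over the validation samples. A minimal sketch in plain Python follows; the function names mean_squared_error and misclassification_rate are chosen here for illustration and are not taken from any particular library.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Quantitative response: squared errors are 0, 0 and 4, so the MSE is 4/3.
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

# Qualitative response: one of four labels is wrong, so the rate is 0.25.
print(misclassification_rate(["a", "b", "a", "b"], ["a", "a", "a", "b"]))
```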

Validation Process

  1. Shuffle the data.
  2. Randomly split the data into two parts:
    • Training set
    • Validation set

    The split ratio is left to your discretion.
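The two steps above can be sketched with the standard library alone. The helper name train_validation_split, the default validation_fraction of 0.3, and the seed parameter are assumptions made for this example, not part of the original procedure.

```python
import random


def train_validation_split(data, validation_fraction=0.3, seed=None):
    """Shuffle the samples, then split them into (training, validation) lists."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(data)
    if len(shuffled) < 2:
        raise ValueError("need at least two samples to split")

    # Step 1: shuffle a copy so the caller's data is left untouched.
    rng = random.Random(seed)
    rng.shuffle(shuffled)

    # Step 2: cut the shuffled list, keeping at least one sample on each side.
    n_validation = round(len(shuffled) * validation_fraction)
    n_validation = min(len(shuffled) - 1, max(1, n_validation))
    return shuffled[n_validation:], shuffled[:n_validation]


train_set, validation_set = train_validation_split(range(10), seed=42)
print(len(train_set), len(validation_set))  # 7 3
```

Fixing the seed makes a split reproducible, while changing it produces a different random split of the same data.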

Drawbacks

High Variability in Estimation

Depending on which samples end up in the validation set, the validation-set error can vary significantly, leading to an unstable estimate of the test error.
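One way to see this variability is to repeat the split with different random seeds and compare the resulting errors. The sketch below uses synthetic data (a noisy line with slope 2, an assumption made purely for illustration), a 50/50 split, and a least-squares line fit written by hand.

```python
import random


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def validation_mse(points, seed, validation_fraction=0.5):
    """Split once with the given seed, fit on one part, score on the other."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    validation, train = shuffled[:n_validation], shuffled[n_validation:]
    slope, intercept = fit_line(train)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in validation) / len(validation)


# Synthetic data set: 40 points from y = 2x plus Gaussian noise.
data_rng = random.Random(0)
points = []
for _ in range(40):
    x = data_rng.uniform(0, 10)
    points.append((x, 2.0 * x + data_rng.gauss(0, 1)))

# Same data, same model, ten different random splits.
errors = [validation_mse(points, seed) for seed in range(10)]
print("lowest validation MSE: ", min(errors))
print("highest validation MSE:", max(errors))
```

The gap between the lowest and highest values comes only from how the data happened to be split, since the data set and the model are identical in every run.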

We Lose Data for Training

Part of the data that could’ve been used for training is now used for validation. A model fit on fewer observations tends to perform worse, so the validation-set error tends to overestimate the test error of a model fit on the entire data set.
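The effect of training on less data can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are synthetic (a noisy line whose slope, intercept, and noise level are chosen here), the training sizes of 5 and 100 are arbitrary, and the errors are averaged over 200 repetitions to smooth out the randomness of any single draw.

```python
import random

# Assumed data-generating process, used only for this illustration.
TRUE_SLOPE, TRUE_INTERCEPT, NOISE_SD = 2.0, 1.0, 1.0


def sample_points(rng, n):
    """Draw n (x, y) pairs from the assumed noisy line."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0, NOISE_SD)))
    return points


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(points, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


def average_test_mse(n_train, n_repeats=200, n_test=200, seed=0):
    """Average error on fresh test data for models fit on n_train samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_repeats):
        train = sample_points(rng, n_train)
        test = sample_points(rng, n_test)
        slope, intercept = fit_line(train)
        total += mse(test, slope, intercept)
    return total / n_repeats


avg_small = average_test_mse(n_train=5)
avg_large = average_test_mse(n_train=100)
print("average test MSE with   5 training samples:", avg_small)
print("average test MSE with 100 training samples:", avg_large)
```

On average, the model fit on the smaller training set has the larger error on fresh data, which is why an error measured after holding data out tends to be pessimistic about a model trained on everything.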