Resampling Methods
Two common resampling methods:
- Cross-validation: to estimate the test error of a model.
- Bootstrap: to estimate the uncertainty of an estimator.
Resampling methods tend to be computationally expensive because the model or estimator is refit on every resample (typically not an issue on modern hardware), but they are very useful in practice.
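To make the first bullet concrete, here is a minimal sketch of k-fold cross-validation estimating the test error (mean squared error) of a one-predictor least-squares fit. The helper names `fit_line` and `kfold_mse`, the synthetic data, and the choice of 5 folds are assumptions made for illustration, not something these notes prescribe.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def kfold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)        # shuffle once, then partition
    folds = [idx[i::k] for i in range(k)]   # k disjoint folds (no replacement)
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Score the fit only on points it did not see during training.
        errs = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        fold_errors.append(statistics.fmean(errs))
    return statistics.fmean(fold_errors)

# Synthetic data (an assumption for illustration): y = 2x + 1 + unit-variance noise
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
cv_mse = kfold_mse(xs, ys, k=5)
print(cv_mse)
```

Since the noise has variance 1, the printed estimate should land in the neighbourhood of 1, which is the irreducible error for this synthetic setup.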
Table of contents
Cross-Validation
Bootstrap
Short Summary:
The bootstrap is most commonly used to estimate the uncertainty of an estimator (e.g. its standard error).
Unlike cross-validation, we sample with replacement to create multiple datasets, each dataset possibly containing the same data point multiple times.
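The contrast between the two sampling schemes can be sketched in a few lines. The ten-point stand-in dataset, the seed, and the five folds below are assumptions chosen only to keep the output readable.

```python
import random

rng = random.Random(0)        # fixed seed so the run is repeatable
data = list(range(10))        # stand-in dataset of 10 distinct points

# Bootstrap: draw n points WITH replacement, so repeats are possible
# and some original points may be left out entirely.
boot_sample = rng.choices(data, k=len(data))

# Cross-validation style split: shuffle and partition, so every point
# lands in exactly one fold and never repeats.
shuffled = data[:]
rng.shuffle(shuffled)
folds = [shuffled[i::5] for i in range(5)]

print(sorted(boot_sample))
print(folds)
print(len(set(boot_sample)), "distinct points in the bootstrap sample")
```

For large datasets, a bootstrap sample contains on average about 63% of the distinct original points, because the chance that a given point is never drawn approaches 1/e.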
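With the diagram above,
- We have an original dataset, denoted here $Z$, with $n$ observations
- Bootstrap samples $Z^{*1}, \dots, Z^{*B}$ are drawn from it, where $B$ is usually a large number
  - Sampled with replacement (duplicates can appear within each sample)
  - Same size as the original dataset
- Bootstrap estimates $\hat{\alpha}^{*1}, \dots, \hat{\alpha}^{*B}$ are calculated, one from each sample

Then we can estimate the standard error of the estimator $\hat{\alpha}$ as the sample standard deviation of the $B$ bootstrap estimates.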
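The procedure above can be sketched as follows: resample the data with replacement many times, recompute the estimator on each resample, and take the sample standard deviation of those estimates as the standard error. The function name `bootstrap_se`, its parameters `n_boot` and `seed`, and the synthetic normal data are assumptions for illustration; the sample mean is used as the estimator only because its standard error has a known formula to compare against.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=1000, seed=0):
    """Bootstrap standard error of `estimator` evaluated on `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap sample has the same size as the original dataset
    # and is drawn with replacement.
    estimates = [estimator(rng.choices(data, k=n)) for _ in range(n_boot)]
    # Sample standard deviation of the bootstrap estimates
    # (statistics.stdev divides by n_boot - 1).
    return statistics.stdev(estimates)

# Synthetic data (an assumption for illustration): 200 standard normal draws
rng = random.Random(1)
sample = [rng.gauss(0, 1) for _ in range(200)]

se_boot = bootstrap_se(sample, statistics.fmean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5  # s / sqrt(n)
print(se_boot, se_formula)
```

The two printed numbers should be close. The value of the bootstrap is that the same function works unchanged for estimators with no simple standard-error formula, for example by passing `statistics.median` instead of `statistics.fmean`.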