Resampling Methods

Two common resampling methods:

Cross-validation: to estimate the test error of a model.
Bootstrap: to estimate the uncertainty of an estimator.

Resampling methods tend to be computationally expensive (typically not an issue these days), but they are very useful in practice.

Table of contents

Cross-Validation
Bootstrap

Cross-Validation

See here

Bootstrap

See details here

Short Summary:

This method is most commonly used to estimate the uncertainty of an estimator (e.g. standard error of an estimator).

Unlike cross-validation, we sample with replacement to create multiple datasets, each dataset possibly containing the same data point multiple times.

With the diagram above,

We have an original dataset $|Z| = n$
Bootstrap samples $Z^{\ast r}$ where $r \in \{1, \dots, B\}$ are
$B$ is usually a large number
- Sampled with replacement (duplicates exist in each sample)
- Same size as the original dataset $n$
Bootstrap estimates $\hat{\alpha}^{\ast r}$ are calculated from each sample

Then we can estimate the standard error of the estimator $\hat{\alpha}$:

$$ \text{SE}(\hat{\alpha}) = \sqrt{ \frac{1}{B-1} \sum_{r=1}^B \left( \hat{\alpha}^{\ast r} - \frac{1}{B} \sum_{r'=1}^B \hat{\alpha}^{\ast r'} \right)^2 } $$