Resampling Methods

Two common resampling methods:

  • Cross-validation: to estimate the test error of a model.
  • Bootstrap: to estimate the uncertainty of an estimator.

Resampling methods tend to be computationally expensive (though this is typically not an issue these days), but they are very useful in practice.

Table of contents
  1. Cross-Validation
  2. Bootstrap

Cross-Validation

See here


Bootstrap

See details here

Short Summary:

The bootstrap is most commonly used to estimate the uncertainty of an estimator (e.g. its standard error).

Unlike cross-validation, the bootstrap samples with replacement to create multiple datasets, so each dataset may contain the same data point multiple times.
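As a quick illustration of sampling with replacement, here is a minimal sketch using NumPy (the toy dataset `Z` and the seed are made up purely for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = np.arange(10)  # toy "dataset" of 10 observations (illustrative only)

# One bootstrap sample: same size as Z, drawn with replacement,
# so some observations repeat and others are left out entirely.
Z_star = rng.choice(Z, size=len(Z), replace=True)
print(Z_star)             # duplicates are expected
print(np.unique(Z_star))  # typically fewer than 10 distinct values
```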

Bootstrap Sampling

Following the diagram above:

  1. We have an original dataset $Z$ of size $|Z| = n$
  2. Bootstrap samples $Z^{\ast r}$, $r \in \{1, \dots, B\}$ (where $B$ is usually a large number), are

    • Sampled with replacement (so duplicates can appear within each sample)
    • The same size as the original dataset, $n$
  3. Bootstrap estimates $\hat{\alpha}^{\ast r}$ are calculated from each sample

Then we can estimate the standard error of the estimator $\hat{\alpha}$:

$$ \text{SE}(\hat{\alpha}) = \sqrt{ \frac{1}{B-1} \sum_{r=1}^B \left( \hat{\alpha}^{\ast r} - \frac{1}{B} \sum_{r'=1}^B \hat{\alpha}^{\ast r'} \right)^2 } $$
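
Putting the steps above together, here is a minimal sketch of the bootstrap standard-error estimate in NumPy (assumptions: the dataset is synthetic normal data, the estimator $\hat{\alpha}$ is the sample median, and $B = 1000$; all of these are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
Z = rng.normal(loc=0.0, scale=1.0, size=100)  # made-up original dataset, |Z| = n
n, B = len(Z), 1000                           # B bootstrap samples

# Steps 2-3: draw B bootstrap samples with replacement and compute
# the estimate alpha-hat^{*r} on each one (here, the sample median).
alpha_star = np.empty(B)
for r in range(B):
    Z_star = rng.choice(Z, size=n, replace=True)
    alpha_star[r] = np.median(Z_star)

# Bootstrap estimate of SE(alpha-hat): the sample standard deviation
# of the B bootstrap estimates, matching the formula above.
se_alpha = np.sqrt(np.sum((alpha_star - alpha_star.mean()) ** 2) / (B - 1))
print(se_alpha)  # equivalently: alpha_star.std(ddof=1)
```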