AIC and BIC

Table of contents
  1. Akaike Information Criterion (AIC)
    1. When Error is Normally Distributed
  2. Bayesian Information Criterion (BIC)
    1. Special Case

Akaike Information Criterion (AIC)

AIC scores a model by its maximized likelihood, penalized by the number of parameters it uses.

$$ \text{AIC} = -2 \log L^\ast + 2p $$

  • $L^\ast$: Maximized likelihood of the model
  • $p$: Number of parameters in the model

Lower AIC values indicate better models.

The first term rewards fit: the larger the maximized likelihood $L^\ast$, the smaller $-2 \log L^\ast$. The $2p$ term is a penalty on complexity, so AIC trades goodness of fit off against parsimony, as the sketch below illustrates.
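
As a minimal sketch of the computation (a hypothetical helper `aic`, using NumPy), here a Gaussian is fit to a sample by maximum likelihood and scored:

```python
import numpy as np

def aic(log_likelihood, num_params):
    """AIC = -2 log L* + 2p; lower is better."""
    return -2.0 * log_likelihood + 2.0 * num_params

# Toy example: fit a Gaussian to a sample by maximum likelihood.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)

mu_hat = x.mean()    # MLE of the mean
var_hat = x.var()    # MLE of the variance (ddof=0 is the MLE)
log_lik = np.sum(-0.5 * np.log(2 * np.pi * var_hat)
                 - (x - mu_hat) ** 2 / (2 * var_hat))

print(aic(log_lik, num_params=2))   # mu and sigma^2 => p = 2
```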

When Error is Normally Distributed

We know that the OLS estimator coincides with the MLE when the errors are normally distributed.

Therefore, when the errors are normally distributed, Mallows’ $C_p$ and AIC are equivalent up to additive and multiplicative constants, so they rank models identically.
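
To see why, here is a short derivation, assuming a known error variance $\sigma^2$ and the $C_p = \frac{1}{n}\left(\text{RSS} + 2p\sigma^2\right)$ form of Mallows’ statistic. The maximized Gaussian log-likelihood of a linear model is

$$ \log L^\ast = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{\text{RSS}}{2\sigma^2} $$

so, dropping model-independent constants,

$$ \text{AIC} = \frac{1}{\sigma^2} \left( \text{RSS} + 2p\sigma^2 \right) + \text{const} = \frac{n}{\sigma^2} \, C_p + \text{const} $$

Since $n/\sigma^2 > 0$, the two criteria order models the same way.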


Bayesian Information Criterion (BIC)

BIC is similar to AIC but imposes a stronger penalty on complexity when the sample size is large: the $2p$ term is replaced by $p \log n$.

$$ \text{BIC} = -2 \log L^\ast + p \log n $$

  • $L^\ast$: Maximized likelihood of the model
  • $p$: Number of parameters in the model
  • $n$: Number of observations

Lower BIC values indicate better models.
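
Continuing the sketch above, only the penalty term changes (again a hypothetical helper, using NumPy); the comments note where BIC’s penalty overtakes AIC’s:

```python
import numpy as np

def bic(log_likelihood, num_params, n):
    """BIC = -2 log L* + p log n; lower is better."""
    return -2.0 * log_likelihood + num_params * np.log(n)

# Per parameter, BIC charges log(n) versus AIC's flat 2, so BIC's
# penalty is heavier as soon as log(n) > 2, i.e. n > e^2 (about 7.4).
for n in (5, 8, 100, 10_000):
    print(n, round(np.log(n), 2))   # 1.61, 2.08, 4.61, 9.21
```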

Special Case

As with AIC, when the errors are normally distributed, BIC is equivalent (up to constants) to:

$$ \text{BIC} = \frac{1}{n} \left( \text{RSS} + \log(n)p\sigma^2 \right) $$

Where $\sigma^2$ is the variance of the error term (in practice replaced by an estimate $\hat{\sigma}^2$).

Notice that this is very similar to Mallows’ $C_p$: the only difference is that the factor $2$ is replaced by $\log n$.

Since $\log n > 2$ whenever $n > e^2 \approx 7.4$, BIC places a heavier penalty on complexity for all but the smallest samples, and so tends to select simpler models than $C_p$, as the example below shows.
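
As a small illustration (a sketch under the Gaussian-error assumption, using the profile-likelihood form $n \log(\text{RSS}/n) + \text{penalty} \cdot p$, which equals AIC or BIC up to model-independent constants):

```python
import numpy as np

def gaussian_ic(rss, n, p, penalty):
    """n*log(RSS/n) + penalty*p: AIC (penalty=2) or BIC (penalty=log n)
    for a least-squares fit with Gaussian errors, up to constants."""
    return n * np.log(rss / n) + penalty * p

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)   # true model: degree 1

for degree in range(1, 6):
    X = np.vander(x, degree + 1)                    # polynomial design matrix
    _, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    p = degree + 1                                  # one coefficient per column
    print(degree,
          round(gaussian_ic(rss[0], n, p, penalty=2.0), 1),         # AIC
          round(gaussian_ic(rss[0], n, p, penalty=np.log(n)), 1))   # BIC
```

Raising the degree always shaves a little off the RSS, but the penalties grow with $p$; with $n = 200$, BIC charges $\log 200 \approx 5.3$ per parameter versus AIC’s $2$, so it is quicker to settle on the low-degree model.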