AIC and BIC
Akaike Information Criterion (AIC)
AIC assesses the goodness of fit of a model based on its maximized likelihood.
$$ \text{AIC} = -2 \log L^\ast + 2p $$
- $L^\ast$: Maximized likelihood of the model
- $p$: Number of parameters in the model
Lower AIC values indicate better models.
A larger maximized likelihood lowers the AIC, while the $2p$ term penalizes model complexity.
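As a concrete sketch, assuming Gaussian errors and using a hypothetical function name `aic_gaussian` (with $p$ counting only the regression coefficients; some conventions also count the variance parameter), AIC can be computed directly from the residuals:

```python
import numpy as np

def aic_gaussian(y, y_hat, p):
    """AIC for a linear model with Gaussian errors (illustrative sketch).

    At the MLE of the error variance, sigma2_hat = RSS / n, the
    maximized log-likelihood satisfies
        -2 log L* = n * (log(2 * pi * sigma2_hat) + 1).
    """
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    sigma2_hat = rss / n                        # MLE of the error variance
    neg2_loglik = n * (np.log(2 * np.pi * sigma2_hat) + 1)
    return neg2_loglik + 2 * p                  # 2p penalty on complexity
```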
When Error is Normally Distributed
We know that the OLS estimator is equivalent to the MLE when the errors are normally distributed.
Therefore, Mallows’ $C_p$ is almost equivalent to AIC in that case: the two differ only by constants that do not depend on the model.
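To see why, plug the MLE of the error variance, $\hat\sigma^2 = \text{RSS}/n$, back into the Gaussian log-likelihood:

$$ -2 \log L^\ast = n \log\left(2\pi \hat\sigma^2\right) + n, \qquad \text{so} \qquad \text{AIC} = n \log\frac{\text{RSS}}{n} + 2p + n\left(1 + \log 2\pi\right). $$

The last term is identical for every model fit to the same data, so minimizing AIC amounts to trading RSS against the $2p$ penalty, just as $C_p$ does.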
Bayesian Information Criterion (BIC)
BIC is similar to AIC but places a stronger penalty on complexity when the sample size is large: the penalty per parameter is $\log n$ instead of $2$, and $\log n > 2$ whenever $n \geq 8$.
$$ \text{BIC} = -2 \log L^\ast + p \log n $$
- $L^\ast$: Maximized likelihood of the model
- $p$: Number of parameters in the model
Lower BIC values indicate better models.
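Under the same Gaussian-error assumptions as the AIC sketch above (again with a hypothetical function name), BIC only swaps the complexity penalty:

```python
import numpy as np

def bic_gaussian(y, y_hat, p):
    """BIC for a linear model with Gaussian errors (illustrative sketch).

    The maximized log-likelihood is the same as in aic_gaussian;
    only the penalty changes from 2*p to log(n)*p.
    """
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    sigma2_hat = rss / n                        # MLE of the error variance
    neg2_loglik = n * (np.log(2 * np.pi * sigma2_hat) + 1)
    return neg2_loglik + np.log(n) * p          # log(n) * p penalty on complexity
```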
Special case
Similar to AIC, when the error is normally distributed, BIC is equivalent (up to constants) to:
$$ \text{BIC} = \frac{1}{n} \left( \text{RSS} + \log(n)\,p\,\sigma^2 \right) $$
Where $\sigma^2$ is the variance of the error term (in practice replaced by an estimate $\hat\sigma^2$).
Notice that this is very similar to Mallows’ $C_p$.
But since BIC places a stronger penalty on complexity when $n$ is large, it tends to select simpler models than $C_p$.
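A small simulation makes the difference concrete (the data-generating process, seed, and degree range below are arbitrary choices for illustration). With $n = 200$, $\log n \approx 5.3$, so each extra coefficient costs BIC more than twice what it costs AIC:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: quadratic trend plus Gaussian noise.
n = 200
x = np.linspace(-2, 2, n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=1.0, size=n)

# Fit polynomials of increasing degree; compare AIC and BIC
# (both up to a constant shared by all models on the same data).
for degree in range(1, 7):
    X = np.vander(x, degree + 1)                # polynomial design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    p = degree + 1                              # number of coefficients
    aic = n * np.log(rss / n) + 2 * p
    bic = n * np.log(rss / n) + np.log(n) * p
    print(f"degree={degree}  AIC={aic:8.2f}  BIC={bic:8.2f}")
```

Whenever the two criteria disagree, BIC favors the lower-degree model.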