AIC and BIC
Akaike Information Criterion (AIC)
AIC assesses the goodness of fit of a model based on its maximized likelihood.
$$ \text{AIC} = -2 \log L^\ast + 2p $$
- $L^\ast$: Maximized likelihood of the model
- $p$: Number of parameters in the model
Lower AIC values indicate better models.
A larger maximized likelihood lowers the AIC, while the $2p$ term penalizes model complexity.
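As a concrete sketch, assuming Gaussian errors and using a hypothetical function name `aic_gaussian` (with $p$ counting only the regression coefficients; some conventions also count the variance parameter), AIC can be computed directly from the residuals:

```python
import numpy as np

def aic_gaussian(y, y_hat, p):
    """AIC for a linear model with Gaussian errors (illustrative sketch).

    At the MLE of the error variance, sigma2_hat = RSS / n, the
    maximized log-likelihood satisfies
        -2 log L* = n * (log(2 * pi * sigma2_hat) + 1).
    """
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    sigma2_hat = rss / n                        # MLE of the error variance
    neg2_loglik = n * (np.log(2 * np.pi * sigma2_hat) + 1)
    return neg2_loglik + 2 * p                  # 2p penalty on complexity
```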
When Error is Normally Distributed
We know that the OLS estimator is equivalent to the MLE when the errors are normally distributed.
Therefore, Mallows’ $C_p$ is almost equivalent to AIC in that case: the two differ only by constants that do not depend on the model.
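To see why, plug the MLE of the error variance, $\hat\sigma^2 = \text{RSS}/n$, back into the Gaussian log-likelihood:

$$ -2 \log L^\ast = n \log\left(2\pi \hat\sigma^2\right) + n, \qquad \text{so} \qquad \text{AIC} = n \log\frac{\text{RSS}}{n} + 2p + n\left(1 + \log 2\pi\right). $$

The last term is identical for every model fit to the same data, so minimizing AIC amounts to trading RSS against the $2p$ penalty, just as $C_p$ does.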
Bayesian Information Criterion (BIC)
BIC is similar to AIC but places a stronger penalty on complexity when the sample size is large: the penalty per parameter is $\log n$ instead of $2$, and $\log n > 2$ whenever $n \geq 8$.
$$ \text{BIC} = -2 \log L^\ast + p \log n $$
- $L^\ast$: Maximized likelihood of the model
- $p$: Number of parameters in the model
Lower BIC values indicate better models.
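Under the same Gaussian-error assumptions as the AIC sketch above (again with a hypothetical function name), BIC only swaps the complexity penalty:

```python
import numpy as np

def bic_gaussian(y, y_hat, p):
    """BIC for a linear model with Gaussian errors (illustrative sketch).

    The maximized log-likelihood is the same as in aic_gaussian;
    only the penalty changes from 2*p to log(n)*p.
    """
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    sigma2_hat = rss / n                        # MLE of the error variance
    neg2_loglik = n * (np.log(2 * np.pi * sigma2_hat) + 1)
    return neg2_loglik + np.log(n) * p          # log(n) * p penalty on complexity
```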
Special case
Similar to AIC, when the error is normally distributed, BIC is equivalent (up to constants) to:
$$ \text{BIC} = \frac{1}{n} \left( \text{RSS} + \log(n)\,p\,\sigma^2 \right) $$
Where $\sigma^2$ is the variance of the error term (in practice replaced by an estimate $\hat\sigma^2$).
Notice that this is very similar to Mallows’ $C_p$.
But since BIC places a stronger penalty on complexity when $n$ is large, it tends to select simpler models than $C_p$.
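A small simulation makes the difference concrete (the data-generating process, seed, and degree range below are arbitrary choices for illustration). With $n = 200$, $\log n \approx 5.3$, so each extra coefficient costs BIC more than twice what it costs AIC:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: quadratic trend plus Gaussian noise.
n = 200
x = np.linspace(-2, 2, n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=1.0, size=n)

# Fit polynomials of increasing degree; compare AIC and BIC
# (both up to a constant shared by all models on the same data).
for degree in range(1, 7):
    X = np.vander(x, degree + 1)                # polynomial design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    p = degree + 1                              # number of coefficients
    aic = n * np.log(rss / n) + 2 * p
    bic = n * np.log(rss / n) + np.log(n) * p
    print(f"degree={degree}  AIC={aic:8.2f}  BIC={bic:8.2f}")
```

Whenever the two criteria disagree, BIC favors the lower-degree model.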