Regression
Regression Function
We define models as functions that map inputs to outputs:
\[Y = f(X) + \epsilon\]

An observation is often a pair consisting of a predictor and a response:

\[(X, Y)\]

For a given $X = x$, the realization of $Y$ may not be unique (people with the same weight can have different heights).
Then what should our model $f$ spit out? We’ll say ideally:
$$ f(x) = \E[Y | X=x] $$
The regression function $f(x) = \E[Y | X=x]$ is the function that minimizes the mean squared error $\E[(Y - g(X))^2 | X=x]$ over all choices of $g$.
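A quick numerical check of this fact: the sketch below, assuming NumPy is available, draws many realizations of $Y$ for one fixed $x$ (the height/weight numbers are purely illustrative) and confirms that the candidate prediction with the smallest empirical MSE is the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Many realizations of Y for one fixed X = x, e.g. heights (cm) of
# people who all share the same weight. The loc/scale values are
# made up for illustration.
y = rng.normal(loc=170.0, scale=8.0, size=10_000)

# Evaluate the empirical MSE for a grid of candidate predictions c.
candidates = np.linspace(150.0, 190.0, 401)
mse = [np.mean((y - c) ** 2) for c in candidates]

best = candidates[np.argmin(mse)]
print(f"MSE-minimizing prediction: {best:.2f}")
print(f"Sample mean of Y:          {y.mean():.2f}")  # agrees, up to grid resolution
```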
Why do we use mean squared error?
There are a few reasons why squared error is beneficial (illustrated by the sketch below):
- It is differentiable everywhere
- It penalizes large errors heavily, since the penalty grows quadratically
- It treats negative and positive errors symmetrically
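As a minimal illustration of these points (again assuming NumPy; the error values are arbitrary):

```python
import numpy as np

errors = np.array([-4.0, -1.0, -0.1, 0.1, 1.0, 4.0])

squared = errors ** 2      # grows quadratically with the error size
absolute = np.abs(errors)  # grows only linearly, for comparison

for e, sq, ab in zip(errors, squared, absolute):
    print(f"error {e:+5.1f}  squared {sq:6.2f}  absolute {ab:5.2f}")

# Both losses are symmetric in the sign of the error, but squared error
# punishes the +/-4.0 errors 16x as much as the +/-1.0 errors.
# Its derivative, 2e, is also defined everywhere, including at e = 0,
# where |e| has no derivative.
```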
We do not know the true model $f$. Therefore, we use an estimate $\hat{f}$:
$$ \begin{align*} \E[(Y - \hat{f}(X))^2 | X=x] &= \E[(f(X) + \epsilon - \hat{f}(X))^2 | X=x] \\[1em] &= [f(x) - \hat{f}(x)]^2 + \Var(\epsilon) \end{align*} $$
Derivation
First, recall that $\E[\epsilon] = 0$ and that $\epsilon$ is independent of $X$.
\[\begin{align*} &\E[(f(X) + \epsilon - \hat{f}(X))^2 | X=x] \\[0.5em] =\; &\E[(f(X) - \hat{f}(X))^2 + 2\epsilon(f(X) - \hat{f}(X)) + \epsilon^2 | X=x] \\[0.5em] =\; &(f(x) - \hat{f}(x))^2 + 2(f(x) - \hat{f}(x))\E[\epsilon | X=x] + \E[\epsilon^2 | X=x] \end{align*}\]

We know that $\E[\epsilon | X = x] = \E[\epsilon] = 0$ and:

\[\E[\epsilon^2 | X = x] = \E[\epsilon^2] = \E[\epsilon^2] - \E[\epsilon]^2 = \Var(\epsilon)\]

The first part:

\[[f(x) - \hat{f}(x)]^2\]

is called the reducible error, and the second part:

\[\Var(\epsilon)\]

is, of course, the irreducible error.
So the goal of regression is to estimate $f$ while minimizing the reducible error.
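The decomposition above can be verified by simulation. The sketch below is a toy example, not a fitting procedure: the true $f$, the noise level, and the deliberately imperfect $\hat{f}$ are all made-up choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    """The (normally unknown) true regression function."""
    return 2.0 * x + 1.0

def f_hat(x):
    """A deliberately imperfect estimate of f."""
    return 2.1 * x + 0.8

sigma = 0.5   # noise standard deviation, so Var(eps) = sigma**2
x0 = 3.0      # the point X = x at which we evaluate the error

# Monte Carlo estimate of E[(Y - f_hat(X))^2 | X = x0].
eps = rng.normal(0.0, sigma, size=1_000_000)
y = f(x0) + eps
mse = np.mean((y - f_hat(x0)) ** 2)

reducible = (f(x0) - f_hat(x0)) ** 2
irreducible = sigma ** 2
print(f"simulated MSE:            {mse:.4f}")
print(f"reducible + irreducible:  {reducible + irreducible:.4f}")
```

No matter how good $\hat{f}$ gets, the simulated MSE never drops below $\Var(\epsilon)$; only the $[f(x_0) - \hat{f}(x_0)]^2$ term can be driven to zero.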