Regression

Table of contents
  1. Regression Function

Regression Function

We define models as functions that map inputs to outputs:

Y=f(X)+ϵ

An observation is a pair of a predictor and a response:

(X,Y)

For a given X=x, the realization of Y may not be unique (people with the same weight can have different heights).

Then what should our model f spit out? Ideally, we want:

f(x)=E[Y|X=x]

A regression function f(x)=E[Y|X=x] is the choice of f that minimizes the mean squared error (MSE) E[(Y−f(X))²|X=x].
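
To make the conditional mean concrete, here is a minimal simulation sketch (my addition, not from the original notes; the linear f, the noise level, and the query point x0 are made-up choices). It draws pairs (X,Y) from Y=f(X)+ϵ and approximates E[Y|X=x] by averaging the responses whose predictors fall close to x.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 2.0 * x + 1.0  # assumed "true" regression function (illustrative only)

n = 200_000
X = rng.uniform(0.0, 10.0, size=n)
eps = rng.normal(0.0, 1.0, size=n)   # noise with E[ϵ] = 0, independent of X
Y = f(X) + eps

x0 = 4.0
near = np.abs(X - x0) < 0.05         # small window around the query point x0
estimate = Y[near].mean()            # empirical E[Y | X ≈ x0]

print(f"empirical E[Y | X ≈ {x0}]: {estimate:.3f}")
print(f"true f({x0}):              {f(x0):.3f}")
```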

Why do we use mean squared error?

There are a few reasons why squared error is a useful choice:

  1. It is differentiable everywhere
  2. It penalizes large errors heavily
  3. It treats negative and positive errors symmetrically, so they cannot cancel out
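
As a small illustration of the consequence of this choice (my own sketch, with a made-up skewed sample): among constant predictions c, average squared error is minimized by the sample mean, while average absolute error is minimized by the median. This is why squared error pairs naturally with the conditional mean E[Y|X=x].

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=5_000)   # a skewed sample of responses

# Average squared and absolute error for a grid of constant predictions c.
grid = np.linspace(y.min(), y.max(), 801)
sq_err = ((y[:, None] - grid[None, :]) ** 2).mean(axis=0)
abs_err = np.abs(y[:, None] - grid[None, :]).mean(axis=0)

print(f"argmin squared error : {grid[sq_err.argmin()]:.3f}  (sample mean   = {y.mean():.3f})")
print(f"argmin absolute error: {grid[abs_err.argmin()]:.3f}  (sample median = {np.median(y):.3f})")
```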

We do not know the true model f. Therefore, we use an estimate f̂:

E[(Y−f̂(X))²|X=x] = E[(f(X)+ϵ−f̂(X))²|X=x] = [f(x)−f̂(x)]²+Var(ϵ)

Derivation

First remember that E[ϵ]=0 and that ϵ is independent of X.

E[(f(X)+ϵ−f̂(X))²|X=x]
= E[(f(X)−f̂(X))²+2ϵ(f(X)−f̂(X))+ϵ²|X=x]
= (f(x)−f̂(x))²+2(f(x)−f̂(x))E[ϵ|X=x]+E[ϵ²|X=x]

We know that E[ϵ|X=x]=E[ϵ]=0 and:

E[ϵ²|X=x] = E[ϵ²] = E[ϵ²]−E[ϵ]² = Var(ϵ)

The first part:

[f(x)−f̂(x)]²

is called the reducible error and the second part:

Var(ϵ)

is, of course, the irreducible error.

So the goal of regression is to estimate f in a way that minimizes the reducible error; the irreducible error Var(ϵ) cannot be reduced by any choice of f̂.
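
The decomposition can be checked numerically. Below is a sketch (my own example; the sine-shaped f, the imperfect estimate f̂, and the noise level are all made up) that fixes x, simulates draws of Y given X=x, and compares the empirical conditional MSE to the sum of the reducible and irreducible parts.

```python
import numpy as np

rng = np.random.default_rng(2)

f = lambda x: np.sin(x)             # assumed true regression function (illustrative)
f_hat = lambda x: 0.9 * np.sin(x)   # a deliberately imperfect estimate of f
sigma = 0.5                         # noise standard deviation, so Var(ϵ) = sigma**2

x = 1.3                             # condition on X = x
eps = rng.normal(0.0, sigma, size=1_000_000)
Y = f(x) + eps                      # draws of Y given X = x

mse = ((Y - f_hat(x)) ** 2).mean()  # empirical E[(Y − f̂(x))² | X = x]
reducible = (f(x) - f_hat(x)) ** 2  # [f(x) − f̂(x)]²
irreducible = sigma ** 2            # Var(ϵ)

print(f"empirical conditional MSE : {mse:.4f}")
print(f"reducible + irreducible   : {reducible + irreducible:.4f}")
```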