Regression
Regression Function
Let $Y$ be the response variable and $X$ be the predictor variable.
The regression function $r(x)$ is:
$$ r(x) = E(Y | X = x) = \int y f(y | x) dy $$
The goal of regression is to estimate $r(x)$ from the data $(X_1, Y_1), \dots, (X_n, Y_n)$.
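As a quick illustration of what $r(x) = E(Y \mid X = x)$ means, here is a minimal sketch (on hypothetical simulated data, with a true $r(x) = 2x$) that estimates the conditional mean at a point by averaging the $Y_i$ whose $X_i$ fall near $x$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: Y = 2X + noise, so the true regression
# function is r(x) = E(Y | X = x) = 2x.
n = 100_000
X = rng.uniform(0, 1, n)
Y = 2 * X + rng.normal(0, 0.5, n)

def r_hat(x, h=0.02):
    """Crude local average: mean of Y_i with |X_i - x| < h,
    estimating E(Y | X = x)."""
    mask = np.abs(X - x) < h
    return Y[mask].mean()

# Should be close to the true value r(0.5) = 1.0.
print(r_hat(0.5))
```

This is only a local-averaging sketch of the definition; the sections below estimate $r(x)$ parametrically instead.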
Linear Regression
OLS
R-Squared
Prediction
Let $x^\ast$ be a new observation.
For a simple linear regression model, the prediction is:
$$ \hat{Y}^\ast = \hat{\beta}_0 + \hat{\beta}_1 x^\ast $$
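A minimal sketch of this prediction on hypothetical simulated data (true model $Y = 1 + 3X + \varepsilon$), using the closed-form OLS estimates for simple linear regression:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: Y = 1 + 3X + noise.
n = 200
X = rng.uniform(0, 2, n)
Y = 1.0 + 3.0 * X + rng.normal(0, 0.3, n)

# Closed-form OLS estimates for simple linear regression.
x_bar, y_bar = X.mean(), Y.mean()
beta1_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Point prediction at a new observation x*.
x_star = 1.5
y_star_hat = beta0_hat + beta1_hat * x_star
print(beta0_hat, beta1_hat, y_star_hat)
```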
Variance of Prediction
The variance of $\hat{Y}^\ast$ is the variance of a linear combination of the two estimators $\hat{\beta}_0$ and $\hat{\beta}_1$, which are correlated.
Using the properties of variance,
$$ \Var(\hat{Y}^\ast) = \Var(\hat{\beta}_0) + x^{\ast 2} \Var(\hat{\beta}_1) + 2 x^\ast \Cov(\hat{\beta}_0, \hat{\beta}_1) $$
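Substituting the standard simple-regression variance formulas ($\Var(\hat{\beta}_1) = \sigma^2/S_{xx}$, $\Var(\hat{\beta}_0) = \sigma^2(1/n + \bar{x}^2/S_{xx})$, $\Cov(\hat{\beta}_0, \hat{\beta}_1) = -\sigma^2 \bar{x}/S_{xx}$, with $S_{xx} = \sum_i (x_i - \bar{x})^2$), the sum above collapses to the compact form $\sigma^2\big(1/n + (x^\ast - \bar{x})^2/S_{xx}\big)$. The sketch below checks this algebra numerically on hypothetical values:

```python
import numpy as np

# Standard OLS variance formulas for simple linear regression:
#   Var(b1)      = s2 / Sxx
#   Var(b0)      = s2 * (1/n + xbar^2 / Sxx)
#   Cov(b0, b1)  = -s2 * xbar / Sxx
rng = np.random.default_rng(2)
X = rng.uniform(0, 2, 50)
n, x_bar = len(X), X.mean()
Sxx = np.sum((X - x_bar) ** 2)
s2 = 0.25          # assumed error variance sigma^2 (hypothetical)
x_star = 1.5

var_b1 = s2 / Sxx
var_b0 = s2 * (1 / n + x_bar**2 / Sxx)
cov_b0_b1 = -s2 * x_bar / Sxx

# Var(Yhat*) expanded as in the display above ...
var_sum = var_b0 + x_star**2 * var_b1 + 2 * x_star * cov_b0_b1
# ... equals the compact form s2 * (1/n + (x* - xbar)^2 / Sxx).
var_compact = s2 * (1 / n + (x_star - x_bar) ** 2 / Sxx)

print(np.isclose(var_sum, var_compact))
```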
Prediction Interval
In the point prediction above, the error term is omitted.
However, when we construct the prediction interval, we consider the variance of the error term as well:
$$ \hat{\mathcal{E}}^2 = \Var(\hat{Y}^\ast) + \Var(\varepsilon) = \Var(\hat{Y}^\ast) + \sigma^2 $$
The prediction interval is:
$$ \hat{Y}^\ast \pm z_{\alpha/2}\, \hat{\mathcal{E}} $$
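Putting the pieces together, a sketch of the full 95% prediction interval on hypothetical simulated data (true model $Y = 1 + 3X + \varepsilon$, with $\sigma^2$ estimated from the residuals and $z_{\alpha/2} \approx 1.96$):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: Y = 1 + 3X + noise with sigma = 0.5.
n = 100
X = rng.uniform(0, 2, n)
Y = 1.0 + 3.0 * X + rng.normal(0, 0.5, n)

# OLS fit.
x_bar = X.mean()
Sxx = np.sum((X - x_bar) ** 2)
b1 = np.sum((X - x_bar) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * x_bar

# Estimate sigma^2 from the residuals (n - 2 degrees of freedom).
resid = Y - (b0 + b1 * X)
sigma2_hat = np.sum(resid**2) / (n - 2)

# Variance of the fitted value at x*, then inflate by sigma^2
# because a new observation carries its own error epsilon.
x_star = 1.0
y_hat = b0 + b1 * x_star
var_fit = sigma2_hat * (1 / n + (x_star - x_bar) ** 2 / Sxx)
E_hat = np.sqrt(var_fit + sigma2_hat)

z = 1.96  # z_{alpha/2} for alpha = 0.05
lo, hi = y_hat - z * E_hat, y_hat + z * E_hat
print(lo, hi)
```

Note that $\hat{\mathcal{E}}$ is strictly larger than the standard error of the fitted value alone, which is why prediction intervals are wider than confidence intervals for $r(x^\ast)$.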