R-Squared

Table of contents
  1. Coefficient of Determination $R^2$
    1. Explained Variation
    2. Interpretation
    3. Relationship with Correlation
  2. Adjusted $R^2$
    1. The issue with regular $R^2$
    2. Adjustment

Coefficient of Determination $R^2$

The coefficient of determination ($R^2$) is a statistical measure of how well the model captures the variation in the dependent variable.

So essentially, goodness of fit.

To understand $R^2$, we need to refer back to sums of squares.

Explained Variation

In cases like linear regression with OLS, we have seen that the following decomposition is true:

$$SS_{tot} = SS_{exp} + SS_{res}$$

Since we want to know how much of the variation is explained by the model, we divide through by $SS_{tot}$ to get the proportion of explained variation among the total variation.

$$\frac{SS_{exp}}{SS_{tot}} = \frac{SS_{tot} - SS_{res}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}}$$

This is the definition of $R^2$:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
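The decomposition and both forms of $R^2$ can be checked numerically. A minimal sketch with made-up data, using NumPy and an OLS simple linear regression fit:

```python
import numpy as np

# Made-up data: a linear signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=x.size)

# OLS fit of y = b0 + b1*x (polyfit returns highest degree first).
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ss_tot = np.sum((y - y.mean()) ** 2)      # total variation
ss_exp = np.sum((y_hat - y.mean()) ** 2)  # explained variation
ss_res = np.sum((y - y_hat) ** 2)         # residual variation

# The decomposition holds for OLS (up to floating-point error),
# so the two forms of R^2 agree.
assert np.isclose(ss_tot, ss_exp + ss_res)
r2 = 1 - ss_res / ss_tot
assert np.isclose(r2, ss_exp / ss_tot)
```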

Interpretation

If the model captures all the variation in the dependent variable, the residual variation ($SS_{res}$) is zero, which makes $R^2$ equal to 1. This indicates that the model fits the data perfectly.

The baseline model, which always predicts the mean of the dependent variable, results in $R^2$ equal to 0.

Any model that performs worse than the baseline model will have a negative $R^2$.

$$\begin{aligned}
R^2 = 1 &\quad \text{perfect fit} \\
R^2 = 0 &\quad \text{baseline model} \\
R^2 < 0 &\quad \text{worse than baseline model}
\end{aligned}$$

So generally, the higher the $R^2$, the better the model fits the data.

Anything below 0 means you should seriously reconsider your model or check for a mistake, because you’re doing worse than the bare minimum, which is just always predicting the mean.
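These reference points are easy to verify. A small sketch (with a hypothetical `r_squared` helper and toy data) showing that the mean predictor gives exactly 0 and a worse constant predictor goes negative:

```python
import numpy as np

def r_squared(y, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Perfect predictions -> R^2 = 1
assert r_squared(y, y) == 1.0

# Baseline: always predict the mean -> R^2 = 0
assert r_squared(y, np.full_like(y, y.mean())) == 0.0

# A model worse than the baseline -> negative R^2
assert r_squared(y, np.full_like(y, 100.0)) < 0
```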

Just because a model has a high $R^2$ doesn’t mean it’s a good model.

$R^2$ is not a good measure for non-linear models, because the sum of squares decomposition doesn’t hold for them.

Relationship with Correlation

Review correlation from here.

Pearson’s correlation coefficient $r$ is essentially the covariance of $X$ and $Y$ normalized to the range $[-1, 1]$.

For simple linear regression models, $R^2 = r^2$.
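This identity can be confirmed numerically. A minimal sketch with made-up data, fitting one predictor by OLS and comparing $R^2$ against the squared Pearson correlation:

```python
import numpy as np

# Made-up data with a linear relationship plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(size=100)

# OLS simple linear regression fit.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient

# For simple linear regression, R^2 equals r squared.
assert np.isclose(r2, r ** 2)
```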


Adjusted $R^2$

The issue with regular $R^2$

When you add more predictors/features to your model, $R^2$ never decreases (and in practice almost always increases).

This is because as your model gets more complex, $SS_{tot}$ stays the same (remember, $SS_{tot}$ depends only on the data, not on the model), but $SS_{res}$ can only decrease (to be more precise, it does not increase).

Intuition

Think of what it means to increase the complexity of the model.

You had just a rigid line to fit your model before, but now you’ve added some features in so that it’s more flexible to fit a more complex curve.

You should have been able to decrease your squared errors, so $SS_{res}$ should have decreased.

This results in multiple issues:

  • $R^2$ is a positively biased estimator (it overshoots on average)
  • Bias towards complex models
  • Overfitting
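The claim above can be checked numerically: refitting by least squares with an extra pure-noise column never increases the residual sum of squares, so $R^2$ never drops. A minimal NumPy sketch with made-up data (the `ols_r2` helper is hypothetical):

```python
import numpy as np

def ols_r2(X, y):
    # Least-squares fit with an intercept column; returns R^2.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ beta
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n = 30
x = rng.normal(size=(n, 1))
y = x[:, 0] + rng.normal(size=n)

r2_small = ols_r2(x, y)

# Add a predictor that is pure noise, unrelated to y.
x_big = np.column_stack([x, rng.normal(size=n)])
r2_big = ols_r2(x_big, y)

# R^2 does not decrease even though the new feature is useless.
assert r2_big >= r2_small - 1e-12
```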

Adjustment

To account for this bias, we penalize $R^2$ by the number of predictors $k$ used in the model.

The penalty is defined as:

$$\frac{n-1}{n-k-1}$$

where $n$ is the sample size.

Notice that the penalty is 1 when $k = 0$, and it increases as $k$ increases.

Remembering that $R^2$ is defined as:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

We can penalize $R^2$ by bumping up the subtracted term with the penalty:

$$\frac{SS_{res}}{SS_{tot}} \times \frac{n-1}{n-k-1}$$

Through substitution (since $\frac{SS_{res}}{SS_{tot}} = 1 - R^2$), we get the definition of adjusted $R^2$:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$$
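The formula is a one-liner in code. A small sketch (the `adjusted_r2` helper is hypothetical) showing that $k = 0$ leaves $R^2$ unchanged and that more predictors mean a bigger penalty:

```python
def adjusted_r2(r2, n, k):
    # R^2_adj = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With k = 0 the penalty is 1, so nothing changes.
assert abs(adjusted_r2(0.8, n=50, k=0) - 0.8) < 1e-12

# More predictors -> larger penalty -> lower adjusted R^2.
assert adjusted_r2(0.8, n=50, k=5) < adjusted_r2(0.8, n=50, k=2)
assert adjusted_r2(0.8, n=50, k=5) < 0.8
```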