Spurious Correlation

If two variables are correlated, but there is no causal relationship between them, then the correlation is said to be spurious.

Spurious correlation is common in time series data because the chance of finding a false causal relationship increases as the number of intermediate variables increases. In addition, time-related variables are common confounders that can cause spurious correlation.

It is very tricky to avoid spurious correlation.

Table of contents
  1. Things that can cause spurious correlation
    1. Trend
    2. Confounder
    3. Dependency in variables
    4. Pure luck

Things that can cause spurious correlation

Trend

Data with a trend is more likely to produce a spurious correlation.

Stationary data has no persistent direction, so there is little for another series to line up with. Two series that both drift over the same period, however, will tend to look correlated even when they are unrelated.
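The effect of a shared trend can be seen in a small simulation. The sketch below is only an illustration under assumed settings: the slopes, noise level, series length, and seed are arbitrary, and `pearson` is a helper written for this example. It builds two series whose noise is independent, then compares the correlation of the raw levels with the correlation of the first differences, where the linear trend has been removed.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200

# Two series that share nothing except that both drift upward over time.
# The slopes (0.5, 0.3) and the unit noise level are arbitrary assumptions.
a = [0.5 * t + rng.gauss(0, 1) for t in range(n)]
b = [0.3 * t + rng.gauss(0, 1) for t in range(n)]

# First differences turn a linear trend into a constant,
# so only the independent noise is left to co-move.
da = [a[t] - a[t - 1] for t in range(1, n)]
db = [b[t] - b[t - 1] for t in range(1, n)]

r_levels = pearson(a, b)
r_diffs = pearson(da, db)
print(f"levels: {r_levels:.3f}, first differences: {r_diffs:.3f}")
```

The levels should come out strongly correlated while the differences should not. Differencing is one common way to remove a trend before comparing series; it is not the only one.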

Confounder

A confounder is a third variable that influences both the independent and dependent variables, so the two appear related even when neither affects the other.

  • Increase in ice cream sales leads to increase in drowning deaths?
    • Summer is the confounder
  • Increase in temperature leads to increase in divorce rates?
    • Time is the confounder. Things typically increase over time.
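The ice cream example can be sketched as a simulation. Everything here is an assumption made for illustration: `summer` is a made-up index of how summery a day is, the coefficients are arbitrary, and `pearson` is a helper defined for this example. Both outcomes are generated from `summer` alone, and the partial correlation then shows how much association is left once the linear effect of the confounder is removed.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 1000

# Hypothetical model: the confounder drives both outcomes,
# and neither outcome has any effect on the other.
summer = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [2.0 * s + rng.gauss(0, 1) for s in summer]
drownings = [1.5 * s + rng.gauss(0, 1) for s in summer]

r_xy = pearson(ice_cream, drownings)
r_xz = pearson(ice_cream, summer)
r_yz = pearson(drownings, summer)

# Partial correlation of x and y after removing the linear effect of z.
r_partial = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"raw: {r_xy:.3f}, controlling for summer: {r_partial:.3f}")
```

The raw correlation should be clearly positive, while the partial correlation should sit near zero. This only works when the confounder is known and measured, which is rarely guaranteed in practice.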

Dependency in variables

It is easy to mistake two variables that depend on each other for independent ones.

For example, if one is calculated from the other, you would expect them to be highly correlated.
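A minimal sketch of a calculated variable, using hypothetical names: `total` is defined as `part + rest`, where `part` and `rest` are generated independently. The variances and seed are arbitrary assumptions, and `pearson` is a helper defined for this example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 1000

# Two independent quantities with equal variance (an arbitrary choice).
part = [rng.gauss(0, 1) for _ in range(n)]
rest = [rng.gauss(0, 1) for _ in range(n)]

# A derived variable: it contains `part` by construction.
total = [p + q for p, q in zip(part, rest)]

r_part_rest = pearson(part, rest)    # independent inputs
r_part_total = pearson(part, total)  # input vs. value calculated from it

print(f"part vs rest: {r_part_rest:.3f}, part vs total: {r_part_total:.3f}")
```

With equal variances, the theoretical correlation between `part` and `total` is 1/sqrt(2), about 0.71, even though `part` and `rest` are unrelated. The correlation reflects how `total` was built, not a discovery about the data.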

Pure luck

Sometimes two unrelated variables appear correlated purely by chance, especially when the sample is small or when many pairs of variables are compared.
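Chance correlations become easy to find when enough candidates are tested. The sketch below, with arbitrary assumed sizes (20 data points, 1000 candidate series) and a `pearson` helper defined for this example, compares one random target series against many unrelated random series and reports the strongest correlation found.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(3)  # fixed seed so the run is repeatable
n_points = 20        # short series make chance matches more likely
n_candidates = 1000  # number of unrelated series to search through

target = [rng.gauss(0, 1) for _ in range(n_points)]
candidates = [
    [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_candidates)
]

# Every candidate is pure noise, generated independently of the target.
correlations = [pearson(target, c) for c in candidates]

# Pick the candidate with the largest absolute correlation.
best = max(correlations, key=abs)
print(f"strongest correlation among {n_candidates} random series: {best:.3f}")
```

The strongest match should look substantial even though every series is noise. Reporting only the best of many comparisons is what turns ordinary luck into an apparent finding.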