Correlation vs. Causation

May 4, 2022
From Tyler Vigen's site, Spurious Correlations

"Correlation does not imply causation" is the phrase people reach for to dismiss most casual statistical observations that aren't causal. From 2000 to 2009, the divorce rate in Maine had a 99.26% correlation with per capita consumption of margarine. Surely eating margarine doesn't cause divorces.
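To get a feel for how easily a ten-point series can line up with another by chance, here is a minimal sketch. It uses synthetic random walks, not Tyler Vigen's data, and the names `pearson` and `random_walk` are made up for this illustration. It generates 200 unrelated series of 10 points each (one per "year") and reports the strongest correlation found between any pair.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def random_walk(rng, length):
    """Cumulative sum of independent Gaussian steps (a trending series)."""
    level, walk = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumption: 200 series that are unrelated by construction, 10 points each.
series = [random_walk(rng, 10) for _ in range(200)]

# Search every pair for the strongest correlation, positive or negative.
best = max(abs(pearson(a, b)) for a, b in combinations(series, 2))
print(f"strongest |r| among {len(series)} unrelated series: {best:.4f}")
```

With nearly 20,000 pairs to choose from, some pair will almost always correlate very strongly, even though nothing connects the series.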

But is there a more specific explanation than "correlation does not imply causation"? Here are a few reasons why two variables can be correlated even though the first doesn't cause the second.

  1. Reverse causation. The weather tends to be worse when Uber prices are higher. Yet Uber prices do not cause bad weather; bad weather raises demand for rides, which raises prices.
  2. A third, confounding variable. Sunburns are correlated with ice cream consumption because hot, sunny weather drives both.
  3. Selection bias. The data is sampled in a way that overrepresents a particular trait or group.
  4. The relationship is purely coincidental.
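
The confounding case (item 2) can be sketched with a small simulation. The data is synthetic, and the model is an assumption made for illustration: a single confounder, temperature, drives both ice cream eating and sunburns, which otherwise have nothing to do with each other. The helper names `pearson` and `residuals` are hypothetical. The two variables correlate strongly until temperature is regressed out of each.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed model: temperature is the only thing the two outcomes share.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
sunburn = [t + rng.gauss(0, 0.5) for t in temperature]

raw_corr = pearson(ice_cream, sunburn)

# Controlling for the confounder: correlate what temperature cannot explain.
adjusted_corr = pearson(residuals(ice_cream, temperature),
                        residuals(sunburn, temperature))

print(f"raw correlation:          {raw_corr:.3f}")
print(f"after removing the cause: {adjusted_corr:.3f}")
```

This only works here because the confounder is known and measured. In real data, the hard part is knowing which variables to control for.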

How do you establish causality then? There's no hard and fast rule. Causal inference is hard. Hill's criteria for causation provide a decent starting point. Here are a few of them.

  1. Strength – How large is the effect? A small effect can still be causal, but the larger the effect, the more likely it is to be causal.
  2. Temporality – The effect should occur after the cause.
  3. Biological gradient – Often, higher exposure leads to more of the observed effect. The obvious example is medicine, where a larger dose typically produces a stronger response.