Causal inference for the home gardener

Published on November 27, 2024 5:55 PM GMT

Note: This is meant to be an accessible introduction to causal inference. Comments appreciated.

Let’s say you buy a basil plant and put it on the counter in your kitchen. Unfortunately, it dies in a week.

So the next week you buy another basil plant and feed it a special powder, Vitality Plus. This second plant lives. Does that mean Vitality Plus worked?

Not necessarily! Maybe the second week was a lot sunnier, you were better about watering, or you didn’t grab a few leaves for a pasta. In other words, it wasn’t a controlled experiment. If some other variable like sun, water, or pasta is driving the results you’re seeing, your study is confounded, and you’ve fallen prey to a core issue in science.
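To see how a lurking variable can make a useless powder look effective, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the powder has no effect at all, sunny weeks raise survival from 0.3 to 0.8, and powder happens to be used more often in sunny weeks. The function names are hypothetical.

```python
import random

def simulate_weeks(n_weeks=20_000, seed=0):
    """Simulate plant-weeks in which the powder has NO effect on survival."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_weeks):
        sunny = rng.random() < 0.5
        # Assumed: powder is used more often in sunny weeks.
        powder = rng.random() < (0.8 if sunny else 0.2)
        # Assumed: survival depends on sun only, never on powder.
        survived = rng.random() < (0.8 if sunny else 0.3)
        rows.append((sunny, powder, survived))
    return rows

def survival_rate(rows, powder, sunny=None):
    """Share of plants that survived, optionally within one kind of week."""
    picked = [s for (sun, p, s) in rows
              if p == powder and (sunny is None or sun == sunny)]
    return sum(picked) / len(picked)

rows = simulate_weeks()
naive_gap = survival_rate(rows, True) - survival_rate(rows, False)
sunny_gap = (survival_rate(rows, True, sunny=True)
             - survival_rate(rows, False, sunny=True))
cloudy_gap = (survival_rate(rows, True, sunny=False)
              - survival_rate(rows, False, sunny=False))

print(f"powder vs. no powder, all weeks:    {naive_gap:+.2f}")
print(f"powder vs. no powder, sunny weeks:  {sunny_gap:+.2f}")
print(f"powder vs. no powder, cloudy weeks: {cloudy_gap:+.2f}")
```

The all-weeks comparison shows a large gap in favor of the powder, while the gap roughly vanishes once sunny weeks are compared only with sunny weeks and cloudy with cloudy. In this toy world the entire apparent benefit comes from the sun.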

When someone says “correlation is not causation,” they’re usually talking about confounding. Here are some examples:

student test scores correlate with the number of books

piano lessons

convicted offenders sent to prison live longer

So now you know that you shouldn’t compare plants that you bought at different times, because this risks confounding. One way to address confounding is to try to hold all the important variables constant—a controlled experiment. You buy two plants at the same time from the same store. You put them in the same spot and water them equally, and always pluck the same number of leaves from each. The treated plant survives, and the control plant withers.

Does the powder work? A remaining problem is that even holding constant many of the variables (store, date bought, and so on), there’s still some inherent randomness in the life of a basil plant.

This randomness could be due to genetics or the soil conditions when it was a wee sprout. With enough plants, it would wash out, with either group as likely to be lucky as unlucky on average. With just two plants, however, it’s likely that random factors would cloud or even exceed the benefit from the powder. When the measured benefit in your study is plausibly just random noise, your study is underpowered. In engineering, this could be seen as a signal-to-noise problem. With only two plants, the noise (random variation) might overwhelm the signal (the effect of Vitality Plus).
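The signal-to-noise idea can be sketched with a small simulation. The numbers are assumptions chosen for illustration: an untreated plant survives with probability 0.5, and the powder truly raises that to 0.7. The code repeats the same experiment many times and looks at how much the estimated benefit bounces around.

```python
import random
import statistics

def estimated_effects(plants_per_group, true_boost=0.2, base=0.5,
                      n_experiments=2_000, seed=1):
    """Run many simulated experiments; return each one's estimated benefit."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_experiments):
        control = sum(rng.random() < base
                      for _ in range(plants_per_group))
        treated = sum(rng.random() < base + true_boost
                      for _ in range(plants_per_group))
        # Estimated benefit = difference in survival rates between groups.
        effects.append((treated - control) / plants_per_group)
    return effects

results = {}
for n in (1, 100):
    effects = estimated_effects(n)
    signal = statistics.mean(effects)   # average estimate across experiments
    noise = statistics.stdev(effects)   # how much a single experiment varies
    results[n] = (signal, noise)
    print(f"{n:>3} plants per group: mean estimate {signal:+.2f}, "
          f"spread {noise:.2f}")
```

With one plant per group, the spread of a single experiment's estimate is several times larger than the assumed benefit of 0.2, so any one result says very little. With 100 plants per group the spread shrinks to well below the benefit.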

A/B test calculators can estimate how many plants you would need.
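As a rough sketch of the kind of arithmetic such a calculator performs, the function below uses the standard normal-approximation formula for comparing two proportions. The function name, the default significance level (0.05, two-sided), the default power (0.80), and the survival rates in the example are all assumptions for illustration. Real calculators may use somewhat different formulas.

```python
import math
from statistics import NormalDist

def plants_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate plants needed in EACH group to detect the difference
    between two survival rates (normal approximation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_treated - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Hypothetical survival rates: 50% without powder, 70% with it.
print(plants_per_group(0.5, 0.7))
# A smaller assumed benefit (50% vs. 55%) needs far more plants.
print(plants_per_group(0.5, 0.55))
```

The required sample grows with the inverse square of the effect size, which is why small benefits are so expensive to detect.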

Now you know that you shouldn’t compare plants raised in different conditions (because there could be confounding) and you can’t just compare two plants, even with lots of control over their conditions (because of random variation—one plant could get lucky, independent of Vitality Plus).

We need a large sample of plants, with random variation in which ones get treated. What are some of the techniques?

Experiment or randomized trial:

stratifying

intent-to-treat analysis: analyze each subject by the treatment that was intended, not what actually happened (for example, when a study can only encourage the treatment group to get screened)

within-subjects design
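Random assignment and intent-to-treat analysis can be sketched together in a toy trial. All numbers here are assumptions: half the plants are secretly healthy, the powder truly adds 0.1 to survival probability, and plants assigned to powder actually receive it more often when they are healthy (imagine a gardener who neglects sickly plants). The names are hypothetical.

```python
import random

def run_trial(n_plants=20_000, seed=2):
    """Randomly assign half the plants to powder, with imperfect follow-through."""
    rng = random.Random(seed)
    assigned_flags = [i < n_plants // 2 for i in range(n_plants)]
    rng.shuffle(assigned_flags)  # random assignment to treatment or control
    plants = []
    for assigned in assigned_flags:
        healthy = rng.random() < 0.5  # hidden trait, unknown to the gardener
        # Assumed: healthy plants are more likely to actually get their powder.
        took_powder = assigned and rng.random() < (0.9 if healthy else 0.4)
        p_survive = 0.3 + 0.4 * healthy + 0.1 * took_powder
        survived = rng.random() < p_survive
        plants.append((assigned, took_powder, survived))
    return plants

def rate(plants, index, value):
    """Survival rate among plants whose field `index` equals `value`."""
    group = [p[2] for p in plants if p[index] == value]
    return sum(group) / len(group)

plants = run_trial()
# Intent-to-treat: compare by what was ASSIGNED (field 0).
itt = rate(plants, 0, True) - rate(plants, 0, False)
# As-treated: compare by what actually HAPPENED (field 1).
as_treated = rate(plants, 1, True) - rate(plants, 1, False)

print(f"intent-to-treat estimate: {itt:+.3f}")
print(f"as-treated estimate:      {as_treated:+.3f}")
```

In this sketch the as-treated comparison overstates the benefit, because the plants that actually got powder were healthier to begin with. The intent-to-treat estimate is smaller than the true 0.1, since not every assigned plant was treated, but it is protected from that confounding because it compares the groups exactly as the coin flip formed them.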

Quasi-experiment:

instrumental variable

random assignment to judges

maternal leave policy

ER physician who is more loose
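An instrumental variable can be sketched in the same toy garden. The setup is hypothetical and not from the examples above: plants land at random on a shelf that is either next to the powder jar or not, and nearness to the jar (the instrument, playing the role that random judge assignment plays in the list above) makes powder use more likely without affecting survival in any other way. The true powder effect is assumed to be 0.1, and healthy plants are assumed to get powder more often.

```python
import random

def simulate_garden(n_plants=100_000, seed=3):
    """Each row: (near_jar, took_powder, survived)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_plants):
        near_jar = rng.random() < 0.5   # instrument: as good as random
        healthy = rng.random() < 0.5    # hidden confounder
        # Assumed: both health and nearness to the jar raise powder use.
        took_powder = rng.random() < 0.2 + 0.3 * healthy + 0.4 * near_jar
        # Assumed: the shelf affects survival only through the powder.
        survived = rng.random() < 0.3 + 0.4 * healthy + 0.1 * took_powder
        rows.append((near_jar, took_powder, survived))
    return rows

def mean_of(rows, field, by, value):
    """Average of column `field` among rows where column `by` equals `value`."""
    group = [r[field] for r in rows if r[by] == value]
    return sum(group) / len(group)

rows = simulate_garden()
# Naive comparison: powder users vs. non-users (confounded by health).
naive = mean_of(rows, 2, 1, True) - mean_of(rows, 2, 1, False)
# Wald estimator: effect of the shelf on survival, divided by
# the effect of the shelf on powder use.
reduced_form = mean_of(rows, 2, 0, True) - mean_of(rows, 2, 0, False)
first_stage = mean_of(rows, 1, 0, True) - mean_of(rows, 1, 0, False)
iv_estimate = reduced_form / first_stage

print(f"naive estimate: {naive:+.3f}")
print(f"IV estimate:    {iv_estimate:+.3f}")
```

The naive comparison is inflated by the hidden health difference, while the ratio (often called the Wald estimator) lands close to the assumed effect of 0.1. This only works if the instrument really is random and really does affect survival only through the treatment, which in real studies has to be argued rather than assumed.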

Whether you're testing plant powder, educational methods, or medical treatments, the principles remain the same: Watch out for confounding variables. Use large enough samples to overcome random noise. And create or find random variation in treatment take-up for a reliable estimate. These provide some of the best defenses against the bad ideas that invariably sprout up.
