We interrupt our regular programming for a brief statistics rant.
I keep seeing papers that make the following argument: “We want to test whether more T leads to more Y. Unfortunately there are lots of unobserved variables that could be driving both T and Y, so a least squares regression of Y on T (plus some correlates X that we do observe) is probably going to give a biased estimate. Therefore I am going to use a matching estimator to reduce the bias.”
This mistake is made by papers in some of the best journals, especially in political science. It has even been made by a couple of causal identification methodologists who shall remain nameless. I call it the cardinal sin of matching.
Here is the short story: matching is not an identification strategy or a solution to your endogeneity problem; it is a weighting scheme. Saying matching will reduce endogeneity bias is like saying that the best way to get thin is to weigh yourself in kilos. The statement makes no sense. It confuses technique with substance.
Your causal inference problem is pretty simple: there are things you can’t measure that could lead to more of both T and Y. Let’s call these pesky variables Z. Unless you can find a way to observe Z (or find that holy grail: an instrument for T), you could run the fanciest estimator in the world and it would make little difference.
When you run a regression, you control for the X you can observe. When you match, you are simply matching based on those same X. If X are a pretty good proxy for Z, then you’ve probably reduced your endogeneity bias. But whether you proceed via matching or regression is of little consequence.
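If you want to see this for yourself, here is a minimal simulation sketch. Everything in it is made up for illustration: the data-generating process, the coefficients, and the function names (`simulate`, `ols_estimate`, `matching_estimate`) are my assumptions, and the matching step is a bare-bones one-nearest-neighbor match on a single X, not any particular package's estimator. Z drives both T and Y but is never handed to either estimator.

```python
import numpy as np


def simulate(n=4000, true_effect=1.0, seed=0):
    """Toy data: X is observed, Z is not, and both push T and Y upward."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # observed covariate
    z = rng.normal(size=n)  # unobserved confounder (never returned)
    t = (0.5 * x + z + rng.normal(size=n) > 0).astype(float)
    y = true_effect * t + x + 2.0 * z + rng.normal(size=n)
    return x, t, y


def ols_estimate(x, t, y):
    """Coefficient on T from a least squares regression of Y on T and X."""
    design = np.column_stack([np.ones_like(x), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def matching_estimate(x, t, y):
    """Match each treated unit to its nearest control on X (with replacement)."""
    treated = t == 1
    order = np.argsort(x[~treated])
    x_c, y_c = x[~treated][order], y[~treated][order]
    # Locate the two sorted controls that bracket each treated unit's X.
    idx = np.clip(np.searchsorted(x_c, x[treated]), 1, len(x_c) - 1)
    left_gap = np.abs(x[treated] - x_c[idx - 1])
    right_gap = np.abs(x_c[idx] - x[treated])
    nearest = np.where(left_gap <= right_gap, idx - 1, idx)
    # Average treated-minus-matched-control difference in Y.
    return np.mean(y[treated] - y_c[nearest])


x, t, y = simulate()
print("true effect:      ", 1.0)
print("OLS estimate:     ", round(ols_estimate(x, t, y), 2))
print("matching estimate:", round(matching_estimate(x, t, y), 2))
```

In this setup both estimates should land well above the true effect of 1, and close to each other. Switching estimators does nothing about Z; only observing Z (or a good proxy for it) would.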
For causal inference, the most important difference between regression and matching is which observations count the most. A regression minimizes the squared errors, so observations on the margins get a lot of weight. Matching puts the emphasis on observations that have similar X’s, and so those observations on the margin might get no weight at all. For the math see this excellent little book.
Matching might make sense if there are observations in your data that have no business being compared to one another, and in that way it might produce a better estimate. But the 800 lb identification problem is still staring at you from the corner.
Even if you say he only weighs 363 kilos.