I’ve always felt that proponents of randomized trials judge observational studies too harshly. Too many cite the famous Lalonde paper and write off matching approaches. Yet that is a single example, and one with rather rotten matching data. Ironically, such randomistas don’t apply their own standards of evidence to this judgment.
There probably need to be more studies like this one:
The ability of propensity score analysis (PSA) to match impact estimates derived from random assignment (RA) is examined using data from the evaluation of two interdistrict magnet schools. As in previous within study comparisons, the estimates provided by PSA and RA differ substantially when PSA is implemented using comparison groups that are not similar to the treatment group and without pretreatment measures of academic performance. Adding pretreatment measures of the performance to the PSA, however, substantially improves the match between PSA and RA estimates. Although the results should not be generalized too readily, they suggest that non-experimental estimators can, in some circumstances, provide valid estimates of the causal impact of school choice programs.
Yes, there are other examples to the contrary, but I suspect better baseline data could produce good results.
I would welcome pointers to work that supports this, or tells me I’m wrong.
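To make the baseline-data point concrete, here is a minimal simulation sketch. It is not the method of the magnet-school study quoted above. Every number in it is made up for illustration: the true effect, the selection rule, and the outcome model are all assumptions. Selection depends only on a pretest score, so the propensity score is a monotone function of that score, and stratifying on the pretest is equivalent to stratifying on the propensity score.

```python
import math
import random

TRUE_EFFECT = 2.0  # assumed treatment effect, chosen purely for illustration


def simulate(n, rng):
    """Return made-up units as (pretest, treated, outcome) tuples."""
    rows = []
    for _ in range(n):
        pretest = rng.gauss(0.0, 1.0)
        # Assumed selection rule: higher pretest scores opt in more often.
        p_treat = 1.0 / (1.0 + math.exp(-1.5 * pretest))
        treated = rng.random() < p_treat
        # Assumed outcome model: pretest strongly predicts the outcome.
        outcome = 3.0 * pretest + TRUE_EFFECT * treated + rng.gauss(0.0, 1.0)
        rows.append((pretest, treated, outcome))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def naive_estimate(rows):
    """Raw treated-minus-control difference, ignoring the pretest."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def stratified_estimate(rows, n_strata=10):
    """Average the within-stratum differences over equal-size pretest strata."""
    ordered = sorted(rows, key=lambda r: r[0])
    size = len(ordered) // n_strata
    total, weight = 0.0, 0
    for k in range(n_strata):
        # The last stratum absorbs any remainder.
        end = (k + 1) * size if k < n_strata - 1 else len(ordered)
        chunk = ordered[k * size:end]
        treated = [y for _, t, y in chunk if t]
        control = [y for _, t, y in chunk if not t]
        if not treated or not control:
            continue  # no overlap in this stratum, so skip it
        total += len(chunk) * (mean(treated) - mean(control))
        weight += len(chunk)
    return total / weight


rng = random.Random(0)
rows = simulate(20000, rng)
naive = naive_estimate(rows)
adjusted = stratified_estimate(rows)
print(f"true effect:          {TRUE_EFFECT:.2f}")
print(f"naive difference:     {naive:.2f}")
print(f"stratified on pretest: {adjusted:.2f}")
```

In this toy setup the naive comparison overstates the effect, and stratifying on the pretest removes most of that bias, though not all of it, since units within a stratum still differ somewhat. The sketch only shows what happens when the selection process is fully observed; it says nothing about cases where unobserved traits drive participation.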
6 Responses
Chris, this is more conversational than academic, but offers one or two helpful pointers on what good observation can bring: http://www.slate.com/id/2290579/pagenum/all/#p2.
Chris:
The famous Dehejia and Wahba paper showed that you could get the right answer in the famous Lalonde problem by appropriately adjusting the observational data. See here:
http://www.nber.org/%7Erdehejia/papers/dehejia_wahba_jasa.pdf
and here:
http://www.nber.org/%7Erdehejia/papers/matching.pdf
for the original papers, and here
http://www.nber.org/%7Erdehejia/papers/practical_pscore.pdf
and here:
http://www.nber.org/%7Erdehejia/papers/postscript.pdf
for Dehejia’s final thoughts on the matter
We also look at this issue in the context of migration effects in a recent paper in the JEEA (http://onlinelibrary.wiley.com/doi/10.1111/j.1542-4774.2010.tb00544.x/abstract). As Jeff has noted, it matters what you condition on.
That said, my feeling is that matching is easier to make convincing, and it is easier to control for enough, when governments or policymakers select which villages to serve (where the political and logistical issues involved often don’t matter much for the outcomes of interest) than when individuals or firms self-select into programs (where unobserved characteristics like entrepreneurial spirit, drive, ambition, and social connections often seem likely to drive both program participation and the outcomes of interest). We can of course do a much better job of measuring these things and of collecting multiple periods of pre-treatment data. The more we know about the selection process the better, of course, but this is where I think the challenge is harder.
“In a study published in the New England Journal of Medicine (Concato et al. 2000), researchers compared findings about the effectiveness of five different clinical interventions produced from RCTs as compared to observational studies (using cohort or case-control designs). They found that, while the summary results from RCTs and observational studies were ‘remarkably similar’, findings from RCTs showed more variation between studies – to the extent that some of them produced findings at odds with results from the other studies.” Learning from the evidence about evidence-based policy. http://bit.ly/mTt8aq
Concato J., Shah N. and Horwitz R.I. 2000, ‘Randomized, controlled trials, observational studies, and the hierarchy of research designs’, New England Journal of Medicine, vol. 342, no. 25, pp. 1887-92.
The Diaz and Handa (2006) JHR paper is nice, and I would say that even if Diaz weren’t my student … more broadly, remember that the question is not “does PSM work”. The question is, “what variables do you need to condition on in particular substantive contexts”.
See section 2.4, which compares experimental and non-experimental estimates, in this handbook: http://faculty.arts.ubc.ca/kmilligan/teaching/duflo-glennerster-kremer.handbook08.pdf (I am sure you have it.) Sorry, I don’t have a non-randomista source. I think your best bet is statisticians. Don Rubin? Or maybe Andrew Gelman will respond.