Quantitative versus qualitative measurement: the contest

If you run a survey of drug use, prostitution, domestic violence, rioting, or crime, who would believe this self-reported data? No one.

If you work in one of the handful of countries with reliable, available data, then you might be able to use police or hospital records. I’m looking at you, American scholars, you lucky bastards.

If you’re working in the largely evidence-free zone that is poor countries–especially fragile states–then you have to get creative. Some of my favorites: Alex Scacco randomized whether she interviewed potential Nigerian rioters behind a screen or not. After running an ethnic reconciliation program, Betsy Paluck gave out community gifts in Rwanda and looked at how they were shared out.

In one of my more self-flagellating moments, some colleagues and I decided to start a study of crime and violence reduction among street youth in Liberia–mostly men who make their living from petty crime and drug dealing (among other things) and lead very risky lives.

The effects of the programs–behavior change therapy and cash transfers–I’ll discuss another time. In brief, a cheap and short program of cognitive therapy seems to have dropped crime, violence, and drug use by huge amounts. And the effects persisted at least a year.

There were two possibilities. One, we’d stumbled upon a miracle cure. Two, they were telling us what we wanted to hear.

What to do? Well, we said, let’s try to measure the measurement error.

I prefer to mix my quant work with serious qual work. Basically, we had two or three truly gifted Liberian research assistants who collected qualitative data full time. They produced thousands of pages of conversation transcripts that I sometimes wonder how we will ever read. We visited dozens of the men in the study again and again over a year to see how their lives changed.

These qualitative researchers were already embedded in the communities. So, we thought, why not have them hang out with the survey respondents for four days around the time of their survey? We’d get a general sense of their lives (with full disclosure and consent), but through conversation and observation focus on figuring out the same six things about all of them: some seemingly sensitive behaviors (drug use, petty theft, gambling, and homelessness) plus, for balance, the use of a few common luxuries, video clubs and paid phone charging services.

The qualitative researchers had no idea what the guys had said on the survey; they simply coded their own assessment, usually based on a frank admission of yes or no. Across 4,000 surveys, we tried this a random 300 times.

So what happens when you compare a survey question, “did you smoke marijuana in the last two weeks?” to four days of “deep hanging out”? In our case, you basically get the same answer.

The survey and qualitative measures agreed about 75% of the time. When they differed, the difference wasn’t systematic at all: we’d get the same average levels of drug use or stealing from either method. At least for the so-called sensitive behaviors. As it turns out, these particular guys had no reservations at all about talking about crime or drugs; it was such an everyday part of their lives. So, as best we can tell, the survey reports were reasonably right, and the falls in “bad” behaviors real.
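For the curious, here’s a minimal sketch of what that comparison amounts to, in Python. The data and column names (survey_smoked, qual_smoked, and so on) are hypothetical stand-ins, not our actual variables; the point is just that the validation boils down to an agreement rate plus a check that the two methods give similar average levels.

```python
import pandas as pd

# Hypothetical validation data: one row per respondent in the random
# validation subsample. survey_* columns are the self-reports (0/1);
# qual_* columns are the qualitative researcher's independent coding (0/1).
df = pd.DataFrame({
    "survey_smoked": [1, 0, 1, 1, 0, 1],
    "qual_smoked":   [1, 0, 0, 1, 0, 1],
    "survey_stole":  [0, 0, 1, 0, 1, 0],
    "qual_stole":    [0, 1, 1, 0, 1, 0],
})

for behavior in ["smoked", "stole"]:
    survey = df[f"survey_{behavior}"]
    qual = df[f"qual_{behavior}"]

    # Share of respondents where the two methods agree.
    agreement = (survey == qual).mean()

    # If disagreements are noise rather than systematic lying, the average
    # level of the behavior should look similar under either method.
    print(f"{behavior}: agreement = {agreement:.0%}, "
          f"survey mean = {survey.mean():.2f}, qual mean = {qual.mean():.2f}")
```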

What’s interesting is that this didn’t apply to the so-called non-sensitive luxury goods. These the men underreported a little, and almost entirely in the control group. There are a few explanations, but one is that the control group wanted to appear poorer and more deserving of a future program.
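If you wanted to check whether that kind of misreporting is differential by treatment arm (the thing that would actually bias the experiment), one simple approach is to treat the gap between the qualitative code and the self-report as an outcome and regress it on treatment. A sketch, with simulated stand-in data and statsmodels as my tool of choice rather than anything we’re prescribing:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for a validation subsample of 300. misreport is the
# qualitative code minus the self-report for a luxury good (+1 means the
# respondent denied something the researcher observed); treatment is 0/1.
rng = np.random.default_rng(0)
n = 300
treatment = rng.integers(0, 2, size=n)

# Build in some underreporting concentrated in the control group, purely to
# illustrate what a differential pattern would look like in this setup.
misreport = rng.binomial(1, np.where(treatment == 1, 0.05, 0.15), size=n)

df = pd.DataFrame({"misreport": misreport, "treatment": treatment})

# A coefficient on treatment near zero means misreporting is not differential;
# a nonzero coefficient means the raw treatment-control comparison mixes real
# behavior change with reporting behavior.
fit = smf.ols("misreport ~ treatment", data=df).fit(cov_type="HC1")
print(fit.summary().tables[1])
```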

The wonkier among you might be wondering: why not use list experiments? The short answer: I don’t 100% believe them, and if you tried testing them on illiterate street youth with short attention spans you’d give up too.

Actually, our deep hanging out wasn’t as hard or as expensive to do as you’d think. Tracking and surveying each person each time cost about $75 on the margin (they were hard to keep track of). The qualitative validation had roughly the same per-person variable cost.

We hope more people will try it out. That’s why we not only wrote up the results as a paper, but also wrote an appendix explaining in detail how we did it. Plus algebra!
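To give a flavor of that algebra (in my own simplified notation here, not a reproduction of the appendix): write the reported outcome as the true outcome plus a reporting error, and the usual difference in means picks up any difference in average misreporting between arms.

```latex
% Reported outcome = true outcome + reporting error
\tilde{Y}_i = Y_i + e_i

% so the naive treatment-control comparison decomposes as
\mathbb{E}[\tilde{Y} \mid T=1] - \mathbb{E}[\tilde{Y} \mid T=0]
  = \underbrace{\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]}_{\text{true effect}}
  + \underbrace{\mathbb{E}[e \mid T=1] - \mathbb{E}[e \mid T=0]}_{\text{differential misreporting}}
```

If the two arms misreport by the same amount on average, the second term drops out; the validation exercise is essentially a direct check on whether it does.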

The whole exercise gave us a lot of confidence in our results. Frankly, I don’t think any respectable journal would have published the main experiment without it. If you think you’re going to try it, email us and we’ll fill you in on more lessons learned.