Is the future of foreign aid in Europe the broken U.S. system?

Aid advocates should be careful what they wish for. If you advocate for an input target like 0.7%, you don’t have a leg to stand on when the government hits the target but uses the money for whatever it can get away with within the rules.

That is Owen Barder analyzing the U.K.’s new development and aid strategy. It is a very good and very important piece, largely about how the U.K. is moving in a U.S.-like direction, with more focus on national self-interest, international security, and outsourcing.

What worries me about this direction:

  • “This looks really effective at helping people. Let’s copy this,” said no one ever after looking at the U.S. system of outsourcing development projects to private contractors.
  • There are good reasons why right-leaning governments want to use their aid policy to serve the national interest. But I think they ignore an important cost: people don’t like you if you give them presents that are really for you. It is so transparent. You become known as a jerk. You lose respect and standing in the long run. That standing, as it turns out, is actually pretty useful when you want to police the world or play the liberator. In my mind, the pursuit of national interest in the short term undermines national interest in the long term.
  • Every time I hear a very sensible-sounding goal to reduce poverty or violence from a European official, I can’t help thinking that it’s cover for: “how do we keep more poor brown people from arriving on our borders?” It is AMAZING how much this refugee issue colors every aid and research project associated with a UK or European agency today.
  • The UK is one of the most important and influential donors and, in my experience, their development agency DFID has been the most sensible and effective. While the political winds may blow DFID in a bad direction, I am optimistic the staff will steer it back over time.

Anyways, Barder is one of the best people to tell you about the UK system, and I am not, so do read his post.

Are social science RCTs headed in the wrong direction? A roundup of the discussion

Last week I posted my worries about the direction of social science experiments, how raising the quality bar was going to drop the quantity of studies (and what that will sacrifice). I did not expect tens of thousands of visits, tweets, replies, and suggested readings. Even Vox picked it up, causing me to immediately think: “Oh man, why do I say these things out loud?”

Clearly there is an untapped demand for mournful posts about feeling overworked, masquerading as critiques of science. And here I thought all you cared about was Star Wars and Trump.

My basic point was this: each study is like a lamp post. We might want to have a few smaller lamp posts illuminating our path, rather than the world’s largest and most awesome lamp post illuminating just one spot. I worried that our striving for perfect, overachieving studies could make our world darker on average.

There were lots of good comments, and I thought I’d summarize some of them here.

David McKenzie blogged that he thinks my concerns apply more to the big government evaluations than the new, smaller, flexible, lab-like experiments that more and more social scientists are running in the field.

I suppose that’s possible, but I found that the more control over experiments I’ve had, the more ability I have to add bells and whistles. The expectations of what is possible are higher. And I push myself harder, since I’m my own worst enemy. So, in my view, the incentives to over-invest are greatest with the experiments we control the most.

Meanwhile, Rachel Glennerster countered David with tips on how to work with government partners on randomized evaluations.

But most surprisingly to me, a huge number of commenters said something along the lines of: “I’d much rather have a couple of really good studies than a whole bunch of small and underpowered ones.” On some level: of course. But on another level, this kind of statement is exactly the problem I’m describing.

Think in extremes for a moment. On one extreme, we could have just one experiment in one place to answer an important question, and it would be big and amazing. On the other extreme, we could have thousands of tiny trials in as many different places, each with only a trivial sample.

Obviously neither of these extremes makes any sense. There’s a sweet spot in the middle. I read those commenters as saying “we know where we want to be, and we’re already there.” I’m not so confident in the status quo, and we should always be suspicious when “right here” feels like the best place in the world.
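To make the lamp-post tradeoff concrete, here is a toy Monte Carlo sketch. Every number in it is invented for illustration: the subject budget, the per-site fixed cost, and the degree of cross-site heterogeneity are assumptions, not estimates from any real program. It splits a fixed budget across some number of equal experiments and asks how well the cross-site average recovers the average effect:

```python
import numpy as np

rng = np.random.default_rng(1)

def rmse(n_sites, budget=1000, site_cost=30, mu=1.0,
         sd_site=0.5, sd_noise=2.0, reps=4000):
    """RMSE of the cross-site average as an estimate of the average effect mu,
    when a fixed budget is split across n_sites experiments, each paying a
    fixed setup cost and spending the remainder on subjects."""
    n_per = (budget - site_cost * n_sites) / n_sites   # subjects per site
    se_per_site = 2 * sd_noise / np.sqrt(n_per)        # SE of a two-arm difference in means
    errs = []
    for _ in range(reps):
        taus = rng.normal(mu, sd_site, n_sites)        # each site has its own true effect
        ests = taus + rng.normal(0, se_per_site, n_sites)  # add per-site sampling error
        errs.append(ests.mean() - mu)
    return float(np.sqrt(np.mean(np.square(errs))))

for k in (1, 5, 14, 28):
    print(k, round(rmse(k), 3))
```

With these made-up numbers the error is U-shaped in the number of sites: one giant study leaves you fully exposed to that site’s idiosyncrasy, while too many tiny studies burn the budget on setup costs and noise. The sweet spot is in the middle, and where it sits depends entirely on parameters we rarely measure.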

I think economics might be just a little too close to the first extreme, with just one big awesome study. Why do I say this? Because we obsess about internal validity all the time and almost ignore external validity. The profession is not optimizing. Meanwhile, my sense is that political science paper-writers are a little too far towards the other extreme, with more emphasis on quantity over quality (this does not apply to the books so much).

Scientific and statistical analysis should be able to guide us to the optimal point. For this we probably need a better science of extrapolation. The fact that I am not aware of any such science doesn’t mean it doesn’t exist. But it’s not something in the mainstream that we discuss. This should worry you.

Actually there are a few things out there.

  • Jorg Peters sent me this systematic review of all experimental papers published in top economics journals, with a good discussion of the many external validity problems.
  • My colleague Kiki Pop-Eleches and coauthors have a draft paper that looks at a natural experiment that’s happened in every country—the gender combination of your children—and uses it to build models for understanding when we can and cannot extrapolate well, and how to build experiments to maximize generalizability.
  • Lant Pritchett sent me a new paper arguing that, when social programs are complex, with lots of dimensions, we benefit from testing more elements of their design, even if that leads to small sample sizes and low statistical power.

I can’t say whether these papers are correct, or if the list is complete, but I enjoyed skimming through them, and plan to read them carefully sometime soon. All this strikes me as a pretty important area for more research and reflection.

I would love to get pointers to other work in the comments.

Links I liked

  1. The best and the worst of America in one sentence: “Pro-gun demonstrators staging a mock mass shooting near the University of Texas at Austin on Saturday were overwhelmed in number and ferocity by a large group of counter-protesters wielding dildos and machines that generated fart sounds.”
  2. A Booker prize winner plans to write the “African Game of Thrones”
  3. Darth Trump (You know this video is genius because it is still funny after seven minutes)
  4. 52 interesting links
  5. And last but not least, I give you the Christmas sweater that kills all others (literally?): Cane of Thrones


Clusterjerk, the much anticipated sequel

This was a week of nerdily viral statistics posts on my blog. A few days ago I talked about the knee-jerk clustering of standard errors in so many papers, and whether we should ever do this in individually-randomized experiments.

Turns out a lot of you have opinions and answers. Thanks for that.

For me the best news was an email from Stanford econometrician Guido Imbens:

The bottom line is that you are right. You have a sample, collected in whatever way. You do a randomized experiment with randomization at the unit level. Then you do not need to cluster. Without doing any cluster adjustment your standard robust variance estimator will lead to the correct inferences for the average effect for the (convenience) sample you have.

He backed this up with a forthcoming working paper of his with Alberto Abadie, Susan Athey, and Jeffrey Wooldridge. This is basically the Mount Rushmore of applied econometrics, so: game, set, and match?

Not entirely. The paper is not yet ready for circulation (wait for early 2016). So let me outline some of the other intuitions and proofs people argued.

Continue reading

IPA’s Weekly Links

Guest Post by Jeff Mosenkis of Innovations for Poverty Action.

  • FiveThirtyEight has the story about the accumulation of evidence that microloans aren’t effective poverty solutions. Rupert Scofield of microlender FINCA says:

FINCA has now developed its own measure of success, a step Scofield now says he wishes he had taken earlier. But he said he already knows that the outside researchers’ findings are flawed. “The fact that they would reach these conclusions that I personally know to be false really discredits them in my eyes,” Scofield said.

  • When Mark Zuckerberg and Priscilla Chan announced they’d be donating most of their Facebook stock to helping others, there was an inexplicable pile-on suggesting this was somehow selfish or a way to get around taxes (examples here and *ahem*). There’s a nice discussion of that and many other things on Slate’s current Moneybox podcast.
  • Whatever happened with the Gates million-dollar “Condom of the Future” contest from a couple of years ago? It’s reported that several ideas were awarded initial funding, but it takes more than a million dollars and a decade to get FDA approval, so none will likely ever be commercialized.
  • The recent Burkina Faso election, in addition to giving the country a new president for the first time in nearly thirty years, was also the first election (as I understand it) to have continuous live data posted thanks to the Burkina Open Data Initiative. (via Harrison)
  • The Leamer-Rosenthal Prizes for Open Social Sciences were awarded today including to Eva Vivalt, the economist responsible for AidGrade which lets anybody synthesize multiple studies in an area of development.
  • The success of cash transfers has led to reconsideration of the “basic income” guarantee as a social safety net, and Finland is conducting an experiment with it, which hopefully will be randomized to test different versions.

It’s been a rough couple of weeks in the news. Let’s not forget to believe in magic.

photo credit: Paul Keller.

The answer to life, the universe and everything, including gun violence and world peace, is (once again) 42

Tyler Cowen notes that roughly 42% of all the guns in the world are owned by Americans. And America accounts for about 42% of global military spending. He then says something extremely interesting and provocative:

I see those two numbers, and their rough similarity, as the most neglected fact in current debates about gun control.

I see many people who want to lower or perhaps raise those numbers, but I don’t see enough people analyzing the two as an integrated whole.

I don’t myself so often ask “should Americans have fewer guns?”, as that begs the question of how one might ever get there, which indeed has proven daunting by all accounts.  But I do often ask myself “should America be a less martial country in its ideological orientation?”

That is, owning guns and policing the world (and occasionally invading it) are symptoms of something deeper. His diagnosis is a militaristic culture. I’m not sure that’s true, but I would be interested to read more.

If true, it begs the question of how martial cultures develop or change. Germany and Japan changed immensely, having followed the classic path of “decisive military defeat of your genocidal and megalomaniacal leaders.” Perhaps there is another way for the rest of us.

Update: Dan Drezner has a great reply. Read it.

How to save referees from awful papers and save authors from awful referees

Brendan Nyhan has an idea for improving the review process where causal inference is involved:

Why not try to shift the focus of reviews in a more valuable direction? I propose that journals try to nudge reviewers to focus on areas where they can most effectively improve the scientific quality of the manuscript under consideration using checklists, which are being adopted in medicine after widespread use in aviation and other fields.

Let’s see if you can guess my favorite checklist.

Here are some items from one he suggests:

  • Does the author provide their questionnaire and any other materials necessary to replicate the study in an appendix?
  • Does the author use causal language to describe a correlational finding?
  • Does the author specify the assumptions necessary to interpret their findings as causal?

And here are some items from a second:

  • Did you request that a control variable be included in a statistical model without specifying how it would confound the author’s proposed causal inference?
  • Did you request any sample restrictions or control variables that would induce post-treatment bias?
  • Did you request a citation to or discussion of an article without explaining why it is essential to the author’s argument?

It seems to me that articles are so heterogeneous it would be hard to come up with a checklist that works for most papers without being cumbersome for the referee. But it could be worth a try. If limited to quantitative causal inference papers, it would be a step forward.

The first checklist could simply be a manuscript submission guide or a checklist for authors before they submit; they have stronger incentives to answer it. Anyways, I applaud experimentation along these lines.

For more, here are a few older links:


Since yesterday’s pointy-headed statistics post proved unexpectedly viral, I assume you want more econometric rants. So here’s something that has been bothering me all week.

When I was in graduate school, economists discovered clustered standard errors. Or so I assume because it almost became a joke that the first question in any seminar was “did you cluster your standard errors?”

Lately I’ve been getting the same question from referees on my field experiments, and to the best of my knowledge, this is wrong, wrong, wrong.

So, someone please tell me if I’m mistaken. And if I’m not, a plea to my colleagues: this is not something to write in your referee reports. Please stop.

[Read the follow up post here]

I guess I should explain what clustering means (though if you don’t know already there’s a good chance you don’t care and it’s not relevant to your life). Imagine people in a village who experience a change in rainfall or the national price of the crop they grow. If you want to know how employment or violence or something responds to that shock, you have to account for the fact that people in the same village are subject to the same unobserved forces of all varieties. If you don’t, your regression will tend to overstate the precision of any link between the rainfall change and employment. In Stata, this is mindlessly and easily accomplished by putting “, cluster” at the end of your regression, and we all do it.

This makes sense if you have observational data (at least sometimes). But if you have randomized a program at the individual level, you do not need to cluster at the village level, or some other higher unit of analysis. Because you randomized.

Or so I believe. But I don’t have a proof or citation to one. I have asked some of the very best experimentalists in the land this week, and all agree with me, but none have a citation or a proof. I could run a simulation to prove it, but surely someone smarter and less lazy than me has attacked this problem?
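In lieu of a proof, here is the lazy-person’s simulation I had in mind. It is a sketch, not a theorem, and the village-shock and noise variances are arbitrary choices. Treatment is randomized at the individual level, outcomes share a village-level shock, and we check whether plain robust (Neyman-style) standard errors, with no clustering, still give roughly 95% confidence-interval coverage:

```python
import numpy as np

rng = np.random.default_rng(0)

def coverage(n_villages=50, village_size=20, tau=1.0, reps=2000):
    """Share of 95% CIs covering the true effect tau when treatment is
    randomized at the INDIVIDUAL level but outcomes share village shocks."""
    covered = 0
    n = n_villages * village_size
    village = np.repeat(np.arange(n_villages), village_size)
    for _ in range(reps):
        v = rng.normal(0, 1, n_villages)[village]   # common village-level shock
        t = rng.permutation(n) < n // 2             # individual-level randomization
        y = v + tau * t + rng.normal(0, 1, n)
        y1, y0 = y[t], y[~t]
        est = y1.mean() - y0.mean()
        # robust (Neyman) SE for a difference in means, no clustering
        se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        covered += abs(est - tau) <= 1.96 * se
    return covered / reps

print(coverage())
```

When I run variations of this, coverage comes out close to the nominal 95 percent, which is what the no-clustering-needed intuition predicts. But a simulation is evidence, not a proof, so the plea for a citation stands.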

While I’m on the subject, my related but nearly opposite pet peeves:

  • Reviewing papers that randomize at the village or higher level and do not account for this through clustering or some other method. This too is wrong, wrong, wrong, and I see it happen all the time, especially in political science and public health.
  • Maybe worse are the political scientists who persist in interviewing people on either side of a border and treating some historical change as a treatment, ignoring that they basically have a sample size of two. This is not a valid method of causal inference.
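The first pet peeve is easy to demonstrate with the same kind of toy simulation (same arbitrary variances as any such sketch). Here treatment is assigned at the village level, and ignoring the clustering makes the confidence intervals far too narrow:

```python
import numpy as np

rng = np.random.default_rng(2)

def naive_coverage(n_villages=50, village_size=20, tau=1.0, reps=2000):
    """Coverage of naive 95% CIs (no clustering) when treatment is
    assigned at the VILLAGE level and outcomes share village shocks."""
    covered = 0
    village = np.repeat(np.arange(n_villages), village_size)
    n = n_villages * village_size
    for _ in range(reps):
        treat_village = rng.permutation(n_villages) < n_villages // 2
        t = treat_village[village]                  # village-level assignment
        y = (rng.normal(0, 1, n_villages)[village]  # common village shock
             + tau * t + rng.normal(0, 1, n))
        y1, y0 = y[t], y[~t]
        est = y1.mean() - y0.mean()
        # SE that wrongly treats individuals as independent
        se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        covered += abs(est - tau) <= 1.96 * se
    return covered / reps

print(naive_coverage())
```

With shared village shocks and village-level assignment, these naive intervals cover the truth far less often than the advertised 95 percent of the time, which is exactly why clustering (or some equivalent adjustment) is non-negotiable in cluster-randomized designs.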

Update: The follow up post is here.

Links I liked

  1. If you were following the “do unemployment and exclusion drive Belgian extremists” debate, see the updates at the end of last week’s post
  2. Justice Stevens has six simple, important, but probably hopeless amendments to improve the U.S. Constitution. Important to read.
  3. What refugees ask when they reach Europe
  4. Traveling with 3 and 4y.o. boys is like transferring serial killers from a prison. You have to be constantly aware
  5. Last week I posted some of the best Adele covers. Here is a competitor for the worst.
  6. Quote of the week from @MichaelKleinman: “Do No Harm – a phrase used exclusively by those who assume they are not doing harm, but are most likely to cause it”
  7. And ad of the week:

Trump: America’s first African President?

This is two months old, but grows more and more appropriate every day.

On the subject of Trump, I thought this speech to the Republican Jewish Coalition was insightful. Start watching around minute 16 or 17, when he talks about not wanting their money. This is mostly showmanship but it’s a reminder why he’s a powerful and popular candidate among a bunch of cookie-cutter panderers.

Why I worry experimental social science is headed in the wrong direction

I joke with my graduate students that they need to get as many technical skills as possible as PhD students, because the moment they graduate it’s a slow decline into obsolescence. And of course by “joke” I mean “cry on the inside because it’s true”.

Take experiments. Every year the technical bar gets raised. Some days my field feels like an arms race to make each experiment more thorough and technically impressive, with more and more attention to formal theories, structural models, pre-analysis plans, and (most recently) multiple hypothesis testing. The list goes on. In part we push because we want to do better work. Plus, how else to get published in the best places and earn the respect of your peers?

It seems to me that all of this is pushing social scientists to produce better quality experiments and more accurate answers. But it’s also raising the size and cost and time of any one experiment.

This should lead to fewer, better experiments. Good, right? I’m not sure. Fewer studies are a problem if you think that the generalizability of any one experiment is very small. What you want is many experiments in many places and people, which help triangulate an answer.

The funny thing is, after all that pickiness about getting the perfect causal result, we then apply it in the most unscientific way possible. One example is deworming. It’s only a slight exaggeration to say that one randomized trial on the shores of Lake Victoria in Kenya led some of the best development economists to argue we need to deworm the world. I make the same mistake all the time.

We are not exceptional. All of us—all humans—generalize from small samples of salient personal experiences. Social scientists do it with one or two papers. Usually ones they wrote themselves.

[Read the follow-up post here]

The latest thing that got me thinking in this vein is an amazing new paper by Alwyn Young. The brave masochist spent three years re-analyzing more than 50 experiments published in several major economics journals, and argues that more than half the regressions that claim statistically significant results don’t actually have them.

My first reaction was “This is amazingly cool and important.” My second reaction was “We are doomed.”

Continue reading

Here’s Trump laying out his entire electoral strategy in 1987, and it’s all about balls and hyperbole

From his book, The Art of the Deal.

After he lost the election to Ronald Reagan, Carter came to see me in my office. He told me he was seeking contributions to the Jimmy Carter Library. I asked how much he had in mind. And he said, “Donald, I would be very appreciative if you contributed five million dollars.”

I was dumbfounded. I didn’t even answer him.

But that experience also taught me something. Until then, I’d never understood how Jimmy Carter became president. The answer is that as poorly qualified as he was for the job, Jimmy Carter had the nerve, the guts, the balls, to ask for something extraordinary. That ability above all helped him get elected president.

Later, there is this:

The final key to the way I promote is bravado. I play to people’s fantasies. People may not always think big themselves, but they can still get very excited by those who do. That’s why a little hyperbole never hurts.

I call it truthful hyperbole. It’s an innocent form of exaggeration—and a very effective form of promotion.

But irony, oh blessed irony:

But then, of course, the American people caught on pretty quickly that Carter couldn’t do the job, and he lost in a landslide when he ran for reelection.

Ronald Reagan is another example. He is so smooth and so effective a performer that he completely won over the American people. Only now, nearly seven years later, are people beginning to question whether there is anything beneath that smile.

History repeats itself, the first time as tragedy and the second time as farce.

Hat tip.

Interesting sentences, Zuckerberg giveaway edition

Mr. Zuckerberg didn’t create these tax laws and cannot be criticized for minimizing his tax bills. If he had created a foundation, he would have accrued similar tax benefits. But what this means is that he amassed one of the greatest fortunes in the world — and is likely never to pay any taxes on it.

Anytime a superwealthy plutocrat makes a charitable donation, the public ought to be reminded that this is how our tax system works. The superwealthy buy great public relations and adulation for donations that minimize their taxes.


Some facts on gun violence

Hat tip.

Of course, sometimes the fact-based news is not the most poignant:

IPA’s weekly links

Guest post by Jeff Mosenkis of Innovations for Poverty Action.


And, if you’ve ever had someone find a mistake in one of your papers, don’t feel bad, it even happens to the New York Times. (via Sarah Boxer)



Exclusion, not unemployment, explains ISIS recruitment?

Yes and no, says Philip Verwimp, a Belgian economist who studies violence. The graph above plots non-European-born employment levels in Europe against estimated ISIS recruitment. Way over on the right, B is Belgium.

[Note: See updates from Philip Verwimp at the end]

While the foreign-born cannot get jobs, Belgian nationals can. This graph shows that Belgium has the biggest employment gap between nationals and non-European-born in all of Europe:


You might think this is an unemployment-causes-violence kind of argument, but it is not. It’s possible, but Verwimp knows that this runs against most of the evidence, which finds little association between poverty and violence.

Here is Verwimp:

Belgium has a very elaborate welfare state. All citizens have health coverage, schools and universities charge no or few fees, child benefits, unemployment benefits, pensions, are all in place. But this comes at a cost of a closed labour market, meaning a labour system that heavily protects those who are in, but makes entry for newcomers very difficult.

It does not seem to be poverty, but exclusion. Philip wrote to me:

One of my students from African origin, graduating from our MA program, told me (before the Paris attacks) « it is easier to get unemployment benefits in Belgium than to get a job ». He decided to move to Canada. That summarises it. Migrants and their families have full access to the allocations of the welfare state, but face daunting challenges when they want to get ahead in life.

…I am not looking at individual factors to join IS, as young adults across European cities may share similar reasons, but for ‘structural’ factors that make the situation different in some countries compared to others.

Why is there exclusion from labor markets? Verwimp argues it’s not all education or other selection. In their work on France, Adida, Laitin, and Valfort have made a persuasive case that it’s discrimination, though it might be discrimination on where you’re from rather than anti-Muslim feelings. [Actually, this may have been an error of mine—see below]

Another possibility I find quite plausible: the shame and injustice of exclusion, not poverty, is what leads so many to rebel. This is possibly one of the hardest propositions in social science to test.

If you want to read all the evidence I know, here is my review of the employment and violence literature. It is a work in progress, so comments welcome.


  • At Vox, Jennifer Williams took Philip and me to task for oversimplifying the debate.
    • She has a point, especially in that I was a bit glib. But clickbait headlines aside (which, ironically, Ezra Klein and Vox convinced me was actually a principled way to get people to read research) I don’t think we oversimplified.
    • Maybe the difference is this: some people believe armed mobilization is different for everyone, while others think that there are some systematic forces at work alongside the idiosyncrasies. Williams sounds like she leans to the first approach, Philip and I the second.
    • This is what quantitative social science does: it finds the signal and systematic patterns in the noise. Philip’s simple cross-country comparison was just a thought piece, and isn’t conclusive. But if you ask me, social science does (and will continue to) vindicate the idea that social exclusion and discrimination is a powerful driver of rebellion and extremism. Probably we all agree on this and so we’re arguing about nothing.
  • In response to a huge amount of discussion, Philip revised his analysis of the correlation between socioeconomic outcomes and the number of Syria Fighters by looking at the second generation of migrants apart from the first generation.
  • Claire Adida writes me about an error I made: “our study actually indicates that the discrimination we have identified in France is Islamophobia, not xenophobia. Our book, recently published by Harvard University Press, goes into greater detail on this.”
  • Tyler Cowen at first panned the book, but then posted the authors’ thoughtful response. Worth reading, if only as an example of how to respond to critics in a dignified way.