Do editors and referees at the best journals actually pick the best papers?

Vera te Velde discusses a new paper by David Card and Stefano DellaVigna on what gets published in top journals:

The conclusions David drew are that 1) referees are indeed good at assessing quality, 2) the process contains affirmative action for junior/less prolific authors, and 3) editors are not overconfident. Thus, the myth of unfairness is dispelled.

The assumption this story rests on is glaring and glaringly fragile: that ex post citations are the relevant measure of paper quality when assessing whether papers are fairly treated.

From the perspective of editors, I completely understand why you would focus on citations. That’s how your journal gains prominence. But as a scientist, what I want and what I believe is the gold standard for fairness is that papers are published and cited in proportion to their quality. Treating citation rates as quality assumes away half of the problem.

Are citation numbers just the best measure of quality that we’re stuck with? Well, I’m sure that was the reason for using it, and I’m sure citations are correlated with quality, but as they show, referee ratings are also correlated with citation numbers. Since the citation process is self-evidently biased in favor of prolific authors (I’m sure you can prove this to yourself through introspection just as easily as I did), and since referees are several of a very small number of people who thoroughly study any given paper, it seems utterly bizarre that the former, and not the latter, would be treated as the primary proxy measure of quality (if the goal of the paper is in fact to assess fairness rather than to assess journal performance).

If we consider referee ratings the better measure of quality, the conclusions exactly reverse and exactly confirm some of the common suspicions of the editorial process: 1) Citations are a good measure of quality but substantially biased in favor of prolific authors and multi-author papers, 2) editors are biased in favor of prolific authors, but not as much as citations are, and they are not biased in favor of multi-author papers, and 3) editors could reduce their bias by putting less weight on their personal priors.

The full piece and comments are fascinating.

Ruth Bader Ginsburg does not hold back on her critiques of the last Supreme Court term

I did not expect a Supreme Court justice to be so frank or critical of the court. From a New York Times interview with Ruth Bader Ginsburg:

Asked if there were cases she would like to see the court overturn before she leaves it, she named one.

“It won’t happen,” she said. “It would be an impossible dream. But I’d love to see Citizens United overruled.”

She mulled whether the court could revisit its 2013 decision in Shelby County v. Holder, which effectively struck down a key part of the Voting Rights Act. She said she did not see how that could be done.

The court’s 2008 decision in District of Columbia v. Heller, establishing an individual right to own guns, may be another matter, she said.

“I thought Heller was a very bad decision,” she said, adding that a chance to reconsider it could arise whenever the court considers a challenge to a gun control law.

Should Judge Garland or another Democratic appointee join the court, Justice Ginsburg will find herself in a new position, and the thought seemed to please her.

“It means that I’ll be among five more often than among four,” she said.

The one website that satisfies my election prediction obsession

It’s election season, which means I obsessively and pointlessly check PredictWise more than once a day. It aggregates all the election betting markets (sort of like the Kayak of prediction markets). I actually have a shortcut on my iPhone home screen for the web page. Seriously. It’s a disease.

I also enjoy the periodic blog posts by the creator, David Rothschild. For example:

The state-by-state breakdown of the presidential election in 2016 varies a lot from the 2012 election; in 2012 Republican Mitt Romney needed to sweep Florida (FL), Ohio (OH), and Virginia (VA) to win the election, in 2016 Republican Donald Trump needs to win 3 of 4 from FL, OH, VA, and Pennsylvania (PA). The chart tracks the probability of victory on PredictWise since we started state-by-state predictions on February 27, 2016.

Other states matter to the election, just not in the quick view of who is going to win.

1) The assumption here is that if Trump wins FL, OH, and PA he will win enough other states to win or if Clinton wins PA and VA she will win enough other states to win. The candidates still need to win those states, but they are very, very likely to be swept along on the wave should the candidate win the bigger states.

2) Putting North Carolina (50%), Arizona (27%), Missouri (21%), and Georgia (12%) in play is really costly to the Trump campaign. It needs to play defense in states where it could traditionally spend less time and money. But it does not alter the reality that Trump still needs to win 3 of the 4 big states.
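Rothschild’s “3 of 4” framing can be turned into a toy calculation. The win probabilities below are hypothetical placeholders, and the independence assumption is a strong simplification (real state outcomes are highly correlated, which is exactly the “swept along on the wave” point above):

```python
from itertools import product

def prob_at_least(k, probs):
    """Probability of winning at least k of the listed states, assuming
    independent outcomes (a strong simplification for illustration only)."""
    total = 0.0
    for outcome in product([0, 1], repeat=len(probs)):
        if sum(outcome) >= k:
            p = 1.0
            for won, pr in zip(outcome, probs):
                p *= pr if won else (1 - pr)
            total += p
    return total

# Hypothetical win probabilities for FL, OH, VA, PA -- illustrative only.
big_four = [0.40, 0.45, 0.30, 0.35]
print(prob_at_least(3, big_four))
```

Because the independence assumption understates correlation, the real chance of a sweep is higher than this kind of enumeration suggests, which is why prediction markets are more useful than back-of-the-envelope combinatorics here.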

Up through today, the predictions for the winner of the general election and the winner of each state have used some of the same data, but they are not formally tied together; this has worked fine, but it is starting to get a little concerning. The probability that Clinton wins the general election should not really fall below the probability that she wins PA, the current “swing” state. Clinton is up 43.5% to 39.1% in the Pollster average (a solid lead for the candidate in a state that has gone to her party in every election since 1992), she is trading at $0.69 on PredictIt, and the fundamentals-based prediction is down to just above 50%. All of this combines in my model to 78%. So, what should you make of Clinton at 75% to win the election and 78% to win PA?

1) Not a huge discrepancy, so not a big deal.

2) The national markets may be undervalued for Clinton, and the state-by-state data may indicate a slightly higher valuation. I do not suspect 2012-style manipulation. I suspect that the $850 cap per person/contract on PredictIt is giving its very anti-establishment trader base too much say versus the “good money.” It is something I will be following very closely in the next few weeks.

Miguel-Kremer versus Cochrane: The battle of the meta-analyses

The latest salvo in the worm wars brings out big guns:

There is consensus that the relevant deworming drugs are safe and effective, so the key question facing policymakers is whether the expected benefits of MDA exceed the roughly $0.30 per treatment cost. The literature on long run educational and economic impacts of deworming suggests that this is the case. However, a recent meta-analysis by Taylor-Robinson et al. (2015) (hereafter TMSDG), disputes these findings. The authors conclude that while treatment of children known to be infected increases weight by 0.75 kg (95% CI: 0.24, 1.26; p=0.0038), there is substantial evidence that MDA has no impact on weight or other child outcomes. We update the TMSDG analysis by including studies omitted from that analysis and extracting additional data from included studies, such as deriving standard errors from p-values when the standard errors are not reported in the original article. The updated sample includes twice as many trials as analyzed by TMSDG, substantially improving statistical power. We find that the TMSDG analysis is underpowered: it would conclude that MDA has no effect even if the true effect were (1) large enough to be cost-effective relative to other interventions in similar populations, or (2) of a size that is consistent with results from studies of children known to be infected.
…Applying either of two study classification approaches used in previous Cochrane Reviews (prior to TMSDG) also leads to rejection at the 5% level.
…Under-powered meta-analyses (such as TMSDG) are common in health research, and this methodological issue will be increasingly important as growing numbers of economists and other social scientists conduct meta-analysis.

A new NBER paper by Croke, Hicks, Hsu, Kremer, and Miguel (ungated version here).
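The power argument in the quoted abstract can be illustrated with a quick sketch. The numbers below are my own back-of-the-envelope: the standard error of roughly 0.26 kg is derived from the quoted 95% CI for infected children, and the hypothetical true MDA effect of 0.2 kg is illustrative, not a figure from the paper:

```python
from statistics import NormalDist

def power(effect, se, alpha=0.05):
    """Power of a two-sided z-test: the probability of rejecting the null
    of no effect when the true effect has the given size and the pooled
    estimate has the given standard error."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return (1 - nd.cdf(z - effect / se)) + nd.cdf(-z - effect / se)

# SE implied by the quoted CI for treating infected children:
# (1.26 - 0.24) / (2 * 1.96) is roughly 0.26 kg.
se_pooled = 0.26
# A hypothetical true MDA effect, smaller than the infected-child effect:
print(power(0.20, se_pooled))  # low -- an analysis this noisy would likely miss it
```

Halving the standard error, roughly what you get from pooling several times as many trials, raises this power substantially, which is the thrust of the Croke et al. update: a null result from an underpowered pooled estimate is weak evidence of no effect.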

I think the most confident prediction that can be made is this: a new meta-analysis by TMSDG or someone else will argue something different, a response will find statistical flaws in it, and so on, until it doesn’t matter anymore.

I say “until it doesn’t matter anymore” because I recently spoke to a large charitable organization that basically said, “Yes, we could fund more long-term studies looking at the economic impacts of deworming, but by the time the results arrived in 10 years it wouldn’t matter anymore, because we’re probably going to eliminate worms in those areas in 10 years anyway.” Hm.

At current publishing speeds, we can expect at least 10 more battling meta-analyses before then. I’m sitting on the sidelines from here on out. Pass the popcorn.

This is how to recognize bad science


Rather, my message is that this noisy, N = 41, between-person study never had a chance. The researchers presumably thought they were doing solid science, but actually they’re trying to use a bathroom scale to weigh a feather—and the feather is resting loosely in the pouch of a kangaroo that is vigorously jumping up and down.

Andrew Gelman critiquing a Kahneman psychology study. Via one of my favorite blogs.

Health, psychology, and exercise studies are, in my experience, the worst kangaroo-jumping feather-weighers.

One of the most important things you can do as a consumer of science is also the simplest: check the N. If you are reading a peppy New York Times article about how coffee makes you live longer, or exercise doesn’t help you lose weight, more often than not the underlying study had 30 subjects and you can simply ignore the information you just received.
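To put a rough number on “check the N”: here is a minimal sketch of the minimum detectable effect for a two-group comparison, under textbook assumptions (equal group sizes, known variance, 80% power). The sample sizes are illustrative, not taken from any particular study:

```python
from math import sqrt
from statistics import NormalDist

def mde(n_per_group, sd=1.0, alpha=0.05, power=0.80):
    """Smallest true difference in means (in units of the outcome's sd)
    that a two-group study can detect with the given power -- a rough
    back-of-the-envelope, assuming equal groups and known variance."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    se = sd * sqrt(2 / n_per_group)
    return (z_alpha + z_power) * se

# With ~15 subjects per group (N = 30 total), only enormous effects are
# detectable -- roughly a full standard deviation:
print(mde(15))
```

Anything smaller than that threshold and the study “never had a chance,” in Gelman’s phrase: a statistically significant result from such a design is more likely noise than signal.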

If you want to go a little bit further, you could also ask, “Is this the only outcome I care about?”

IPA’s weekly links

Guest post by Jeff Mosenkis of Innovations for Poverty Action.

  • Nudgers’ dreams have come true: iOS 10 will feature organ donor registration, putting behavioral economists’ favorite outcome in people’s pockets.
  • The excellent TinySpark podcast interviews NPR investigative reporter Laura Sullivan, who, with ProPublica’s Justin Elliott, exposed massive waste and inefficiency at the Red Cross, which diverted emergency vehicles from Sandy relief to press conferences and drove empty trucks around neighborhoods for show. The Red Cross also raised half a billion dollars for Haiti but built only six homes. Sullivan contrasts the Red Cross’s Haiti approach with that of Doctors Without Borders, which asked donors to stop donating when it had reached full capacity and more funding wouldn’t help.
  • Why Do So Many Graduate Students Quit? (The Atlantic) is worth reading for people considering a Ph.D.

The work was delicate: Skulls can fracture. The earth shifts. Move the dirt too roughly, and it swallows bones into its folds and mixes them with other bodies. An errant stroke can brush away a remnant of a blindfold, a piece of rope, a cranium fragment with a bullet hole, the bullet itself: the criminal evidence needed to prosecute a murder.

  • The above from a profile of forensic anthropologist Fredy Peccerelli who has spent 20 years in Guatemala recovering over 10,000 people’s remains and documenting evidence of war crimes.
  • The Foreign Aid Transparency and Accountability Act has finally made it through Congress and is on its way to President Obama’s desk for signature. Among other things, it will require rigorous outcome-focused evaluations of many USAID programs.

And if you’re stressed at work, perhaps the most relaxing live stream is literally a live stream with bears catching salmon in Alaska (full page with more views here, h/t Jeffrey Davis).

The wrong story about Trump

She sagged suddenly with terror, imagining what would happen if Donald actually won. Everything would change. Her contentment would crack into pieces. The relentless intrusions into their lives; those horrible media people who never gave Donald any credit would get even worse. She had never questioned Donald’s dreams because they did not collide with her need for peace. Only once, when he was angry about something to do with his TV show, and abruptly decided to leave her and Barron in Paris and go back to New York, she had asked him quietly, “When will it be enough?” She had been rubbing her caviar cream on Barron’s cheeks — he was about 6 then — and Donald ignored her question and said, “Keep doing that and you’ll turn that kid into a sissy.”

The New York Times is commissioning short stories about the US election, and that is an excerpt from Chimamanda Ngozi Adichie’s contribution.

A parody of Trump is, I admit, satisfying. Adichie knows what the Times audience wants, and she delivers it well. But as I read the story, I couldn’t help but think that it’s that smugness that makes half the country hate the Times audience and want to vote for a man like Trump.

Adichie has a fantastic book of short stories that skewers Nigerian elites. Wouldn’t a much more skillful, better short story have made us see Trump in a more sympathetic light? The so-called liberals of New York (like me) who push for equal rights with one hand while pushing their kids to private schools with the other. Or support more open borders on principle, failing to mention that it lowers the cost of their house help without threatening their own jobs.

“Politics is the field of unintended consequences”

Thomas Nides, former deputy secretary of state under Clinton, offers a perfect summation of the creed (h/t Doug Henwood): “Hillary Clinton understands we always need to change — but change that doesn’t cause unintended consequences for the average American.” Off the top of my head, here’s a brief list of changes that caused unintended consequences for the average American (whoever that might be): The election of Abraham Lincoln. The passage of Social Security. The entrance of women into factories during World War II. Brown v. Board of Ed. The Civil Rights Act of 1964. Asking an unknown state senator from Illinois to deliver the keynote address at the 2004 Democratic Party convention. Politics is the field of unintended consequences (“Events, dear boy, events.”) Don’t like unintended […]

That is Corey Robin, who has a very interesting blog. Keeping it old school like me.

I agree with Nides, the only difference being that I can’t think of anyone in government who is any different. Someone said to me recently that yes, there are unintended consequences, and so politics is about letting the chips fall where they may and leaving enough flexibility to control the damage afterwards. Maybe one lesson of the last decade is that the US government is bad at damage control.

And I thought research transparency was bad in economics and political science

Paleoanthropologists were excited by the Malapa discovery, but many were skeptical about Berger’s bold evolutionary claims. To some, he had long seemed more interested in fame than in careful science, and his press conference struck them as theatrical and unscholarly. Yet any scientist who wanted to vet his sediba research could do so: Berger shared his data and declared the fossils available for outside study, something that paleoanthropologists traditionally had not done. Ian Tattersall, a paleoanthropologist at the American Museum of Natural History, has said that the field often resembles “a swamp of ego, paranoia, possessiveness, and intellectual mercantilism.”

Berger donated replicas of the Malapa bones to museums and schools, and started attending conferences with a sediba cast, allowing anyone to inspect it. Jeremy DeSilva, a Dartmouth paleoanthropologist who collaborates with Berger, recalls that when he visited Wits in 2009 Berger offered to open the fossil vault. “A lot of people in our business are petrified to be wrong,” DeSilva told me. “You have to be willing to be wrong. What Lee is doing takes that to another level.”

From a profile of Lee Berger in the New Yorker. What makes this more amazing and galling is just how few artifacts are out there:

In the century and a half during which scientists have been formally studying humankind’s earliest ancestry, they’ve found fossil remains of only about six thousand individuals. Most have been fragments and isolated finds. Donald Johanson, who is now seventy-two, has said that before he found Lucy all of the hominid fossils older than three million years could “fit in the palm of your hand.”

The full article is fascinating because it describes perfectly the tension between extremely careful scholars, who take great pains, and often many years, to produce findings, and scholars who have less patience for the diligent, tiresome work of getting something exactly right.

Berger is a self-promoter and hasty scholar by any standard. Maybe my favorite example (emphasis mine):

The dig, in November, 2013, lasted three weeks; a smaller dig followed in March, 2014. National Geographic live-blogged and tweeted the latest developments. Viewers watched the team recover bag after bag of remains—some fifteen hundred fossil elements, an unprecedented assemblage.

A dig is less than half the job. Scholars say, “It’s not what you find—it’s what you find out.” To analyze the fossils, Berger again turned to Facebook, inviting “early career” scientists to apply for a six-week workshop, in May, 2014. He promised that, together, they would describe the fossils for “high-impact publications.” By the end of that August—an extraordinarily fast turnaround by traditional standards—Berger had submitted twelve papers to Nature.

Every field has these tensions. The professors who make the most boisterous and careless claims get a lot of media attention. Furious, diligent scholars rush to counter the more ridiculous claims.

Even though I count myself among the more careful and diligent class, I see a lot of value to the Bergers of the discipline. They are often quite creative. They open up new fields and discussions. Other scholars rush to fill the new territory, however furious and angry. The questions get debated by a broader range of people. And the usually quiet scholars now have incentives to try to dispel the public’s worst misunderstandings. The march of progress?

What I’ve been reading

  1. Dear Committee Members, by Julie Schumacher. A story told entirely through letters of recommendation, each written by a cynical, funny, arrogant, self-destructive English professor. As a writing gimmick it works surprisingly well.
  2. Straight Man, by Richard Russo. The unraveling of a cynical, funny, arrogant, self-destructive English professor. (One begins to wonder if there is any other kind.) I kept thinking “this character is over the top”, and moments later would be reminded of past faculty meetings. (And up to now I never even got invited to the really interesting ones!) Essential reading for academics.
  3. The Innocent Have Nothing To Fear, by Stuart Stevens. The author, a former political strategist for Bush and Romney, keeps getting called “oddly prescient” for writing a novel about Trump vs. Clinton before Trump actually looked like a real candidate. A short and entertaining novel, and interesting as a pseudo-anthropology of life inside a political campaign.
  4. Nature’s Metropolis: Chicago and the Great West, by William Cronon. I am reading this for obvious reasons. It’s fascinating to see the development of today’s financial markets, and of the US itself, through the waxing and waning of various commodities in Chicago markets. Like so many history books, it is hard to flip through, and tedious to read closely. So you need to be committed to the topic.
  5. Encountering Poverty: Thinking and Acting in an Unequal World, by Ananya Roy and coauthors. A radical left take on the promise and pitfalls of anti-poverty work. It’s full of interesting ideas and one of these days I will get around to blogging bits. It is worth perusing.
  6. The Lives of Tao, by Wesley Chu. Invasion of the Body Snatchers as a spy thriller. Pulpy and fun. After reading it I felt the same as after watching the latest Bond movie: entertained but unconvinced it was worth a few hours of my life.

IPA’s weekly links

Guest post by Jeff Mosenkis of Innovations for Poverty Action.


  • Great links from David McKenzie on the Development Impact blog this week, including a guide to mobile phone panel survey methods in the developing world.
  • If you want some beach reading this weekend, Vox’s Dylan Matthews had a feature article looking at why well-intentioned Clinton-era welfare reform failed at helping fix poverty, but became a political model for how to shrink a government program. If you’re on the go, there’s also a nice group discussion on The Weeds podcast (or iTunes, the “Welfare Reform” episode).
  • Some other interesting articles, via Rachel Strohm:
  • Paper: Ru & Schoar suggest credit card companies may be screening for behavioral biases, offering the types of cards with more backloaded and hidden fees to less educated consumers, while the types of cards marketed to more educated customers (such as those carrying airline mile rewards) have more straightforward terms (ungated version here).
  • Harvard student Serena Booth found that 19 percent of passers-by let a robot into a locked building; that number went up to 76 percent when the robot was carrying a box of cookies.

“A Family-Friendly Policy That’s Friendliest to Male Professors”

That title is from Justin Wolfers’s article in The New York Times:

The central problem is that employment policies that are gender-neutral on paper may not be gender-neutral in effect. After all, most women receive parental benefits only after bearing the burden of pregnancy, childbirth, nursing, and often, a larger share of parenting responsibilities. Yet fathers usually receive the same benefits without bearing anything close to the same burden. Given this asymmetry, it’s little wonder some recently instituted benefits have given men an advantage.

I am sure there is a spirited discussion on social media, but with my new abstinence policy, I have no idea. I would probably make better points if I saw it. My only consolation (sort of) is that I will not see any blowback from my own thoughts.

The basic premise strikes me as true: men benefit more career-wise. But with caveats. Because the few months after a birth do not make or break a tenure case. Parenting is permanent.

Not a day goes by that I am not grateful for my own parental leave: one course off at Yale for each of the two kids, plus a smidgen less administrative work that year. I’d say the (slim?) majority of my peers going up for tenure do not have children. Having kids (and being dedicated to them) cut 20 hours out of my work week from the day they arrived. I don’t care how much more productive I became after kids, trying to fit everything into 9am-5pm and 9-11pm just five days a week: when aggregate inputs went down, so did aggregate outputs.

This to me is the big “time shock”: not the few months after the birth, but the compressed work hours forevermore, especially when you are compared explicitly at tenure time against childless colleagues who can and often do put in much, much more time. On average across families, women bear an unequal burden here. That’s not true of my family. But if it’s true for the average female assistant professor, then this could be a bigger disadvantage she faces, one that isn’t solved with a maternity leave policy.

That said, the parental leave right after birth matters too. Jeannie wanted to (and did) take off more time than me after the births. I know a lot of female assistant professor colleagues who did the same, and only a few male ones. (Also, let’s not forget who had the physical ordeal here.)

My sense is that most universities have an all-or-nothing parental leave policy that doesn’t quite fit different people’s priorities, or the unequal burden by gender (even if, in some relationships, it’s just the child-carrying itself).

As a profession, I think we ought to make sure that assistant professors who have kids get a break. Maybe a course off, with the understanding they are still active department members. And those who want to take serious time off after the birth could get a more serious break, such as more course relief and a tenure clock extension. Personally I’d support a policy that systematically favored women here, for longer than a semester.

All-or-nothing, gender-blind, one-semester long leave policies bear no resemblance to the demands of parenting, and the tensions with the tenure system.

IPA’s weekly links

Guest post by Jeff Mosenkis of Innovations for Poverty Action.

  • Alicia Munnell, a Harvard-trained economist who studies retirement policy, worked for the Federal Reserve Bank of Boston, was Assistant Secretary of the Treasury, and served on the president’s Council of Economic Advisers, yet realized she hadn’t saved sufficiently for retirement. Harvard behavioral economist Sendhil Mullainathan recently confessed to similar economic sins. (h/t Jason Zweig)
  • Somebody put a compendium of Trump speech text content on GitHub for your analysis pleasure.
  • Gosnell, List, & Metcalfe experimented on 335 pilots across 40,000 Virgin Atlantic flights and saved a lot of fuel and pollution by encouraging them to adopt more efficient practices. Experimental treatments included providing feedback on their fuel efficiency, setting personalized targets with “Well Done” messages for achieving them, and having money donated to charity for achieving the goals:

We estimate that our treatments saved between 266,000-704,000 kg of fuel for the airline over the eight-month experimental period. These savings led to between 838,000-2.22 million kg of CO2 abated at a marginal abatement cost of negative $250 per ton of CO2 (i.e. a $250 savings per ton abated) over the eight-month experimental period.


Which is great, but if I’m reading it right, just the Hawthorne effect of the pilots knowing they were being observed was much stronger, saving 6.8 million kg of fuel ($5.3 million) over the eight-month study period. (h/t Alexander Berger)
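The CO2 range in the quoted abstract is consistent with the fuel savings under the standard jet-fuel conversion factor. A quick sanity check (the ~3.15 kg of CO2 per kg of fuel burned is my assumption about the conversion, not a figure taken from the paper):

```python
# Quick consistency check on the quoted figures, assuming the standard
# jet-fuel conversion factor of ~3.15 kg CO2 per kg of fuel burned
# (an assumption on my part, not stated in the excerpt).
CO2_PER_KG_FUEL = 3.15

low_fuel, high_fuel = 266_000, 704_000    # kg of fuel saved (quoted range)
low_co2 = low_fuel * CO2_PER_KG_FUEL      # close to the quoted 838,000 kg
high_co2 = high_fuel * CO2_PER_KG_FUEL    # close to the quoted 2.22 million kg
print(low_co2, high_co2)
```

The arithmetic lines up with the abstract, so the CO2 figures are a direct restatement of the fuel savings rather than an independent estimate.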

  • Summaries of 18 papers on improving education systems from the RISE conference.
  • A new report questions the Broken Windows theory of policing, the idea that cracking down on minor crimes (like graffiti and littering) also reduces major crime. The idea started with speculation by two sociologists in a 1982 Atlantic article. Supported by a probably spurious correlation (many things got better in the 90s), and popularized in Gladwell’s The Tipping Point and elsewhere, it became pop-sociology, then eventually police policy. The New York Police Department Inspector General report (PDF) concludes that the 2010-2015 NYPD crackdown on these types of crimes led to many more arrests in minority neighborhoods but had no impact on major crime.
  • On a related note, J-PAL North America announced they will be working with five U.S. state and city governments to actually test policies before implementing them.

And if you want to give a TED Talk here’s how to be a thought leader (h/t Lindsey Shaughnessy).

This is the novel of the next world war, and it’s great

Finally someone besides Todd Moss has combined social science with pulpy beach-reading thrillers. Suresh Naidu turned me on to P.W. Singer and August Cole’s Ghost Fleet during one of our morning runs, and you should think of this as our combined book review.

In short, China attacks the U.S. Anything past this point is a minor spoiler. If you don’t want to hear more, then simply know that the book was good fun and more thought-provoking than any security paper I’ve read in a long time. So I say buy it.

First you should know that some war bloggers hate it. Here are Noel Maurer’s many posts, which mostly raise technical and technological complaints.

Nonetheless, some of my favorite insights, some of which come from Suresh:

  • Economists who think about the pros and cons of globalization and trade have not even begun to think about the security implications of their policies.
  • The inherent superiority of dumb warfare when smart weaponry becomes too good.
  • Walmart is a weapon of mass destruction.
  • The maxim “always take the high ground” means orbit, at least.
  • Sadly, there were no Chinese political scientists and economists running counter-insurgency randomized trials.
  • It would have been a better book if there were a moral justification for Chinese aggression that made an average American see America from the outside, uncharitably.

Personally I don’t know the authors’ politics. I only know Singer from his early books on child soldiers, before he got into writing about the U.S. military. But, intentional or not, the book strikes me as excellent nationalist propaganda. Even a liberal idealist like me found myself sneering at NATO, offshoring, and those Chinese devils. It is possible you will be subtly turned on to a President Trump. You have been warned.

Reflections after a week of no social media

I have been Twitter, Facebook, and Reddit free for a week now. I’m mainly happy with the decision, at least so far. I stare at my phone less. Not THAT much less, because the emails are endless and I can always plumb the depths of the New York Times app. But I do feel like I’m doing less frivolous browsing.

One drawback is that my news sources are seriously curtailed. I like the Times but I need more breadth. I will consider subscribing to a few more blog and news feeds by email. I’m reading more books and magazines, but only a little. But this week was not representative: we put an offer in on a house, and that has eaten up enormous amounts of time, simply in learning how it all works. (I’m 41 and, until now, the most expensive thing I ever bought was a Honda Fit.)

All in all, I feel a little more focused and I don’t miss the info flow all that much.

Is there a methodological war in development economics?

Following my post on misleading methodological wars in political science this morning, I saw for the first time David McKenzie’s blog post on whether randomized control trials (RCTs) have taken over development economics:

Another claim is that the “best and brightest talent of a generation of development economists [has] been devoted to producing rigorous impact evaluations” about topics which are easy to randomize, and that they take a “randomize or bust” attitude whereby they turn down many interesting research questions if they can’t randomize.

To explore this, I examined the publication records of the 65 BREAD affiliates (this is the group of more junior members), restricting attention to the 53 researchers who had graduated in 2011 or earlier (to give them time to have published). The median researcher had published 9 papers, and the median share of their papers which were RCTs was 13 percent. Focusing on the subset of those who have published at least one RCT, the mean (median) percent of their published papers that are RCTs is 35 percent (30 percent), and the 10-90 range is 11 to 60 percent. So young researchers who publish RCTs also do write and publish papers that are not RCTs. Indeed this is also true of Esther and her co-authors on this paper (Abhijit Banerjee and Michael Kremer) – although known as the leaders of the “randomista” movement, the top-cited papers of all three researchers are not RCTs.

And as for journals:

RCTs are a much higher proportion of the development papers published in general interest journals than in development journals. However, even in these journals they are the minority of development papers – there are more non-RCT development papers than RCTs even in these general journals. Moreover, since most of the development papers are published in field journals, RCTs are a small percentage of all development research: out of the 454 development papers published in these 14 journals in 2015, only 44 are RCTs (and this included a couple of lab-in-the-field experiments). As a result,  policymakers looking for non-RCT evidence have no shortage of research to choose from.

Read the full post. Here is a graph.


David was responding to Esther Duflo’s talk on the subject.

See my comments on why I think you can explain these trends with normal responses to technological change. Does this mean we have reached peak RCT? I think so.

Hat tip to the excellent FAI newsletter, to which I subscribe.

The rumored methodological wars in political science are not the wars actually being fought

From the position of a political scientist, I commonly hear, say, historians or anthropologists summarize what they understand political scientists to believe. Having done a fair bit of participant observation within the tribe of the tsitneics-lacitilop, I often find those descriptions frustrating: they describe something akin to what I understand were debates within the discipline during the 1990s. It is now 2016.

Personal frustrations aside, such outdated or erroneous views of what “political scientists believe or argue about” are problematic for a couple of more general reasons. For one, they may stand in the way of interdisciplinary collaboration by proposing that political scientists do not study certain things or work in certain ways. They also encourage fence-building between disciplines, by portraying disciplines as having settled debates, doing work that is essentially uninteresting to those elsewhere.

…The most common misconception that I encounter is that political science is divided along a cleavage of quantitative scholars and rational choice theorists versus qualitative or historical scholars. The errors here are two. First, this view lumps together “rational choice theory” with quantitative methodology, which both mistakenly equates theory and methodology and misses that some of the strongest critiques of rationalism in political science come from a quantitative behavioral origin (and vice versa). Second, it misses the extent to which quantitative methods are used in service of historical arguments, and the extent to which rationalist arguments are frequently grounded in qualitative insights. There is probably much more to write on this, but the idea of a discipline characterized by this singular cleavage on this particular axis always makes me cringe.

That is Cornell professor/blogger Tom Pepinsky. Hat tip to Ken Opalo. He goes on to argue that some of the most intense debates are within quantitative political science right now, including:

  1. Is it better to do statistics right, or not at all? It seems easy to conclude that of course we should only do statistics the right way, but if the standards for correct are formidably high, are we prepared to abandon whole areas of inquiry as unstudy-able? There exist quantitative political scientists who believe that we should basically never run cross-national regressions, for example.

  2. Experiments versus observational data. Experiments offer control, but almost always sacrifice realism in service of that control. What is the optimal balance between the two? On what terms should we make tradeoffs between the two?

  3. Microfoundations and macrostructures. Regardless of whether data is observational or experimental, research designs tend to be more straightforward with micro-level data than with aggregate or macro-level data. The problem of reconciling micro-level evidence (what individuals say or do) with macro-level phenomena (how institutions, countries, policies, and/or international systems work) will be, I suspect, one of the core issues that political scientists confront over the next decade.

I would add that I think a lot of disciplinary struggles have to do with the way that innovations in research lead, initially, to a lot of crowding into the new area, combined with the innovators getting a little exuberant with their claims.

For instance, the huge burst of formal theory came on the heels of some innovations in how to write mathematical models (and people with the training to do so).

The waning of formal theory and the waxing of empirical work started with the advent of computers and statistical software, meaning large data could be analyzed cheaply for the first time. This meant high returns to collecting new data.

The fetish for causal identification was bolstered by innovations in how to achieve it, such as instrumental variables and regression discontinuity. Once people showed how you could run field experiments in social science, or lab experiments, and built the institutions and the donor base to implement them, you saw a lot of growth there.
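The regression-discontinuity idea mentioned above can be sketched in a few lines: units just above and just below an assignment cutoff are assumed comparable, so the jump in outcomes at the cutoff estimates the treatment effect. The data, bandwidth, and effect size below are all invented for illustration; this is a toy sketch, not any particular paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
running = rng.uniform(-1, 1, n)          # running variable, cutoff at 0
treated = (running >= 0).astype(float)   # treatment assigned by the cutoff
true_effect = 2.0
outcome = 1.0 + 0.5 * running + true_effect * treated + rng.normal(0, 1, n)

# Local linear fit on each side of the cutoff, within a bandwidth h
h = 0.25
below = (running < 0) & (running > -h)
above = (running >= 0) & (running < h)
fit_lo = np.polyfit(running[below], outcome[below], 1)
fit_hi = np.polyfit(running[above], outcome[above], 1)

# The estimated discontinuity: difference of the two fits at the cutoff
rd_estimate = np.polyval(fit_hi, 0.0) - np.polyval(fit_lo, 0.0)
print(round(rd_estimate, 2))  # should land near the true effect of 2.0
```

The appeal is transparency: no structural model, just a comparison of observations on either side of an arbitrary threshold.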

In each case you get a rush of followers, especially among junior professors and graduate students. So it can feel like the profession is marginalizing what others do. Especially when the leaders of the movement, or their followers, make very grandiose claims that seem plausible, for a short while.

I’m not sure what the next technological innovations will be.

  • Maybe in text analysis. This could increase the returns to collecting qualitative data.
  • Psychology seems underutilized. Behavioral economics, for example, focuses on a small number of biases and rules of thumb and hasn’t plumbed the depths of emotion, social identity, and so forth. You’re seeing more work on the formation and effects of personality. Political science is a little further along here but not much, and mainly in US politics.
  • Maybe there will be neuroscience insights but I don’t see it.
  • Other ideas?
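As a toy illustration of the text-analysis point above, even something as crude as keyword counting can turn qualitative material into quantitative data. The transcripts and keywords below are invented for the sketch:

```python
from collections import Counter

# Hypothetical interview transcripts (invented for illustration)
transcripts = [
    "the election was about jobs and wages, not identity",
    "voters talked about identity and belonging more than jobs",
    "wages stagnated, and the election turned on economic anxiety",
]

keywords = {"jobs", "wages", "identity", "election"}
counts = Counter(
    word
    for text in transcripts
    for word in text.replace(",", "").split()
    if word in keywords
)
print(counts.most_common())
```

Real work in this vein uses topic models and word embeddings rather than raw counts, but the basic move is the same: qualitative sources become data that can be analyzed at scale.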

I wonder if productivity improvements in qualitative research are disadvantaged by current trends. Do we have quant-biased technological change? What would the switch look like when it eventually comes? Because the smart social scientist always bets on regression to the mean.

IPA’s weekly links

Guest Post by Jeff Mosenkis of Innovations for Poverty Action.



  • A new working paper suggests the infamous Tuskegee syphilis experiments on African-American men may have hurt many more people than their subjects by damaging trust in the medical system. Using GSS data, Marcella Alsan & Marianne Wanamaker conclude that following the 1972 public revelation of the study, fewer African-American men saw doctors, and this shortened their lives significantly:


    Our estimates imply life expectancy at age 45 for black men fell by up to 1.4 years in response to the disclosure, accounting for approximately 35% of the 1980 life expectancy gap between black and white men.

    Some important caveats: the effect appears only for one subgroup, and the paper hasn’t been fully reviewed yet. h/t Rachel Strohm via The Science of Us.

  • Markus Goldstein and David Evans summarize 25 papers on urbanization in Africa.
  • Near the end of Paul Ryan’s anti-poverty plan is a rollback of the new “fiduciary rule,” which, starting in 2018, requires financial advisers to act in their clients’ best interest rather than their own. Why can’t consumers decide for themselves if the financial advice is good? From the NYTimes:

So the Certified Financial Planner Board of Standards hired a professional D.J. named Azmyth Kaminski, shaved off his dreadlocks, removed his body piercings and put him in a suit. It taught him a few financial phrases and sat him in a conference room. Then it brought in people looking for a financial adviser.

“We gave him buzzwords, like ‘401(k) is the way to go,’” said Joe Maugeri, managing director for corporate relations at the CFP Board. “I talked to him about 529 plans and he said, ‘All 56 states have 529 plans?’ I said, ‘Well, yes, all 50 of them have them.’ He was a real nice guy.”

So how did he do? After Mr. Kaminski spent about 15 minutes with each person, all but one were ready to work with him, Mr. Maugeri said.

  • Mathematica Policy Research’s new free RCT-YES software package is designed to make it easier to analyze RCT data for those who might not be code experts. I haven’t used it, but it looks like it’s a layer on top of R or Stata that takes care of the code and nicely formats the output. (h/t David Batcheck)
  • Facebook, which has gotten in trouble over conducting experiments on users (involving for example, manipulating emotions through news feeds), published a law review article on revamping their IRB process.
  • There’s a recent working paper from Canice Prendergast, who helped a network of food banks solve supply/demand problems. The Planet Money and EconTalk podcasts explain how a group who literally gives out free lunches got help from a UChicago economist.

Photo above: New York Times/CFP Board