- The madness of elite airline status (first world 1% problems)
- Should English be the only official language of the EU?
- Tim Burton confirms Beetlejuice 2 is a go
- A nice example of classy replication/criticism and a classy response in Science: Whether or not children learn from math and reading apps
- The Freakonomics podcast featuring my Liberia crime-reduction-through-therapy study is airing this weekend nationwide on NPR
- Via @JustinWolfers, this Wonkblog graph:
The fifth annual VancouFur convention, in which people dress up as fictional anthropomorphic animal characters with human personalities and characteristics, was held at the same hotel where a number of Syrian refugees are currently being housed.
A message was given to all attendees at the convention that the hotel had been chosen as one of the temporary housing locations for the Syrian refugees in Canada, and that “a major concern that VancouFur has is ensuring that each and every one of the refugees (and attendees) feels welcome and safe and the fact that this is likely to be a major shock to them”.
Also, a good article on why the EU-Turkey deal on refugees is doomed (hat tip Tyler Cowen)
And finally, for the non-furries like me, I prefer these t-shirts:
- Following psychology’s current “repligate” and econ’s Worm Wars, I wrote a guide to how to read “debunking” news stories (including the Wu-Tang Clan rule).
- You can almost hear the disappointment in the American Statistical Association’s reminder for scientists on proper use of p-values.
- The new What Went Wrong Foundation will document development interventions that failed and ask local beneficiaries what happened.
- A paper finds football players exposed to Sierra Leone’s war violence were more altruistic in lab experiments but also got more fouls in street games. The researchers hypothesize that being put into a competitive group setting is the key. (h/t Justin Sandefur).
- Some podcasts I’ve enjoyed:
- The new Surprisingly Awesome has Adam Davidson and John Hodgman in a debate over whether chasing airline miles is rational. Some highlights include the story of the man who traveled the world for a decade thanks to pudding, and a great discussion at the end over rational choice vs. behavioral econ.
- Tiny Spark, a podcast about development, is short and to the point. Bill Easterly and Owen Barder have been on recently, but I also enjoyed the former head of Charity Navigator talking about how orgs without a ton of data expertise are being pressured to come up with effectiveness numbers, which is a recipe for disaster.
- Similarly paced is Government Innovator. If you have a child or a stomach, the one on nudges for healthier lunchrooms had some great tips. Dean Karlan also clarified what RCTs can’t do.
- Markus Goldstein reports on a new review of what works in agriculture. Even with the bar set low:
They found 18,470 citations, but only 19(!) of these are viable studies.
- In Tanzania, “Female Food Heroes” is a reality show highlighting women farmers.
And via Maude Lachaine, the R package Mansplainer:
I give up.
A few days ago I posted about the psychology replication study that didn’t replicate. Apparently the replication of the replication is quite problematic.
Here is a discussion from Replication Watch, which notes
independent commentaries have also emerged challenging Gilbert and colleagues’ methodology and conclusion by Sanjay Srivastava, Uri Simonsohn, Daniel Lakens, Simine Vazire, Andrew Gelman, David Funder, Rolf Zwaan, and Dorothy Bishop.
I’m sure this is a fascinating and important debate but I have not had time to read these in depth, as I am barely able to answer my emails.
I would love pointers to a simple and relatively unbiased roundup. In the meantime, I felt like I should at least note the controversy.
Governments play a central role in facilitating economic development. Yet while economists have long emphasized the importance of government quality, historically they have paid less attention to the internal workings of the state and the individuals who provide the public services.
This paper reviews a nascent but growing body of field experiments that explores the personnel economics of the state.
To place the experimental findings in context, we begin by documenting some stylized facts about how public sector employment differs from that in the private sector. In particular, we show that in most countries throughout the world, public sector employees enjoy a significant wage premium over their private sector counterparts.
Moreover, this wage gap is largest among low-income countries, which tends to be precisely where governance issues are most severe. These differences in pay, together with significant information asymmetries within government organizations in low-income countries, provide a prima facie rationale for the emphasis of the recent field experiments on three aspects of the state–employee relationship: selection, incentive structures, and monitoring. We review the findings on all three dimensions and then conclude this survey with directions for future research.
A new paper titled “The Personnel Economics of the State”, by Fred Finan, Ben Olken, and Rohini Pande. This confirms my belief that bureaucracy-building will be one of the most important topics of the next decade.
But I guess bureaucracy sounded too sexy so they decided to go with “Personnel Economics of the State”.
The photo is from an amazing series called Bureaucratics by Jan Banning.
Guest post by Jeff Mosenkis of Innovations for Poverty Action.
- Tina Rosenberg asks in the New York Times Fixes column why the development world is so obsessed with innovation rather than spreading existing good ideas. She uses the example of the HIV-prevention org Young1ove, started by an MIT student who read about a promising RCT.
- Some resources:
- J-PAL has a guide to doing randomized controlled trials using administrative data. They’ve also put a ton of research resources including guides on how to randomize and sample code here.
- IPA’s new “Goldilocks” toolkit written with non-technical orgs in mind explains that RCTs aren’t always the best solution and gives resources for helping orgs figure out what data they should be collecting to monitor impact.
- As Chris posted, psychologists are arguing over whether there is a reproducibility crisis (you can get into the weeds here), but meanwhile, there’s a new Science paper on reproducibility of 18 econ papers published in AER and QJE:
We find a significant effect in the same direction as the original study for 11 replications (61%); on average the replicated effect size is 66% of the original. The reproducibility rate varies between 67% and 78% for four additional reproducibility indicators, including a prediction market measure of peer beliefs.
- In that vein, friends of ours are collecting case studies of data sharing through the Mozilla Science Foundation, submit your stories by March 10th.
- A lot of misinformation and scare stories are spreading surrounding refugees in Europe, but there are also interesting attempts to correct them. The Hoaxmap is a Snopes for scare stories about crimes supposedly committed by refugees, while the Mediterranean Rumor Tracker tries to find rumors spreading among refugees (such as which border crossings are open when) and correct them.
- Samantha Bee’s visit to a refugee camp in Jordan was pretty good.
- David Evans says baseline data is useful for other purposes (we’ve found they can be hugely beneficial to local policymakers).
- Panel data from India (PDF) suggests monitoring teachers to make sure they show up for work is 10 times more cost-effective at lowering the student-teacher ratio than hiring more teachers.
And, from SMBC:
Isn’t it ironic?
A recent article by the Open Science Collaboration (a group of 270 coauthors) gained considerable academic and public attention due to its sensational conclusion that the replicability of psychological science is surprisingly low. Science magazine lauded this article as one of the top 10 scientific breakthroughs of the year across all fields of science, reports of which appeared on the front pages of newspapers worldwide.
We show that OSC’s article contains three major statistical errors and, when corrected, provides no evidence of a replication crisis.
Indeed, the evidence is consistent with the opposite conclusion — that the reproducibility of psychological science is quite high and, in fact, statistically indistinguishable from 100%. The moral of the story is that meta-science must follow the rules of science.
A new article by Gilbert, King, Pettigrew, and Wilson.
This excerpt is rather amazing:
For example, many of OSC’s replication studies drew their samples from different populations than the original studies did. An original study that measured Americans’ attitudes toward African Americans was replicated with Italians, who do not share the same stereotypes;
an original study that asked college students to imagine being called on by a professor was replicated with participants who had never been to college;
and an original study that asked students who commute to school to choose between apartments that were short and long drives from campus was replicated with students who do not commute to school.
What’s more, many of OSC’s replication studies used procedures that differed from the original study’s procedures in substantial ways: An original study that asked Israelis to imagine the consequences of military service was replicated by asking Americans to imagine the consequences of a honeymoon;
an original study that gave younger children the difficult task of locating targets on a large screen was replicated by giving older children the easier task of locating targets on a small screen;
an original study that showed how a change in the wording of a charitable appeal sent by mail to Koreans could boost response rates was replicated by sending 771,408 e-mail messages to people all over the world (which produced a response rate of essentially zero in all conditions).
While the fact that this was detected and reported on quickly is (in some sense) evidence of science working, something seems broken in a review and referee process that let the original paper through to publication.
Update: The original group of authors don’t like the replication of their replication. See Andrew Gelman’s excellent and skeptical discussion. I have read none of these closely enough to have an opinion.
I think I can hear the armies of angry Tiger moms amassing.
A large and growing literature has documented the importance of peer effects in education. However, there is relatively little evidence on the long-run educational and labor market consequences of childhood peers.
We examine this question by linking administrative data on elementary school students to subsequent test scores, college attendance and completion, and earnings.
To distinguish the effect of peers from confounding factors, we exploit the population variation in the proportion of children from families linked to domestic violence, who were shown by Carrell and Hoekstra (2010, 2012) to disrupt contemporaneous behavior and learning.
Results show that exposure to a disruptive peer in classes of 25 during elementary school reduces earnings at age 26 by 3 to 4 percent.
We estimate that differential exposure to children linked to domestic violence explains 5 to 6 percent of the rich-poor earnings gap in our data, and that removing one disruptive peer from a classroom for one year would raise the present discounted value of classmates’ future earnings by $100,000.
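The $100,000 figure is easy to sanity-check with a back-of-envelope present-value calculation. The earnings effect and class size come from the abstract; everything else below (average earnings, discount rate, career length, years of elementary exposure) is an illustrative assumption of mine, not a number from the paper:

```python
# Illustrative present-value check of the ~$100,000 classroom effect.
# Assumptions NOT from the paper: $30,000 average annual earnings,
# a 40-year career, a 3% discount rate, and 6 years of elementary
# school, so removing a disruptive peer for one year undoes roughly
# 1/6 of the full-exposure earnings effect.
effect_full_exposure = 0.035        # midpoint of the paper's 3-4% earnings loss
effect_one_year = effect_full_exposure / 6
avg_earnings = 30_000
discount_rate = 0.03
career_years = 40
classmates = 25

# Present value of a $1-per-year annuity over the career.
annuity = (1 - (1 + discount_rate) ** -career_years) / discount_rate

gain_per_classmate = effect_one_year * avg_earnings * annuity
total = classmates * gain_per_classmate
print(round(total))  # on the order of $100,000
```

With those (hypothetical) inputs the total lands right around the paper’s $100,000 figure, which suggests the estimate is at least internally plausible.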
That is Joshua Browder, who made a free robot lawyer that has appealed $3 million in parking tickets in the UK.
Since laws are publicly available, bots can automate some of the simple tasks that human lawyers have had to do for centuries. Browder’s isn’t even the first lawyer bot. The startup Acadmx’s bot creates perfectly formatted legal briefs. The company Lex Machina does data mining on judges’ records and makes predictions on what they will do in the future.
Beyond parking tickets, Browder’s bot can also help with delayed or canceled flights and payment-protection insurance (PPI) claims. Although the bot can only help file claims on simple legal issues — it can’t physically argue a case in front of a judge — it can save users a lot of money.
If one were to create a dime-sized hole between thumb and forefinger and hold it out at arm’s length, in that small region the largest telescopes today, like those in Chile or Hawaii, could discern literally hundreds of thousands of other galaxies like our own Milky Way.
An NYRB article that I not only found humbling, but that made me think we do not invest enough time and money as a society in physics and astronomy.
…Since Edwin Hubble’s groundbreaking discovery in 1929 that the universe is expanding, we have recognized that the entire observable universe, all 100 billion or so galaxies, each containing 100 billion or so stars, was, some 13.8 billion years ago, confined to a region that was perhaps smaller than a single atom today. If this is the case, then the initial conditions that determined the origin, makeup, and nature of the largest cosmic objects today were determined on subatomic scales. So to understand the universe on its largest scales we ultimately must push forward our understanding of the fundamental structure of matter and forces on the smallest scales.
“I’m just checking in.” = Where is that thing you promised I’d have by now?
“Sorry to bother you again.” = Why can’t you do your fucking job?
“I feel bad for making you do this.” = You should feel bad for not having done this already.
“This was helpful.” = This would’ve been helpful two weeks ago.
“Sorry if I somehow missed your email.” = We both know you never emailed me.
“Thanks for the explanation.” = If you had told me this last week, you would’ve saved me a lot of time.
“Perhaps there was a misunderstanding.” = You didn’t fucking listen to me.
More at McSweeneys. Hat tip to John Thorne.
There’s inequality of opportunity.
In the pop version of income inequality, perpetuating the gulf between haves and have-nots, the most successful performers get first dibs on the hottest producers and songwriters of the moment. Yet while singers come and go, an oligarchy of producers endures.
And pin-factory-like specialization and division of labor:
…Potential hits may also be assembled in high-pressure pop think tanks called “writer camps,” where a deep-pocketed star—Beyoncé, perhaps—convenes dozens of producers, composers, and lyricists in hotels and studios, where they run through every permutation of producers and topliners. The campers are often a mix of longtime pros and newcomers from the hipper fringes, sharing their innovations or eccentricities for the chance at a pop payoff. “Camp counselors” schedule teams to come up with a song before lunch, then reshuffle the teams to come up with another afterward, with daily playbacks to keep everyone competitive. “If the artist happens to be present,” Seabrook writes, “the artist circulates among the different sessions, throwing out concepts, checking on the works in progress, picking up musical pollen in one session and shedding it on others.”
In a different, less physically proximate kind of songwriting contest, one very simple track—a single beat or a chord progression—gets sent simultaneously to dozens of potential collaborators. Then the producer and singer might choose a verse from one response, a chorus from another, an instrumental hook from yet another: digital brainstorming. Many of the songwriter-producers in the book are blunt about describing their work more as a business than a form of self-expression—though that may be more a matter of our era’s MBA mentality, combined with a hip-hop culture of competitive striving gone mainstream. The Beatles wanted hits, too.
Something must be done to combat this public health hazard. In 2000, the National Heart, Lung, and Blood Institute (NHLBI) began requiring that researchers publicly register their research analysis plan before starting their clinical trials. From a new PLOS paper:
We identified all large NHLBI supported RCTs between 1970 and 2012 evaluating drugs or dietary supplements for the treatment or prevention of cardiovascular disease.
17 of 30 studies (57%) published prior to 2000 showed a significant benefit of intervention on the primary outcome in comparison to only 2 among the 25 (8%) trials published after 2000 (χ² = 12.2, df = 1, p = 0.0005). There has been no change in the proportion of trials that compared treatment to placebo versus active comparator. Industry co-sponsorship was unrelated to the probability of reporting a significant benefit. Pre-registration in ClinicalTrials.gov was strongly associated with the trend toward null findings.
Hat tip @rlmcelreath.
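The reported test statistic can be reproduced from the quoted counts alone. A minimal sketch, assuming the authors ran a plain 2x2 Pearson chi-squared with Yates’ continuity correction (the 2x2 table below is reconstructed from the quote, not taken from the paper’s data):

```python
# Check the reported chi-squared: significant vs. null primary outcomes,
# for trials published before vs. after the 2000 pre-registration rule.
def chi2_yates(table):
    """Pearson chi-squared for a 2x2 table with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(obs - expected) - 0.5) ** 2 / expected
    return stat

# Pre-2000: 17 of 30 significant; post-2000: 2 of 25 significant.
table = [[17, 13], [2, 23]]
print(round(chi2_yates(table), 1))  # → 12.2, matching the quoted statistic
```

That the quoted χ² = 12.2 falls out exactly is reassuring: the headline numbers in the abstract are internally consistent.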
This response seems to miss, or perhaps obscure, the point. In my understanding, Hausmann is suggesting that development organizations take a Toyota-style approach to innovation, in which front-line workers have authority to adapt, make suggestions, and eventually change the way the organization works. In this case, power to innovate lies in the front-lines, among implementers.
In contrast, Blattman seems to depart from the premise that high-level managers, or academics, are the ones authorized to have ideas, and these ideas are then transmitted to the fieldworkers who implement them. Thanks to rigorous testing, the best ideas can be disseminated. Power is centralized, and held by the proper authorities.
So the debate is not about methods or rigor, it is about authority to innovate and power to decide.
I agree, but in that case we both have it wrong.
To see this, imagine a Toyota that gives customers a car whether they like it or not. Front-line worker innovation might or might not develop cars that people want to own. The problem is there’s no vote or market test where the ultimate users decide.
I think this is called the Lada.
This is the fundamental problem of aid. A bunch of planners have the power to decide and are only accountable to donors, most of whom seem happy to remain ignorant of the details or the actual success of their interventions.
People have put forward randomized trials as an improvement over the current mess (including donors who quite fairly don’t know any other way to make this better).
Now, trial-and-error innovation combined with randomized trials could be more powerful. That was my point. But make no mistake: both are still the tools of the inept planner.
Of course, none of these are reasons that social scientists like randomized trials. They are interested in using these field experiments to try to test ideas or estimate parameters, sometimes to produce general knowledge for the public good. Maybe an adaptive mechanism could do this even better.
To extend the metaphor, I think that means we academics who do field experiments are the intellectual wing of the Communist Party, who have captured Lada’s production for our own purposes, both selfish and noble. The term randomista sounds more appropriate than ever.
On the sister blog I report on a new paper, “Don’t Get Duped: Fraud through Duplication in Public Opinion Surveys,” by Noble Kuriakose, a researcher at SurveyMonkey, and Michael Robbins, a researcher at Princeton and the University of Michigan, who gathered data from “1,008 national surveys with more than 1.2 million observations, collected over a period of 35 years covering 154 countries, territories or subregions.”
They did some forensics, looking for duplicate or near-duplicate records as a sign of faked data, and they estimate that something like 20% of the surveys they studied had “substantial data falsification via duplication.”
These were serious surveys such as Afrobarometer, Arab Barometer, Americas Barometer, International Social Survey, Pew Global Attitudes, Pew Religion Project, Sadat Chair, and World Values Survey. To the extent these surveys are faked in many countries, we should really be questioning what we think we know about public opinion in these many countries.
That is Andrew Gelman.
At quick glance, the paper’s approach to calling data “duplicated” is a bit crude, but I’ve worked with several of the survey firms that have produced these surveys in Africa. I have no trouble imagining that the data have very serious problems.
Of course, if 20% of the surveys have a 5% duplication problem, I don’t know that this makes them much worse than the other data scholars use. Researchers use national statistics or cross-national databases all the time as if the data are valid, while most are terrible. When I referee a paper, it’s obvious who knows what’s under the data and who never bothered to look and simply downloaded it blindly.
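For the curious, the core of Kuriakose and Robbins’ forensic approach is a “percent match” statistic: for each respondent, the highest share of identical answers shared with any other respondent in the same survey. A minimal sketch of that idea (the toy data and the flagging threshold are mine, not the paper’s):

```python
# Sketch of near-duplicate detection via maximum percent match:
# for each respondent, find the highest fraction of identical answers
# shared with any other respondent in the survey.
def max_percent_match(responses):
    """responses: list of equal-length answer tuples, one per respondent."""
    scores = []
    for i, a in enumerate(responses):
        best = 0.0
        for j, b in enumerate(responses):
            if i == j:
                continue
            match = sum(x == y for x, y in zip(a, b)) / len(a)
            best = max(best, match)
        scores.append(best)
    return scores

# Toy data: respondent 2 is a near-copy of respondent 0.
data = [
    (1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
    (5, 4, 3, 2, 1, 5, 4, 3, 2, 1),
    (1, 2, 3, 4, 5, 1, 2, 3, 4, 1),  # 9 of 10 answers match respondent 0
]
flagged = [score >= 0.85 for score in max_percent_match(data)]
print(flagged)  # → [True, False, True]
```

Note that both members of a near-duplicate pair get flagged; the hard part in practice is choosing a threshold that separates fabrication from the genuinely similar answers you expect in homogeneous populations, which is why the approach can look crude.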
But back to the surveys. Duplication is terrible, but the least of my worries. For instance:
- The questions from most of these surveys sound perfectly sensible until you sit down and ask them to someone in a village. Then the absurdities become immediately apparent. Researchers: If you have a chance, print out any one of these surveys and test it out sometime. You will never operate the same again.
- Then there’s the poor quality of much of the legitimately-collected data from rushed, tired, poorly incentivized enumerators.
- Finally, these survey firms are for-profit enterprises with very different incentives and constraints than the researchers. They often have limited cash flow, middling middle management, and their average customer is a private firm or development agency that pays little attention to data quality.
If I want reliable data, mostly I do not use private survey firms. I hire and train teams myself (when I can, through a local non-profit research organization or an international one like Innovations for Poverty Action). And if I must use a firm, I hire a researcher I trust to keep an eye on things full time. I recommend nothing else.
- Wild gorillas compose happy songs that they hum during meals
- “For the love of God, rich people, stop giving Stanford money.”
- For those of you aspiring to PhDs, Princeton is looking for applicants for its Emerging Scholars in Political Science program, with an emphasis on applicants from historically underrepresented groups.
- A history of randomized trials in social science
- What works in reducing community violence? A meta review
- I have a new curse: “Carry yourself with the confidence of Harvard students who believe they invented cabins.”
Guest post by Jeff Mosenkis of Innovations for Poverty Action.
- If your work involves transcribing audio or video, Trint looks interesting. It extracts the text and synchronizes the result with the audio/video file alongside it to make it easier to go through and make corrections (h/t Brian Boyer).
- UN Secretary-General Ban Ki-moon endorsed using cash transfers whenever possible in humanitarian crisis aid (h/t GiveDirectly).
- On the other hand, when it comes to development in general, Angus Deaton reaffirmed at the Council on Foreign Relations that he’s just as skeptical of cash transfers as of other outside aid. He feels (as I understand it) that having outside governments and NGOs do the jobs local governments should be doing undermines the contract between those governments and their own people. He also fears cash transfers, if they get big enough, will attract the attention of greedy officials:
“…it’s the unintended consequences of what happens in the long run if you give people lots of money. And, you know, if there’s enough money out there, the guys that run the country, you know, they’re going to get it. I mean, that’s what happens. That’s the nature of power.”
- EconTalk has an interesting episode on “medical reversals.” Up to 40% of treatments that become common based on initial data turn out not to be helpful, and sometimes harmful, years later when the RCT data comes in. There’s a nice comparison about 45 min in between the dilemmas that health vs. financial regulators face.
- Among the Hillary Clinton emails made searchable by The Wall Street Journal is a compelling one from Chelsea Clinton, who went to observe in post-earthquake Haiti. After being there for a short time, she paints a pretty vivid portrait of disorganized NGO/UN efforts.
- American behavioral econ draws on the Kahneman and Tversky school of thought exploring flaws in human reasoning. The Bounded Rationality approach associated with Gerd Gigerenzer focuses on how the mental shortcuts we use are often adaptive. The Max Planck Institute in Berlin is hosting a summer institute for scientists interested in learning more (h/t Decision Science News).
And after the uproar surrounding the American hunter who killed Cecil the lion, fewer hunters have come to a Zimbabwean park. It now has an overpopulation of lions and officials there are considering killing 200 of them.
Photo above via Flickr/brianscott
Sorry guys. I know you’re my main audience, but this is my generation’s new favorite pastime. And there is new material.
On Monday, the New York Times published a story about the breakfast favorite, and the most disconcerting part was this:
Almost 40 percent of the millennials surveyed by Mintel for its 2015 report said cereal was an inconvenient breakfast choice because they had to clean up after eating it.
The industry, the piece explained, is struggling — sales have tumbled by almost 30 percent over the past 15 years, and the future remains uncertain. …
A large contingent of millennials are uninterested in breakfast cereal because eating it means using a bowl, and bowls don’t clean themselves (or get tossed in the garbage). Bowls, kids these days groan, have to be cleaned. …
Via James Choi.
My main problem with RCTs is that they make us think about interventions, policies, and organizations in the wrong way. As opposed to the two or three designs that get tested slowly by RCTs (like putting tablets or flipcharts in schools), most social interventions have millions of design possibilities and outcomes depend on complex combinations between them. This leads to what the complexity scientist Stuart Kauffman calls a “rugged fitness landscape.”
Getting the right combination of parameters is critical. This requires that organizations implement evolutionary strategies that are based on trying things out and learning quickly about performance through rapid feedback loops, as suggested by Matt Andrews, Lant Pritchett and Michael Woolcock at Harvard’s Center for International Development.
RCTs may be appropriate for clinical drug trials. But for a remarkably broad array of policy areas, the RCT movement has had an impact equivalent to putting auditors in charge of the R&D department. That is the wrong way to design things that work. Only by creating organizations that learn how to learn, as so-called lean manufacturing has done for industry, can we accelerate progress.
That’s Harvard’s Ricardo Hausmann writing in Project Syndicate.
I had the following reactions:
- Absolutely, organizations should be innovating through rigorous trial and error. And the case needs to be made, since many organizations don’t know how to do this.
- But let’s be honest: most governments and NGOs did not have R&D departments that got hijacked by randomized trials. Most organizations I know were not doing much in the way of systematic or rigorous research of any kind. Outside one or two donors and development banks, the usual research result was a mediocre consulting report rigged to look good.
- In fact, most organizations I know have spent the majority of budgets on programs with no evidence whatsoever. In the realm of poverty alleviation, for example, it turns out that two of the favorites, vocational training and microfinance, have almost no effect on poverty.
- This goes to show that, without a market test, some kind of auditing or other mechanism is probably needed, especially for the money-wasting behemoth programs that are still so common.
- Sometimes the answer will be large-scale randomized trials. The way I see it, trial-and-error-based innovation and clinical trials are complements not substitutes. Most of the successful studies I’ve run have followed a period of relatively informal trial-and-error.
- There are a few radicals in academia and aid who say everything should have a randomized trial, but I think the smart ones don’t really mean it, and the others I don’t take seriously. They are also the exception. If you look at the research agenda of most of the so-called randomistas, experiments are only a fraction of their work.
- In political science, the generation before me fought (and still fights) the methodological war. My generation mostly gets on with doing both qualitative and quantitative research more harmoniously. I feel the same way about the randomista debate. People like me do a little observational work, a little forecasting, a little qualitative work, some randomized trials, and I’m even starting to do some trial-and-error style work with police in Latin America. I don’t think I’m the exception.
- If anything, the surge of randomized trials has paved the way for rigorous trial-and-error. I’ve seen this at my wife’s organization, the International Rescue Committee. Eight years of randomized trials showed their organization and their donors that some of their biggest investments were not making a difference in the lives of poor people. This has built a case for going back to the drawing board on community development or violence prevention, and now they are starting an R&D lab that looks very similar to Hausmann’s vision. They can do this because expanding a research department to manage randomized trials brought in the people, skills, and evidence base to make a case for innovation.
- There are some structural problems in academic research that make this hard. Organizations like Innovations for Poverty Action and the Poverty Action Lab have drawn bright red lines around randomized trials, and most of the time don’t facilitate other kinds of research. But I can see adaptive and rigorous innovation fitting in.
- (Updated) Some people have said “oh but there are too many randomized trials and too much emphasis.” This is the nature of new research technologies. People overdo them at first, since the opportunities are so large. Not so long ago everyone ran cross-country regressions, or wrote a little theoretical game. These are still useful, but they’ve receded as new methods appear. So, this too will pass. Randomized trials will join the pantheon of mediocre methods at our disposal. (The saddest statement is that, to the aid industry, and to much of social science, a randomized trial is “new”. Scientists are aghast at this.)
My view: we can push rigorous trial and error up without pushing other approaches to learning down.