Chris Blattman

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website”

That’s the title of a new paper by data guerrillas Dykstra, Dykstra, and Sandefur:

Much of the data underlying global poverty and inequality estimates is not in the public domain, but can be accessed in small pieces using the World Bank’s PovcalNet online tool. To overcome these limitations and reproduce this database in a format more useful to researchers, we ran approximately 23 million queries of the World Bank’s web site, accessing only information that was already in the public domain. This web scraping exercise produced 10,000 points on the cumulative distribution of income or consumption from each of 942 surveys spanning 127 countries over the period 1977 to 2012. This short note describes our methodology, briefly discusses some of the relevant intellectual property issues, and illustrates the kind of calculations that are facilitated by this data set, including growth incidence curves and poverty rates using alternative PPP indices.
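
To make the last sentence of the abstract concrete, here is a minimal sketch of the two calculations it names, assuming you already have evenly spaced points on the income distribution for two surveys of the same country. It is not the authors' code. The function names and the numbers are made up for illustration, and the growth incidence curve uses the standard definition: the annualized growth rate of income at each percentile.

```python
from bisect import bisect_right


def growth_incidence_curve(start_quantiles, end_quantiles, years):
    """Annualized income growth at each quantile between two surveys.

    Both lists hold income (or consumption) at the same evenly spaced
    quantiles, sorted ascending. `years` is the gap between the surveys.
    """
    if len(start_quantiles) != len(end_quantiles):
        raise ValueError("both surveys need the same number of quantile points")
    if years <= 0:
        raise ValueError("years must be positive")
    return [
        (end / start) ** (1.0 / years) - 1.0
        for start, end in zip(start_quantiles, end_quantiles)
    ]


def headcount_ratio(quantiles, poverty_line):
    """Share of quantile points at or below the poverty line.

    With evenly spaced quantiles this approximates the poverty rate, so a
    different PPP conversion only changes the `poverty_line` passed in.
    """
    return bisect_right(quantiles, poverty_line) / len(quantiles)


# Hypothetical data: four quantile points per survey, two years apart.
start = [1.0, 2.0, 4.0, 8.0]
end = [1.21, 2.42, 4.0, 8.0]

gic = growth_incidence_curve(start, end, years=2)
poverty_rate = headcount_ratio(start, poverty_line=2.0)
```

In this toy example the bottom half of the distribution grows about 10 percent a year while the top half stays flat, which is the kind of pattern a growth incidence curve is meant to show. With 10,000 points per survey, as in the paper, the same functions would apply unchanged.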
