“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” is the title of a new paper by data guerrillas Dykstra, Dykstra, and Sandefur:
Much of the data underlying global poverty and inequality estimates is not in the public domain, but can be accessed in small pieces using the World Bank’s PovcalNet online tool. To overcome these limitations and reproduce this database in a format more useful to researchers, we ran approximately 23 million queries of the World Bank’s web site, accessing only information that was already in the public domain. This web scraping exercise produced 10,000 points on the cumulative distribution of income or consumption from each of 942 surveys spanning 127 countries over the period 1977 to 2012. This short note describes our methodology, briefly discusses some of the relevant intellectual property issues, and illustrates the kind of calculations that are facilitated by this data set, including growth incidence curves and poverty rates using alternative PPP indices.
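To make the last sentence of the abstract concrete, here is a minimal sketch of the two calculations it mentions, assuming you already hold a set of points on the cumulative distribution for a survey. It is not the authors' code. The function names, the synthetic linear income distribution, and the 1.9 poverty line are all hypothetical and chosen only for illustration; no real PovcalNet data or endpoint is used.

```python
import numpy as np


def headcount_rate(incomes, cdf, poverty_line):
    """Share of the population below poverty_line.

    incomes: income (or consumption) levels, sorted ascending.
    cdf: cumulative population share at each income level.
    Uses linear interpolation between the known CDF points.
    """
    return float(np.interp(poverty_line, incomes, cdf))


def growth_incidence(incomes_t0, incomes_t1, years):
    """Annualized income growth at each percentile between two surveys.

    Both arrays must be evaluated at the same percentiles.
    """
    incomes_t0 = np.asarray(incomes_t0, dtype=float)
    incomes_t1 = np.asarray(incomes_t1, dtype=float)
    return (incomes_t1 / incomes_t0) ** (1.0 / years) - 1.0


# Synthetic illustration (hypothetical numbers, not survey data):
# 99 percentile points, with income rising linearly across the distribution.
cdf = np.linspace(0.01, 0.99, 99)
incomes_start = 1.0 + 9.0 * cdf
# Second survey two years later, every percentile 21% richer.
incomes_end = incomes_start * 1.21

poverty_line = 1.9  # hypothetical line, in the same units as incomes
rate_start = headcount_rate(incomes_start, cdf, poverty_line)
rate_end = headcount_rate(incomes_end, cdf, poverty_line)
gic = growth_incidence(incomes_start, incomes_end, years=2)

print(f"headcount at start: {rate_start:.3f}")
print(f"headcount at end:   {rate_end:.3f}")
print(f"growth at median:   {gic[49]:.3f}")
```

With 10,000 points per survey, as in the paper, the interpolation step matters much less than it does here. Swapping in a different PPP index amounts to rescaling `incomes` (or the poverty line) before calling `headcount_rate`.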
43 Responses
Great resource. RT @cblatts Data guerrillas: http://t.co/NTVj7GT30P
DataGuerrillero The revolution [of data]’s not an apple that falls when it is ripe. You have to make it fall @cblatts http://t.co/BS4dvDpT1c
RT @cblatts: Data guerrillas: http://t.co/vfWUsTs03R
“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/SLtfaufBkl
“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/G7fXT4Gj9l via @jetpack
interesting → “@cblatts: How to crash the World Bank website in the name of social science. http://t.co/4rqOpvatk6”
RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/4XPpFTSGJc via @jetpack
“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/DDBQxzen6W
“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” – Chris … http://t.co/BY9sIiacKR, see more http://t.co/M4stgKjNoH
Another @cblatts piece – Data Guerrillas (cc @UKODITech @Floppy @statshero) http://t.co/31Tj3sKYUg
How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
@cblatts Here’s a related thing: guerrilla polling: http://t.co/YUD1KjRToQ Obviously more dangerous though.
RT @CGDev: MT @cblatts: “We Just Ran Twenty-Three Million Queries of the @WorldBank’s Website” @JustinSandefur http://t.co/7G8K5m3kV2
“We Just Ran Twenty-Three Million Queries of the World Bank’s Website”: That’s the title of a new paper by dat… http://t.co/MCAKkQwqPI
“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/6WrEmRHKjl
@cblatts Love the recs: publish the underlying code, embrace open data formats, release sufficient data so others can recreate estimates
Nice Sala-i-Martin reference MT @CGDev: MT @cblatts “We Just Ran 23 Million Queries of WB Website” @JustinSandefur http://t.co/RFLxY0bfet
MT @cblatts: “We Just Ran Twenty-Three Million Queries of the @WorldBank’s Website” @JustinSandefur http://t.co/7G8K5m3kV2
@cblatts they should be careful. Aaron Swartz was indicted for less
Data guerrillas: http://t.co/vfWUsTs03R
RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/HxzopCj3fs
“@MJSwart: RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/Y7JGAS5Src cc @iOnline247
RT @MJSwart: RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/00jMmjnsci <== Now that’s a co…
“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/YXOfQEhHBP via @feedly
@cblatts BTW “data guerrillas” is excellent.
RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/00jMmjnsci <== Now that’s a complicated ETL