"We Just Ran Twenty-Three Million Queries of the World Bank's Website"

July 3, 2014

That’s the title of a new paper by data guerrillas Dykstra, Dykstra, and Sandefur:

Much of the data underlying global poverty and inequality estimates is not in the public domain, but can be accessed in small pieces using the World Bank’s PovcalNet online tool. To overcome these limitations and reproduce this database in a format more useful to researchers, we ran approximately 23 million queries of the World Bank’s web site, accessing only information that was already in the public domain. This web scraping exercise produced 10,000 points on the cumulative distribution of income or consumption from each of 942 surveys spanning 127 countries over the period 1977 to 2012. This short note describes our methodology, briefly discusses some of the relevant intellectual property issues, and illustrates the kind of calculations that are facilitated by this data set, including growth incidence curves and poverty rates using alternative PPP indices.
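To give a feel for the calculations the abstract mentions, here is a minimal sketch of two of them, assuming you already have points on a survey's cumulative distribution as plain lists. The function names and the toy numbers are hypothetical and are not taken from the paper or from PovcalNet. The first function reads a poverty headcount off the distribution by linear interpolation, and the second computes a growth incidence curve as the annualized growth rate at each percentile.

```python
from bisect import bisect_left


def headcount(incomes, shares, line):
    """Cumulative population share at a poverty line, by linear interpolation.

    incomes: strictly increasing income (or consumption) levels
    shares:  cumulative population share at or below each level, in [0, 1]
    Assumption: nobody lives below the lowest recorded level.
    """
    if line < incomes[0]:
        return 0.0
    if line >= incomes[-1]:
        return shares[-1]
    i = bisect_left(incomes, line)
    if incomes[i] == line:
        return shares[i]
    x0, x1 = incomes[i - 1], incomes[i]
    y0, y1 = shares[i - 1], shares[i]
    return y0 + (y1 - y0) * (line - x0) / (x1 - x0)


def growth_incidence(quantiles_start, quantiles_end, years):
    """Annualized growth rate of income at each percentile between two surveys."""
    return [
        (end / start) ** (1.0 / years) - 1.0
        for start, end in zip(quantiles_start, quantiles_end)
    ]


# Toy distribution, invented purely for illustration.
toy_incomes = [1.0, 2.0, 4.0]
toy_shares = [0.2, 0.5, 1.0]
toy_rate = headcount(toy_incomes, toy_shares, 1.5)
toy_gic = growth_incidence([1.0, 2.0], [1.21, 2.0], years=2)
```

With thousands of points per survey, the interpolation error should be small, and changing the poverty line or the PPP conversion becomes a matter of re-reading the same curve at a different income level.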

43 Responses

simonweschle says:

July 5, 2014 at 3:30 pm

Great resource. RT @cblatts Data guerrillas: http://t.co/NTVj7GT30P
rossiarossi says:

July 5, 2014 at 2:21 am

DataGuerrillero The revolution [of data]’s not an apple that falls when it is ripe. You have to make it fall @cblatts http://t.co/BS4dvDpT1c
eksi_iktisat says:

July 4, 2014 at 2:28 pm

RT @cblatts: Data guerrillas: http://t.co/vfWUsTs03R
nerdbound says:

July 4, 2014 at 5:07 am

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/SLtfaufBkl
leafjohnson says:

July 3, 2014 at 11:16 pm

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/G7fXT4Gj9l via @jetpack
diRadiTic says:

July 3, 2014 at 10:26 pm

interesting → “@cblatts: How to crash the World Bank website in the name of social science. http://t.co/4rqOpvatk6”
akshilling says:

July 3, 2014 at 9:15 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
tommyDre says:

July 3, 2014 at 4:42 pm

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/4XPpFTSGJc via @jetpack
GuardedFear971 says:

July 3, 2014 at 3:00 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
Integrilicious says:

July 3, 2014 at 1:57 pm

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/DDBQxzen6W
pcdnetwork says:

July 3, 2014 at 1:45 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
anitayorker says:

July 3, 2014 at 1:32 pm

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” – Chris … http://t.co/BY9sIiacKR, see more http://t.co/M4stgKjNoH
LizCarolan says:

July 3, 2014 at 1:23 pm

Another @cblatts piece – Data Guerrillas (cc @UKODITech @Floppy @statshero) http://t.co/31Tj3sKYUg
andre_quentin says:

July 3, 2014 at 1:15 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
MuleyaM says:

July 3, 2014 at 1:13 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
foolsbanquet says:

July 3, 2014 at 12:59 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
Blogageco says:

July 3, 2014 at 12:55 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
Adebola_Balogun says:

July 3, 2014 at 12:50 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
betsylevyp says:

July 3, 2014 at 12:47 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
andyguess says:

July 3, 2014 at 12:46 pm

RT @cblatts: How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
cblatts says:

July 3, 2014 at 12:45 pm

How to crash the World Bank website in the name of social science. http://t.co/JIxOgpoUjs
filipspagnoli says:

July 3, 2014 at 12:44 pm

@cblatts Here’s a related thing: guerrilla polling: http://t.co/YUD1KjRToQ Obviously more dangerous though.
magarya says:

July 3, 2014 at 12:30 pm

RT @CGDev: MT @cblatts: “We Just Ran Twenty-Three Million Queries of the @WorldBank’s Website” @JustinSandefur http://t.co/7G8K5m3kV2
ComplaymentdO says:

July 3, 2014 at 12:06 pm

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website”: That’s the title of a new paper by dat… http://t.co/MCAKkQwqPI
spgolden says:

July 3, 2014 at 11:55 am

RT @cblatts: Data guerrillas: http://t.co/vfWUsTs03R
nabilhashmi says:

July 3, 2014 at 11:52 am

RT @cblatts: Data guerrillas: http://t.co/vfWUsTs03R
JeffBloem says:

July 3, 2014 at 11:47 am

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/6WrEmRHKjl
l_peer says:

July 3, 2014 at 11:46 am

@cblatts Love the recs: publish the underlying code, embrace open data formats, release sufficient data so others can recreate estimates
mikemadowitz says:

July 3, 2014 at 11:43 am

Nice Sala-i-Martin reference MT @CGDev: MT @cblatts “We Just Ran 23 Million Queries of WB Website” @JustinSandefur http://t.co/RFLxY0bfet
CGDev says:

July 3, 2014 at 11:30 am

MT @cblatts: “We Just Ran Twenty-Three Million Queries of the @WorldBank’s Website” @JustinSandefur http://t.co/7G8K5m3kV2
LarkaDG says:

July 3, 2014 at 11:26 am

RT @cblatts: Data guerrillas: http://t.co/vfWUsTs03R
jamesfeigenbaum says:

July 3, 2014 at 11:26 am

RT @cblatts: Data guerrillas: http://t.co/vfWUsTs03R
ChazDazzle says:

July 3, 2014 at 11:23 am

@cblatts they should be careful. Aaron Swartz was indicted for less
cblatts says:

July 3, 2014 at 11:22 am

Data guerrillas: http://t.co/vfWUsTs03R
mzabek says:

July 3, 2014 at 11:12 am

RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/HxzopCj3fs
PCfromDC says:

July 3, 2014 at 11:08 am

“@MJSwart: RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/Y7JGAS5Src cc @iOnline247
nnunn99 says:

July 3, 2014 at 10:56 am

RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/HxzopCj3fs
kedarmankad says:

July 3, 2014 at 10:51 am

RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/HxzopCj3fs
datachick says:

July 3, 2014 at 10:44 am

RT @MJSwart: RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/00jMmjnsci <== Now that’s a co…
MilanV says:

July 3, 2014 at 10:42 am

RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/HxzopCj3fs
AdannaChukwuma says:

July 3, 2014 at 10:36 am

“We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/YXOfQEhHBP via @feedly
MJSwart says:

July 3, 2014 at 10:24 am

@cblatts BTW “data guerrillas” is excellent.
MJSwart says:

July 3, 2014 at 10:22 am

RT @cblatts: “We Just Ran Twenty-Three Million Queries of the World Bank’s Website” http://t.co/00jMmjnsci <== Now that’s a complicated ETL