I would like to automate the download of CSV files from the World Bank's datasets.
My problem is that the URL corresponding to a specific dataset does not lead directly to the desired CSV file but is instead a query to the World Bank's API. As an example, this is the URL to get the GDP per capita data: http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv.
If you paste this URL into your browser, the download of the corresponding file starts automatically. As a consequence, the code I usually use to collect and save CSV files in Python does not work in this situation:
import csv
import urllib2

baseUrl = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
remoteCSV = urllib2.urlopen(baseUrl)
myData = csv.reader(remoteCSV)
The URL returns a zip archive rather than a bare CSV file. The code below downloads the zip, opens it in memory, and gives you a csv reader for whichever file in the archive you want.
import urllib2
import StringIO
from zipfile import ZipFile
import csv

baseUrl = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
remoteCSV = urllib2.urlopen(baseUrl)

# Copy the response body (a string) into a StringIO object
# so that we can work on it as though it were a file.
sio = StringIO.StringIO()
sio.write(remoteCSV.read())

# Create a ZipFile object, 'z', on top of the in-memory data.
z = ZipFile(sio, 'r')

# A list with the names of all the files in the zip you just downloaded.
print z.namelist()

# Index into z.namelist() to refer to 'ny.gdp.pcap.cd_Indicator_en_csv_v2.csv'.
# ZipFile.open() takes a single name, not the whole list.
with z.open(z.namelist()[1]) as f:  # Opens the 2nd file in the zip
    csvr = csv.reader(f)
    for row in csvr:
        print row
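For anyone on Python 3, where urllib2 and the StringIO module no longer exist, the same idea can be sketched roughly as follows. This is only a minimal sketch, not a tested client for the World Bank API: the helper names rows_from_zip_bytes and download_rows are made up for illustration, the "utf-8-sig" encoding is an assumption about the files, and picking the first name ending in ".csv" is just a default that you may need to override by passing the member name explicitly.

```python
import csv
import io
import urllib.request
import zipfile


def rows_from_zip_bytes(data, member=None, encoding="utf-8-sig"):
    """Return the rows of one CSV member of a zip archive held in memory.

    `member` is a file name inside the archive. If it is omitted, the first
    name ending in ".csv" is used (an assumption made for this sketch).
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if member is None:
            csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError("no CSV file found in archive")
            member = csv_names[0]
        with archive.open(member) as raw:
            # ZipFile.open() yields bytes, but csv.reader needs text in Python 3.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.reader(text))


def download_rows(url, member=None):
    """Fetch `url` and parse one CSV member of the zip archive it returns."""
    with urllib.request.urlopen(url) as response:
        return rows_from_zip_bytes(response.read(), member)
```

Calling download_rows(baseUrl) with the URL from the question should then give a list of rows. Keeping the parsing in its own function means you can try it on any zip bytes without touching the network.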