s.matthew.english - 2 days ago
JSON Question

Parse JSON from an API with Python: exception handling, surgical extraction

The data under consideration comes from an API, which means it's highly inconsistent: sometimes it returns unexpected content, sometimes it returns nothing, etc.

What I'm interested in is the data associated with ISO 3166-2 for each record.

The data (when it doesn't encounter an error) generally looks something like this:

{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "RO", "adminCode1": "10", "countryName": "Romania", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "10"}, {"type": "ISO3166-2", "code": "B"}], "adminName1": "Bucure\u015fti"}
{"countryCode": "DE", "adminCode1": "07", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "07"}, {"type": "ISO3166-2", "code": "NW"}], "adminName1": "North Rhine-Westphalia"}
{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}
{"countryCode": "DE", "adminCode1": "02", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "02"}, {"type": "ISO3166-2", "code": "BY"}], "adminName1": "Bavaria"}


Let's take one record for example:

{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}


From this I want to extract the ISO 3166-2 representation, i.e. DE-BW.

I've been trying different ways of extracting this information with Python. One attempt looked like this:

coord = response.get('codes', {}).get('type', {}).get('ISO3166-2', None)


Another attempt looked like this:

print(json.dumps(response["codes"]["ISO3166-2"]))


However, neither of those methods worked.

How can I take a record such as:

{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}


and extract only DE-BW using Python, while also handling records that don't look exactly like that, for instance extracting GB-ENG from:

{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}


and of course not crashing if it gets something that doesn't look like either of those, i.e. exception handling.




FULL FILE

import json
import requests
from collections import defaultdict
from pprint import pprint

# open up the output of 'data-processing.py'
with open('job-numbers-by-location.txt') as data_file:

    for line in data_file:
        identifier, name, coords, number_of_jobs = line.split("|")
        coords = coords[1:-1]
        lat, lng = coords.split(",")
        # print("lat: " + lat, "lng: " + lng)
        response = requests.get("http://api.geonames.org/countrySubdivisionJSON?lat="+lat+"&lng="+lng+"&username=s.matthew.english").json()

        codes = response.get('codes', [])
        for code in codes:
            if code.get('type') == 'ISO3166-2':
                print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))

Answer

'ISO3166-2' is a dictionary value, not a key. response['codes'] is a list of dictionaries, so loop over it and compare each entry's 'type':

codes = response.get('codes', [])
for code in codes:
    if code.get('type') == 'ISO3166-2':
        print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))
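The loop above still assumes that the response is a dictionary and that 'codes' is a list of dictionaries. Since the question also asks about not crashing on unexpected content, here is a minimal sketch of a more defensive version. The function names `extract_iso3166_2` and `extract_from_text` are hypothetical helpers, not part of any library, and the third sample is a made-up error payload used only to illustrate a record without 'codes'.

```python
import json


def extract_iso3166_2(record):
    """Return e.g. 'DE-BW', or None if the record lacks usable data."""
    if not isinstance(record, dict):
        return None
    country = record.get('countryCode')
    codes = record.get('codes')
    # Both pieces must be present and of the expected type.
    if not isinstance(country, str) or not isinstance(codes, list):
        return None
    for entry in codes:
        if isinstance(entry, dict) and entry.get('type') == 'ISO3166-2':
            code = entry.get('code')
            if isinstance(code, str) and code:
                return '{}-{}'.format(country, code)
    return None


def extract_from_text(raw):
    """Parse one JSON document from text; return None on malformed input."""
    try:
        record = json.loads(raw)
    except (ValueError, TypeError):
        # json.JSONDecodeError subclasses ValueError; TypeError covers
        # input that is not str/bytes (e.g. None).
        return None
    return extract_iso3166_2(record)


samples = [
    '{"countryCode": "DE", "codes": [{"type": "FIPS10-4", "code": "01"}, '
    '{"type": "ISO3166-2", "code": "BW"}]}',
    '{"countryCode": "GB", "codes": [{"type": "ISO3166-2", "code": "ENG"}]}',
    '{"status": {"message": "hypothetical error payload"}}',  # assumed shape
    'not json at all',
]
results = [extract_from_text(s) for s in samples]
print(results)  # ['DE-BW', 'GB-ENG', None, None]
```

Returning None instead of raising keeps the calling loop simple: skip the record when the result is None. If you call `extract_iso3166_2` directly on the output of `requests.get(...).json()`, note that `.json()` itself can raise on a non-JSON body, so that call would need its own try/except.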