vdBurg - 3 months ago
Python Question

Retrieve geocodes from Google API and append to original table - python

I am trying to retrieve the geocodes of a bunch of addresses through the Google geocoding API and append them to my table with addresses.

After spending two days searching the internet I couldn't find a simple way of doing this, although it shouldn't be that hard. I am especially having problems parsing the JSON output and appending it to my original table.
I use Python 3.5 on Windows.

I originally got the data from a database which I added to a dataframe in python. But to paste it here it was easier to convert it to a dictionary and back to a dataframe:

data_dict={'street': {0: 'ROMULO', 1: 'SAN BARTOLOME', 2: 'GARBI', 3: 'SAN JOSE'},
'concat': {0: '3+ROMULO+CALLE+ALMERIA',
1: '5+SAN BARTOLOME+CALLE+TOLEDO',
2: '48+GARBI+CALLE+CASTELLON',
3: '30+SAN JOSE+CALLE+SANTA CRUZ DE TENERIFE'},
'number': {0: '3', 1: '5', 2: '48', 3: '30'},
'province': {0: 'ALMERIA',
1: 'TOLEDO',
2: 'CASTELLON',
3: 'SANTA CRUZ DE TENERIFE'},
'region': {0: 'ANDALUCIA',
1: 'CASTILLA LA MANCHA',
2: 'COMUNIDAD VALENCIANA',
3: 'CANARIAS'}}


Back to dataframe:

import pandas as pd

table=pd.DataFrame.from_dict(data_dict)


Now I retrieve the data from the Google geocoding API:

import requests
import json

key="MyKey"
jsonout=[]
for i in table.loc[:,'concat']:
    try:
        url="https://maps.googleapis.com/maps/api/geocode/json?address=%s&key=%s" % (i, key)
        response = requests.get(url)
        jsonf = response.json()
        jsonout.append(jsonf)
    except Exception:
        continue


I get this output:

jsonout=[{'results': [{'address_components': [{'long_name': '3',
'short_name': '3',
'types': ['street_number']},
{'long_name': 'Calle Rómulo',
'short_name': 'Calle Rómulo',
'types': ['route']},
{'long_name': 'Adra',
'short_name': 'Adra',
'types': ['locality', 'political']},
{'long_name': 'Almería',
'short_name': 'AL',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Andalucía',
'short_name': 'AL',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '04770', 'short_name': '04770', 'types': ['postal_code']}],
'formatted_address': 'Calle Rómulo, 3, 04770 Adra, Almería, Spain',
'geometry': {'location': {'lat': 36.7593, 'lng': -2.97818},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 36.76064898029149,
'lng': -2.976831019708498},
'southwest': {'lat': 36.7579510197085, 'lng': -2.979528980291502}}},
'partial_match': True,
'place_id': 'ChIJG39VNzNOcA0R2f8Ek3E12AY',
'types': ['street_address']}],
'status': 'OK'},
{'results': [{'address_components': [{'long_name': '5',
'short_name': '5',
'types': ['street_number']},
{'long_name': 'Calle de San Bartolomé',
'short_name': 'Calle de San Bartolomé',
'types': ['route']},
{'long_name': 'Toledo',
'short_name': 'Toledo',
'types': ['locality', 'political']},
{'long_name': 'Toledo',
'short_name': 'TO',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Castilla-La Mancha',
'short_name': 'CM',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '45002', 'short_name': '45002', 'types': ['postal_code']}],
'formatted_address': 'Calle de San Bartolomé, 5, 45002 Toledo, Spain',
'geometry': {'location': {'lat': 39.8549781, 'lng': -4.026267199999999},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 39.85632708029149,
'lng': -4.024918219708497},
'southwest': {'lat': 39.85362911970849, 'lng': -4.027616180291502}}},
'partial_match': True,
'place_id': 'ChIJ4bse1aALag0RJ5RxxfyDxUI',
'types': ['street_address']}],
'status': 'OK'},
{'results': [{'address_components': [{'long_name': '48',
'short_name': '48',
'types': ['street_number']},
{'long_name': 'Carrer de Garbí',
'short_name': 'Carrer de Garbí',
'types': ['route']},
{'long_name': 'Peníscola',
'short_name': 'Peníscola',
'types': ['locality', 'political']},
{'long_name': 'Castelló',
'short_name': 'Castelló',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Comunidad Valenciana',
'short_name': 'Comunidad Valenciana',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '12598', 'short_name': '12598', 'types': ['postal_code']}],
'formatted_address': 'Carrer de Garbí, 48, 12598 Peníscola, Castelló, Spain',
'geometry': {'location': {'lat': 40.3634529, 'lng': 0.3963583},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 40.3648018802915,
'lng': 0.397707280291502},
'southwest': {'lat': 40.3621039197085, 'lng': 0.395009319708498}}},
'partial_match': True,
'place_id': 'ChIJHVNHcelGoBIRogILRMno_wk',
'types': ['street_address']},
{'address_components': [{'long_name': '48',
'short_name': '48',
'types': ['street_number']},
{'long_name': 'Carrer Garbí',
'short_name': 'Carrer Garbí',
'types': ['route']},
{'long_name': 'Vila-real',
'short_name': 'Vila-real',
'types': ['locality', 'political']},
{'long_name': 'Castelló',
'short_name': 'Castelló',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Comunidad Valenciana',
'short_name': 'Comunidad Valenciana',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '12540', 'short_name': '12540', 'types': ['postal_code']}],
'formatted_address': 'Carrer Garbí, 48, 12540 Vila-real, Castelló, Spain',
'geometry': {'bounds': {'northeast': {'lat': 39.955829, 'lng': -0.110409},
'southwest': {'lat': 39.9558231, 'lng': -0.1104261}},
'location': {'lat': 39.9558231, 'lng': -0.110409},
'location_type': 'RANGE_INTERPOLATED',
'viewport': {'northeast': {'lat': 39.9571750302915,
'lng': -0.109068569708498},
'southwest': {'lat': 39.9544770697085, 'lng': -0.111766530291502}}},
'partial_match': True,
'place_id': 'EjRDYXJyZXIgR2FyYsOtLCA0OCwgMTI1NDAgVmlsYS1yZWFsLCBDYXN0ZWxsw7MsIFNwYWlu',
'types': ['street_address']}],
'status': 'OK'},
{'results': [{'address_components': [{'long_name': '30',
'short_name': '30',
'types': ['street_number']},
{'long_name': 'Calle San José',
'short_name': 'Calle San José',
'types': ['route']},
{'long_name': 'Santa Cruz de la Palma',
'short_name': 'Santa Cruz de la Palma',
'types': ['locality', 'political']},
{'long_name': 'Santa Cruz de Tenerife',
'short_name': 'TF',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Canarias',
'short_name': 'CN',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '38700', 'short_name': '38700', 'types': ['postal_code']}],
'formatted_address': 'Calle San José, 30, 38700 Santa Cruz de la Palma, Santa Cruz de Tenerife, Spain',
'geometry': {'location': {'lat': 28.6864347, 'lng': -17.7624433},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 28.6877836802915,
'lng': -17.7610943197085},
'southwest': {'lat': 28.6850857197085, 'lng': -17.7637922802915}}},
'partial_match': True,
'place_id': 'ChIJ8ZFx6__rawwRV3dc118gEgE',
'types': ['street_address']},
{'address_components': [{'long_name': '30',
'short_name': '30',
'types': ['street_number']},
{'long_name': 'Calle San José',
'short_name': 'Calle San José',
'types': ['route']},
{'long_name': 'San Andrés',
'short_name': 'San Andrés',
'types': ['locality', 'political']},
{'long_name': 'Santa Cruz de Tenerife',
'short_name': 'Santa Cruz de Tenerife',
'types': ['administrative_area_level_4', 'political']},
{'long_name': 'Santa Cruz de Tenerife',
'short_name': 'TF',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Canarias',
'short_name': 'CN',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'Spain',
'short_name': 'ES',
'types': ['country', 'political']},
{'long_name': '38120', 'short_name': '38120', 'types': ['postal_code']}],
'formatted_address': 'Calle San José, 30, 38120 San Andrés, Santa Cruz de Tenerife, Spain',
'geometry': {'location': {'lat': 28.505875, 'lng': -16.1930036},
'location_type': 'ROOFTOP',
'viewport': {'northeast': {'lat': 28.5072239802915,
'lng': -16.1916546197085},
'southwest': {'lat': 28.5045260197085, 'lng': -16.1943525802915}}},
'partial_match': True,
'place_id': 'ChIJsfd-ITjKQQwRjFHLI0XPSok',
'types': ['street_address']}],
'status': 'OK'}]


What I would ultimately like to have is my original table dataframe with the lat and lng coordinates

(i['results'][0]['geometry']['location']['lat'],
i['results'][0]['geometry']['location']['lng'])


and the formatted_address from the request.

Answer

I use the geopy package for my geocoding; it takes care of parsing the JSON response.

from geopy.geocoders import GoogleV3

googleGeo = GoogleV3('googleKey')

# create a geocoded list containing geocode objects
geocoded = []
for address in mydata['location']:  # assumes mydata is a pandas df
    geocoded.append(googleGeo.geocode(address))  # geocode function returns a geocoded object

# append geocoded list to mydata
mydata['geocoded'] = geocoded

# create coordinates column
mydata['coords'] = mydata['geocoded'].apply(lambda x: (x.latitude, x.longitude))

# if you want to split out your lat and long then do
# mydata['lat'] = mydata['geocoded'].apply(lambda x: x.latitude)
# mydata['long'] = mydata['geocoded'].apply(lambda x: x.longitude)
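
If you would rather keep the raw requests approach from the question, the responses can also be parsed by hand. The following is only a sketch: extract_location is a hypothetical helper name, the jsonout list is a trimmed sample in the shape shown in the question, and it assumes jsonout holds exactly one response per row of table. That last assumption breaks if a failed request is skipped with continue, so a failed request would need to append a placeholder such as {} instead.

```python
import pandas as pd

def extract_location(response_json):
    """Return (lat, lng, formatted_address) from one Geocoding API response dict.

    Hypothetical helper for illustration; returns Nones when there is no usable result.
    """
    results = response_json.get("results", [])
    if response_json.get("status") != "OK" or not results:
        return (None, None, None)
    first = results[0]  # several candidates may come back; this sketch takes the first
    loc = first["geometry"]["location"]
    return (loc["lat"], loc["lng"], first["formatted_address"])

# Trimmed sample responses in the same shape as the output in the question
jsonout = [
    {"results": [{"formatted_address": "Calle Rómulo, 3, 04770 Adra, Almería, Spain",
                  "geometry": {"location": {"lat": 36.7593, "lng": -2.97818}}}],
     "status": "OK"},
    {"results": [], "status": "ZERO_RESULTS"},
]
table = pd.DataFrame({"concat": ["3+ROMULO+CALLE+ALMERIA", "NO+SUCH+ADDRESS"]})

# Build one row of parsed values per response, aligned on the table's index
parsed = pd.DataFrame([extract_location(j) for j in jsonout],
                      columns=["lat", "lng", "formatted_address"],
                      index=table.index)
table = table.join(parsed)
```

Rows whose request returned nothing end up with missing values rather than raising a KeyError or IndexError.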

Based on the comment you shared, if you are using Google's API without an API key, then it might be beneficial to include a random pause between each geocode call.

from time import sleep
from random import randint
from geopy.geocoders import GoogleV3

googleGeo = GoogleV3()

def geocode(address):
    location = googleGeo.geocode(address)
    sleep(randint(5,10))  # give the API a break
    return location

Then use this custom function in place of googleGeo.geocode when geocoding each address.
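
As a rough sketch of what that could look like with pandas: the geocode function and FakeLocation below are stand-ins I made up so the example runs without network access; in practice you would call the throttled geocode defined above, which returns a geopy location object or None. The location column name is the same assumption as in the earlier snippet.

```python
from collections import namedtuple

import pandas as pd

# Stand-in for a geopy location object, only so this sketch runs offline (hypothetical)
FakeLocation = namedtuple("FakeLocation", ["address", "latitude", "longitude"])

def geocode(address):
    """Placeholder for the throttled geocode() function; returns None when nothing is found."""
    known = {
        "Calle Rómulo 3, Adra": FakeLocation(
            "Calle Rómulo, 3, 04770 Adra, Almería, Spain", 36.7593, -2.97818),
    }
    return known.get(address)

mydata = pd.DataFrame({"location": ["Calle Rómulo 3, Adra", "no such place"]})

# Geocode every address, then pull out the fields, guarding against None results
mydata["geocoded"] = mydata["location"].apply(geocode)
mydata["lat"] = mydata["geocoded"].apply(lambda x: x.latitude if x is not None else None)
mydata["long"] = mydata["geocoded"].apply(lambda x: x.longitude if x is not None else None)
mydata["formatted_address"] = mydata["geocoded"].apply(
    lambda x: x.address if x is not None else None)
```

The None checks matter because a lookup that finds nothing would otherwise raise an AttributeError when the lambda reads latitude.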


Piggybacking on my earlier section, you can even utilize multiple map API services. This is the function I built for one of my projects, utilizing Nominatim's API first, and then falling back on Google's API if Nominatim either returns an error or returns nothing:

from geopy.geocoders import Nominatim, GoogleV3
from geopy.exc import GeocoderTimedOut, GeocoderAuthenticationFailure
from random import randint
from time import sleep

nomiGeo = Nominatim(user_agent='myApp')  # Nominatim geolocator; recent geopy versions require a user_agent
googleGeo = GoogleV3('myKey')  # Google Maps v3 API geolocator

def geocode(address):
    """Geocode an address.

    Args:
        address (str): the physical address

    Returns:
        geopy.location.Location or None: geocoded object, or None if nothing was found
    """
    location = None
    attempt = 0
    useGoogle = False  # set to True to use Google only
    while (location is None) and (attempt <= 8):
        try:
            attempt += 1
            if useGoogle:
                location = googleGeo.geocode(address, timeout=10)
            else:
                location = nomiGeo.geocode(address, timeout=10)
                if location is None:
                    useGoogle = True
                    location = googleGeo.geocode(address, timeout=10)
            sleep(randint(5, 10))  # Give the API a break
        except GeocoderAuthenticationFailure:
            print('Error: GeocoderAuthenticationFailure while geocoding {} during attempt #{}'.format(address, attempt))
            if attempt % 2 == 0:  # switch between services for every attempt
                useGoogle = True
            else:
                useGoogle = False
                sleep(60)
        except GeocoderTimedOut:
            sleep(randint(3, 5))  # Give API a break
            print('Error: GeocoderTimedOut while geocoding {} during attempt #{}'.format(address, attempt))
    return location

Note that I also imported some exceptions specific to the package because, in my experience, Nominatim can sometimes throw random errors, and these were the two I ran into. Also, in my experience with the two APIs, Google seemed able to interpolate coordinates even when a certain address was not found, whereas Nominatim needed the address to be in its database in order to return anything.
