user1917407 - 3 months ago
Python Question

What is the correct way to use .apply with pandas?

I'm working with a million-row CSV dataset that includes "latitude" and "longitude" columns, and I want to add a new column called "state" that holds the US state containing each pair of coordinates.

import pandas as pd
import numpy as np
import os
from uszipcode import ZipcodeSearchEngine

def convert_to_state(coord):
    lat, lon = coord["latitude"], coord["longitude"]
    res = search.by_coordinate(lat, lon, radius=1, returns=1)
    state = res.State
    return state

def get_state(path):
    with open(path + "USA_downloads.csv", 'r+') as f:
        data = pd.read_csv(f)
        data["state"] = data.loc[:, ["latitude", "longitude"]].apply(convert_to_state, axis=1)

get_state(path)


I keep getting an error "DtypeWarning: Columns (4,5) have mixed types. Specify dtype option on import or set low_memory=False." Columns 4 and 5 correspond to the latitude and longitude. I don't understand how I would use .apply to complete this task, or if .apply is even the right method for the job. How should I proceed?

Answer

I believe this will be a faster implementation of your program:

import pandas as pd
import numpy as np
import os
from uszipcode import ZipcodeSearchEngine

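# The snippet in the question uses `search` without defining it; it is
# assumed here to be a ZipcodeSearchEngine instance created once at module level.
search = ZipcodeSearchEngine()

def convert_to_state(lat, lon):
    # Query uszipcode for at most one result near the coordinates.
    res = search.by_coordinate(lat, lon, radius=1, returns=1)
    state = res.State
    return state

def get_state(path):
    # read_csv accepts a file path directly, so no explicit open() is needed.
    data = pd.read_csv(path + "USA_downloads.csv")
    # np.vectorize calls convert_to_state once per (latitude, longitude) pair.
    data["state"] = np.vectorize(convert_to_state)(data["latitude"].values, data["longitude"].values)
    # Return the DataFrame so the new column is not lost when the function exits.
    return data

data = get_state(path)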

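It uses numpy.vectorize to call convert_to_state once per row. numpy.vectorize is still a Python-level loop under the hood, so the speed-up is modest; it comes mainly from skipping the per-row Series object that .apply(axis=1) has to build. The function is called with the values of the 'latitude' and 'longitude' columns of your DataFrame, converted to numpy.ndarray (the .values attribute does that).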
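
To make the comparison concrete, here is a minimal sketch that runs both approaches on a tiny DataFrame and checks that they agree. The lookup_region function and the "region" column are hypothetical stand-ins (a cheap latitude rule, not real geocoding) so that the example runs without uszipcode; they are not part of the original code.

```python
import numpy as np
import pandas as pd

def lookup_region(lat, lon):
    # Hypothetical stand-in for convert_to_state: no real geocoding,
    # just a cheap rule so the example runs without uszipcode.
    return "north" if lat >= 40.0 else "south"

data = pd.DataFrame({
    "latitude": [47.6, 34.1, 40.7],
    "longitude": [-122.3, -118.2, -74.0],
})

# Row-wise apply: pandas builds a Series object for every row.
via_apply = data.apply(
    lambda row: lookup_region(row["latitude"], row["longitude"]), axis=1
)

# np.vectorize: loops over the raw arrays, passing one scalar pair per call.
via_vectorize = np.vectorize(lookup_region)(
    data["latitude"].values, data["longitude"].values
)

# Both approaches produce the same labels, in the same row order.
data["region"] = via_vectorize
```

Either result can be assigned to the new column; the difference is only in per-row overhead, which you would have to time on your own data to quantify.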


If you want to keep using .apply(), you can do:

state = data.apply(lambda x: convert_to_state(x['latitude'], x['longitude']), axis=1)
data["state"] = state
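
As for the DtypeWarning in the question: it is a warning rather than an error, and it usually suggests that some cells in those columns were not parsed as numbers. One possible way to deal with it, sketched below, is to read the file with low_memory=False and then coerce both columns to numeric, turning anything unparseable into NaN. The inline CSV text and the "not_available" value are made up for illustration; whether dropping such rows is acceptable depends on your data.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV: one latitude cell is not a number.
csv_text = """latitude,longitude
47.6,-122.3
not_available,-118.2
40.7,-74.0
"""

# low_memory=False makes pandas infer each column's dtype from the whole file.
data = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Coerce anything non-numeric to NaN so both columns end up as float64.
for col in ["latitude", "longitude"]:
    data[col] = pd.to_numeric(data[col], errors="coerce")

# Drop rows that cannot be geocoded before doing any lookups.
clean = data.dropna(subset=["latitude", "longitude"])
```

After this step, clean contains only rows with usable coordinates, so the state lookup never receives a string.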