Harelephant Harelephant - 23 days ago 10
Python Question

Iterating through a pandas dataframe and inserting new values into an empty column

I'm relatively new to Pandas and am having trouble iterating through the values in a given column in my dataset and finding those cells which contain a specific string.

Address,City
['1234 Apple Drive', 'San Francisco', 'CA'],''
['4678 Bannana Street', 'Austin', 'TX'],''


For this example, I want to (a) extract just the street portion (b) of addresses that contain the string 'Street', and (c) place it in a newly inserted column named 'Street'.

Address,City,Street
['1234 Apple Drive', 'San Francisco', 'CA'],'',''
['4678 Bannana Street', 'Austin', 'TX'],'','4678 Bannana Street'


I know how to insert a new column into my dataset. My code so far looks like this (assume my current dataset has only two columns and the entries from the example):

import numpy as np
import pandas as pd
from pandas import DataFrame, read_csv

df = pd.read_csv('dataset.csv', sep = '\t')
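# With two existing columns, the next free position is 2 (loc is 0-based);
# fill the new column with empty strings rather than the str type itself.
df.insert(loc=2, column='Street', value='')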


The rest of what I have isn't pretty and has been useless so far. Any help with executing a, b and c is much appreciated! Thanks.

Answer

Try this:

import re
import pandas as pd

df = pd.DataFrame([['1234 Apple Drive', 'San Francisco', 'CA'],
                   ['4678 Bannana Street', 'Austin', 'TX']],
                  columns=['Address', 'City', 'State'])

# Capture the word immediately before "Street" (case-insensitive);
# rows without a match get NaN.
df['Street'] = df.Address.str.extract(r'([\S]+)\s+Street', flags=re.IGNORECASE)
print(df)

               Address           City State   Street
0     1234 Apple Drive  San Francisco    CA      NaN
1  4678 Bannana Street         Austin    TX  Bannana
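
The pattern above captures only the word before "Street", whereas the question's expected output keeps the whole street string ('4678 Bannana Street') and leaves non-matching rows empty. Here is a minimal sketch of one way to get that, assuming the street text sits in the Address column as in the sample frame above. The anchored pattern and the fillna('') step are illustrative choices, not part of the original answer:

```python
import re

import pandas as pd

df = pd.DataFrame([['1234 Apple Drive', 'San Francisco', 'CA'],
                   ['4678 Bannana Street', 'Austin', 'TX']],
                  columns=['Address', 'City', 'State'])

# One capture group spanning the whole address, anchored so that the
# value must end in the word "Street" (matched case-insensitively).
pattern = r'^(.*\bStreet)\s*$'

# expand=False returns a Series for a single group; rows that do not
# match come back as NaN, so replace those with an empty string.
df['Street'] = df['Address'].str.extract(pattern, flags=re.IGNORECASE,
                                         expand=False).fillna('')
print(df)
```

If NaN is preferable to an empty string for the non-matching rows, the fillna('') call can simply be dropped.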

If you want the street number as well:

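import re

# Two capture groups: the street number and the word before "Street".
# expand=True returns a DataFrame with one column per group.
df[['Street Number', 'Street']] = df.Address.str.extract(r'(\S+)\s+(\S+)\s+Street', expand=True, flags=re.IGNORECASE)
print(df)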

               Address           City State Street Number   Street
0     1234 Apple Drive  San Francisco    CA           NaN      NaN
1  4678 Bannana Street         Austin    TX          4678  Bannana
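
A variation on the two-group extraction, sketched here as an alternative rather than as what the answer does: when the capture groups are named, str.extract uses the group names as the column names of the result, so the frame can be joined back without listing the target columns by hand. The group names StreetNumber and StreetName are hypothetical and chosen only for this example:

```python
import re

import pandas as pd

df = pd.DataFrame([['1234 Apple Drive', 'San Francisco', 'CA'],
                   ['4678 Bannana Street', 'Austin', 'TX']],
                  columns=['Address', 'City', 'State'])

# Named groups (?P<name>...) become the column names of the result.
# StreetNumber and StreetName are illustrative names only.
pattern = r'(?P<StreetNumber>\S+)\s+(?P<StreetName>\S+)\s+Street'
parts = df['Address'].str.extract(pattern, flags=re.IGNORECASE, expand=True)

# join aligns on the index, so unmatched rows keep NaN in both columns.
result = df.join(parts)
print(result)
```

Python group names must be valid identifiers, which is why the sketch avoids a name with a space such as 'Street Number'.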


Note

With pandas 0.18.1 I get a FutureWarning if I don't specify the expand argument. To get a single column back as a Series, pass expand=False explicitly:

df['Street'] = df.Address.str.extract(r'([\S]+)\s+Street', expand=False, flags=re.IGNORECASE)
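
To make the effect of the expand argument concrete, here is a small sketch comparing the two settings on a single-group pattern. The variable names are made up for illustration. As far as I know, later pandas releases default to expand=True, so passing the argument explicitly keeps the behaviour the same across versions:

```python
import re

import pandas as pd

addresses = pd.Series(['1234 Apple Drive', '4678 Bannana Street'])
pattern = r'(\S+)\s+Street'

# expand=False with a single capture group yields a Series.
as_series = addresses.str.extract(pattern, flags=re.IGNORECASE, expand=False)

# expand=True always yields a DataFrame, one column per capture group.
as_frame = addresses.str.extract(pattern, flags=re.IGNORECASE, expand=True)

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

A Series assigns directly to a single column, which is why expand=False is the natural fit for df['Street'] = ... with one capture group.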