Christopher Flach - 27 days ago
Python Question

Python Pandas Dataframe Control Flow

I have a data frame with a wage column that specifies an hourly wage and a union column that specifies whether or not an employee is in a union. There are other variables too, but they don't matter right now. I'm trying to find the average wage for employees in a union. I've written the code that provides a True/False list of whether or not an employee is in a union. However, I don't know how to apply that list in order to get an average wage. Thanks in advance for any help.

#Read cps.csv file
import pandas as pd
cps_df = pd.read_csv('cps.csv')
cps_df

#Function to determine whether or not an employee is in a union
def hourly_wage(x):
    """ return true if union else false """

    if x['union'] == 'Union':
        return True
    else:
        return False

#Function to create a list of union vs non-union
def union_list(y):
    """ return a list determining union vs non-union """

    return [hourly_wage(x) for index, x in y.iterrows()]

#Print list
%time
print(union_list(cps_df))

Answer

You can do this more directly. Pandas can filter the rows for you, so there is no need to loop over them yourself.

Let's assume the wage column is named 'wage'. Then the code looks like this:

import pandas as pd
cps_df = pd.read_csv('cps.csv')
print('Union workers mean wage: ', cps_df[cps_df.union == 'Union'].wage.mean())

What it essentially does:

1. Selects all union workers (cps_df[cps_df.union == 'Union'])
2. Applies the mean() function to their wages (.wage.mean())
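If it helps to see the two steps separately, here is a minimal sketch. It uses a small made-up data frame instead of cps.csv, and the column names 'wage' and 'union' are assumptions, as above. The last part suggests that a plain True/False list, like the one union_list() builds in the question, can be applied the same way.

```python
import pandas as pd

# Hypothetical sample data standing in for cps.csv; the column names
# 'wage' and 'union' are assumed, not taken from the real file.
cps_df = pd.DataFrame({
    'wage': [10.0, 20.0, 30.0, 40.0],
    'union': ['Union', 'Not union', 'Union', 'Not union'],
})

# Step 1: comparing the column to 'Union' gives one True/False per row
is_union = cps_df['union'] == 'Union'

# Step 2: select the wages of the rows where the mask is True
union_wages = cps_df.loc[is_union, 'wage']

# Step 3: average the selected wages
union_mean = union_wages.mean()
print('Union workers mean wage:', union_mean)

# A plain Python list of booleans (same length as the data frame)
# can be used as a mask in the same way.
mask_list = [value == 'Union' for value in cps_df['union']]
list_mean = cps_df[mask_list]['wage'].mean()
print('Same result from a list mask:', list_mean)
```

The comparison in step 1 is done for the whole column at once, which is usually much faster than calling a function per row with iterrows().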

Hope this helps.