I have a data frame with a wage column that specifies an hourly wage and a union column that specifies whether or not an employee is in a union. There are other variables too, but they don't matter right now. I'm trying to find the average wage for employees in a union. I've written the code that provides a True/False list of whether or not an employee is in a union. However, I don't know how to apply that list in order to get an average wage. Thanks in advance for any help.
#Read cps.csv file
import pandas as pd
cps_df = pd.read_csv('cps.csv')
#Function to determine whether or not an employee is in a union
def hourly_wage(x):
    """ return True if union else False """
    return x['union'] == 'Union'
#Function to create a list of union vs non-union
#(the original name of this function was lost; union_list is a placeholder)
def union_list(y):
    """ return a list determining union vs non-union """
    return [hourly_wage(x) for index, x in y.iterrows()]
I suppose you can do it in a more convenient way. Pandas is great for such things.
Let's assume the wage column is named 'wage'. Then the code looks like this:
import pandas as pd
cps_df = pd.read_csv('cps.csv')
print('Union workers mean wage: ', cps_df[cps_df.union == 'Union'].wage.mean())
What it essentially does:
1. Selects all union workers (cps_df[cps_df.union == 'Union'])
2. Applies the mean() function to their wages (.wage.mean())
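If you would rather keep the True/False list you already built, it can be used as a row filter too. Here is a minimal sketch of that, plus a groupby variant that gives the mean for both groups at once. The small data frame is made up so the example runs without cps.csv, and the 'Not Union' label is my assumption about what the non-union rows contain.

```python
import pandas as pd

# Toy stand-in for cps.csv; the values and the 'Not Union' label
# are invented for illustration only.
cps_df = pd.DataFrame({
    'wage': [10.0, 20.0, 30.0, 40.0],
    'union': ['Union', 'Not Union', 'Union', 'Not Union'],
})

# A True/False list like the one built in the question.
is_union = [u == 'Union' for u in cps_df['union']]

# The list works as a boolean mask: only rows marked True are kept.
union_mean = cps_df.loc[is_union, 'wage'].mean()

# Mean wage for every distinct value of the union column.
mean_by_group = cps_df.groupby('union')['wage'].mean()

print('Union workers mean wage: ', union_mean)
print(mean_by_group)
```

The mask has to be the same length as the data frame, which is the case when it is built row by row as in the question. The vectorized comparison cps_df.union == 'Union' produces the same mask without iterating over rows, so it will usually be faster on a large file.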
Hope this helps.