Rodolfo Orozco - 1 year ago
Python Question

Iterating over a DataFrame, evaluating column values, and setting value to a third column

I have been trying to iterate over a DataFrame, or apply a function to it, in order to set the content of one column based on the values of two other columns in the same DataFrame.

I have a df like:

import pandas as pd

df = pd.DataFrame({'Age_type': pd.Series(['Adult', 'Adult', 'Child', 'Child']),
                   'Gender': pd.Series(['Female', 'Male', 'Female', 'Female'])})

   Gender Age_type Group
0  Female    Adult
1    Male    Adult
2  Female    Child
3  Female    Child


And I want to set a group for each case, with this idea:

if gender == 'Female' and age_type == 'Adult':
    group = 'Group A'
elif gender == 'Female' and age_type == 'Child':
    group = 'Group B'
elif gender == 'Male' and age_type == 'Adult':
    group = 'Group C'
elif gender == 'Male' and age_type == 'Child':
    group = 'Group D'


I have tried to use .apply(function) because, as far as I understand, you should never modify a DataFrame while iterating over it (so a for loop would not be an option?).

I have tried:

def set_group(data):
    gender = data['Gender']
    age_type = data['Age_type']
    if gender == 'Female' and age_type == 'Adult':
        data['Group'] = 'Group A'
    elif gender == 'Female' and age_type == 'Child':
        data['Group'] = 'Group B'
    elif gender == 'Male' and age_type == 'Adult':
        data['Group'] = 'Group C'
    elif gender == 'Male' and age_type == 'Child':
        data['Group'] = 'Group D'
    return None

df['Group'].apply(set_group)


but I keep getting errors like:
TypeError: string indices must be integers, not str

Any idea on how to iterate over a DataFrame, read some columns, and based on that, set the value for another column?

Thanks!

Answer Source

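Try this (here dfx is the DataFrame from the question):

import numpy as np

# Start with an empty column, then fill it one condition at a time.
# Rows that do not match a condition keep their current value.
dfx['group'] = ""
dfx['group'] = np.where((dfx['Gender'] == 'Female') & (dfx['Age_type'] == 'Adult'), 'A', dfx['group'])
dfx['group'] = np.where((dfx['Gender'] == 'Female') & (dfx['Age_type'] == 'Child'), 'B', dfx['group'])
dfx['group'] = np.where((dfx['Gender'] == 'Male') & (dfx['Age_type'] == 'Adult'), 'C', dfx['group'])
dfx['group'] = np.where((dfx['Gender'] == 'Male') & (dfx['Age_type'] == 'Child'), 'D', dfx['group'])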
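
As for the TypeError in the question: df['Group'].apply(set_group) calls the function once per value of the Group column, so data is a single string rather than a row, and data['Gender'] ends up indexing into that string. If you would rather keep the row-wise approach, one possible sketch is to apply over the whole DataFrame with axis=1 and return the group instead of assigning it inside the function. The GROUPS lookup dict below is my own addition for illustration, not something from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Age_type': ['Adult', 'Adult', 'Child', 'Child'],
    'Gender': ['Female', 'Male', 'Female', 'Female'],
})

# Hypothetical lookup table covering the four cases listed in the question.
GROUPS = {
    ('Female', 'Adult'): 'Group A',
    ('Female', 'Child'): 'Group B',
    ('Male', 'Adult'): 'Group C',
    ('Male', 'Child'): 'Group D',
}

def set_group(row):
    # With axis=1, row is a Series holding one DataFrame row,
    # so row['Gender'] and row['Age_type'] are valid lookups.
    # Return the value; unmatched combinations give None.
    return GROUPS.get((row['Gender'], row['Age_type']))

# apply(..., axis=1) calls set_group once per row and returns a Series
# aligned with df's index, which is then stored as the new column.
df['Group'] = df.apply(set_group, axis=1)
print(df)
```

This is usually slower than the np.where version on large frames, since it runs a Python function per row, but it stays close to the if/elif logic you already wrote.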