ahmedawaji ahmedawaji - 2 months ago 19
Python Question

Dataframe filtering in pandas

How can I filter or subset a particular group within a dataframe (e.g., admitted female from the dataframe below)?
I am trying to sum up admissions/rejection rates based on gender. This dataframe is small, but what if it was much larger, let's say for example tens of thousands of line, where indexing individual values is impossible?

Admit Gender Dept Freq
0 Admitted Male A 512
1 Rejected Male A 313
2 Admitted Female A 89
3 Rejected Female A 19
4 Admitted Male B 353
5 Rejected Male B 207
6 Admitted Female B 17
7 Rejected Female B 8
8 Admitted Male C 120
9 Rejected Male C 205
10 Admitted Female C 202
11 Rejected Female C 391
12 Admitted Male D 138
13 Rejected Male D 279
14 Admitted Female D 131
15 Rejected Female D 244
16 Admitted Male E 53
17 Rejected Male E 138
18 Admitted Female E 94
19 Rejected Female E 299
20 Admitted Male F 22
21 Rejected Male F 351
22 Admitted Female F 24
23 Rejected Female F 317

Answer

To filter the data you can use the very comprehensive queryfunction.

# Test data
df = DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
        'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
        'Freq': [512, 313, 89, 19, 353, 207, 17],
        'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B']})

df.query('Admit == "Admitted" and Gender == "Female"')

      Admit  Freq  Gender Gender Dept
2  Admitted    89  Female           A
6  Admitted    17  Female           B

To summarize data use groupby.

group = df.groupby(['Admit', 'Gender']).sum()
print(group)

                 Freq
Admit    Gender      
Admitted Female   106
         Male     865
Rejected Female    19
         Male     520

You can the filter the result simply by subsetting on the created MultiIndex.

group.loc[('Admitted', 'Female')]

Freq    106
Name: (Admitted, Female), dtype: int64