lostsoul29 lostsoul29 - 1 month ago 8
Python Question

Pandas filter rows using multiple fields together

I have a pandas

DataFrame
like this:

In [34]: people = pandas.DataFrame({'name' : ['John', 'John', 'Mike', 'Sarah', 'Julie'], 'age' : [28, 18, 18, 2, 69]})
people = people[['name', 'age']]
people

Out[34]:
name age
0 John 28
1 John 18
2 Mike 18
3 Sarah 2
4 Julie 69


I want to filter this
DataFrame
using the following tuples:

In [35]: filter = [('John', 28), ('Mike', 18)]


The output should look like this:

Out[35]:
name age
0 John 28
2 Mike 18


I've tried doing this:

In [34]: mask = k.isin({'name': ['John', 'Mike'], 'age': [28, 18]}).all(axis=1)
k = k[mask]
k


However it shows me both Johns because it filters each column independently (the ages of both Johns are present in the
age
array).

Out[34]:
name age
0 John 28
1 John 18
2 Mike 18


How do I filter rows based on multiple fields taken together?

Answer

This should work:

people.set_index(people.columns.tolist(), drop=False).loc[filter].reset_index(drop=True)

Cleaned up and with explanation

# set_index with the columns you want to reference in tuples
cols = ['name', 'age']
people = people.set_index(cols, drop=False)
#                                   ^
#                                   |
#   ensure the cols stay in dataframe

#   does what you
#   want but now has
#   index that was
#   not there
# /--------------\
people.loc[filter].reset_index(drop=True)
#                 \---------------------/
#                  Gets rid of that index