user308827 user308827 - 1 month ago 8
Python Question

Writing python function to extract matching rows from pandas dataframe

df1 = pd.DataFrame({'A' : [5,5,5,5], 'B' : [4,2,1, 1], 'C' : [2,2,7,1]})


I want to get those rows in df1 based on foll. condition:

df1.loc[(df1['A'] == 5) & (df1['B'] == 4) & (df1['C'] == 2)]


How can I make it more generic i.e. I want to have a function, where I specify both the column names and the values I am looking for as arguments.

Answer

One option would be to use query. For the conditions in your question, this would involve constructing a string along the lines of 'A==5 & B==4 & C==2'.

To setup the problem, I'm going to assume you provide a list of tuples, in the form of (column, comparison, value) as your conditions, for example ('A', '==', 5).

Then you could write a function along the lines of:

def extract_matching_rows(df, conditions):
    conditions = ' & '.join(['{}{}{}'.format(*c) for c in conditions])
    return df.query(conditions)

If you only care about equality comparisons, you could just hard code in the '==' and eliminate it from your condition tuples.

Example usage with slightly different conditions:

conditions = [('A', '>=', 5), ('B', '==', 4), ('C', '<', 3)]
extract_matching_rows(df1, conditions)

   A  B  C
0  5  4  2

Note that you can even compare columns with query:

conditions = [('B', '>=', 'C'), ('A', '==', 5)]
extract_matching_rows(df1, conditions)

   A  B  C
0  5  4  2
1  5  2  2
3  5  1  1