user5368737 - 11 months ago
Python Question

Filtering pandas DataFrame

I'm reading in a .csv file using pandas, and then I want to filter out the rows where the value in a specified column is not in a dictionary, for example. Something like this:

import pandas as pd

df = pd.read_csv('mycsv.csv', sep='\t', encoding='utf-8', index_col=0,
                 names=['col1', 'col2', 'col3', 'col4'])

c = df.col4.value_counts(normalize=True).head(20)
values = dict(zip(c.index.tolist()[1::2], c.tolist()[1::2]))  # Keep every second entry (positions 1, 3, 5, ...) as {col4 value: relative frequency}

# Goal (pseudocode): df_filtered = df without the rows where col4 is not in values

After searching around a bit I tried using the following to filter it:

df_filtered = df[df.col4 in values]

but that unfortunately didn't work.

I've done the following to make it work for what I want to do, but it's incredibly slow for a large .csv file, so I thought there must be a way to do it that's built into pandas:

t = [(list(df.col1) + list(df.col2) + list(df.col3)) for i in range(len(df.col4)) if list(df.col4)[i] in values]

Answer

If you want to check against the dictionary values:

df_filtered = df[df.col4.isin(values.values())]

If you want to check against the dictionary keys (which is what the in operator tests on a dict, so this is most likely what the question is after):

df_filtered = df[df.col4.isin(values.keys())]
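
As a rough illustration of the key-based filter, here is a minimal sketch on a small made-up DataFrame. The sample data and the names mask and df_rest are assumptions for the example and are not from the question; the values dict is built the same way as in the question. Wrapping the keys in list() is only a precaution so that isin receives a plain list.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6],
    "col4": ["a", "b", "a", "c", "b", "a"],
})

# Same construction as in the question:
# keys are col4 values, dict values are their relative frequencies.
c = df.col4.value_counts(normalize=True).head(20)
values = dict(zip(c.index.tolist()[1::2], c.tolist()[1::2]))

# isin returns a boolean Series, one entry per row, which can index the frame.
mask = df.col4.isin(list(values))  # list(values) yields the dict keys

df_filtered = df[mask]   # rows whose col4 is one of the keys
df_rest = df[~mask]      # the complementary rows, via boolean negation

print(df_filtered)
```

Because isin is evaluated on the whole column at once, this should avoid the per-row Python loop from the question. The original attempt, df[df.col4 in values], fails because in asks the dict to look up the entire Series as a single key instead of testing each element.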