ThomasErnste - 1 month ago
Python Question

Can I use a list of numbers in string formatting to delete every row of a dataframe that contains any of a series of values?

I am trying to remove all of the rows from a dataframe if the row contains any of several possible strings, such as:

'2 yrs'
'3 yrs'
'4 yrs'
and so on, all the way up to '30 yrs'.

To keep this clean, I'd like to do it in one line, so I'm trying to write code that refers to all of these numbers at once using string formatting.

To remove just the rows that contain '12 Yrs', this line works:

df_x = df_x[df_x.Col.str.contains('%d Yrs' % 12) == False]


where df_x is my dataframe and Col is my column name.


How can I remove all of the rows containing any of the possible strings, including '2 yrs', '3 yrs', '4 yrs', and so on?

Here is my attempt, but it does not work:

year_numbers = range(0,30)
number_of_years = list(year_numbers)
df_x = df_x[df_x.Col.str.contains('%d Yrs' % tuple(number_of_years)) == False]


It raises:

TypeError: not all arguments converted during string formatting
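The error occurs because '%d Yrs' has a single %d placeholder, while % tuple(number_of_years) supplies 30 values at once. A minimal sketch of a string-formatting route that does work, assuming the goal is to drop rows mentioning 2 through 30 years: format each number on its own, join the results into one regex alternation, and pass that to str.contains. The sample dataframe is hypothetical. The \b word boundary keeps, for example, '32 Yrs' from being removed just because it ends in '2 Yrs'. Note also that range's upper bound is exclusive, so covering 30 needs range(2, 31), whereas range(0,30) in the attempt stops at 29.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df_x and Col.
df_x = pd.DataFrame(
    {"Col": ["Exp 2 Yrs", "Exp 12 Yrs", "Exp 1 Yrs", "Exp 30 Yrs", "Exp 32 Yrs", "None listed"]}
)

# Format each number separately: "2|3|4|...|30". range(2, 31) stops at 30.
numbers = "|".join("%d" % n for n in range(2, 31))

# One regex alternation; \b keeps "32 Yrs" from matching via its trailing "2 Yrs".
pattern = r"\b(?:%s) Yrs" % numbers

# Keep only the rows that do NOT match any of the formatted values.
df_x = df_x[~df_x.Col.str.contains(pattern)]
print(df_x.Col.tolist())  # ['Exp 1 Yrs', 'Exp 32 Yrs', 'None listed']
```

If the column mixes 'yrs' and 'Yrs', str.contains also accepts case=False to ignore case.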

Answer

You can use regular expressions with str.contains:

df_x[~df_x.Col.str.contains(r'\d+ Yrs')]

The \d+ matches one or more digits, so the pattern would also match 0 Yrs, 1000 Yrs, and so on.
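To see the difference, here is a small sketch on hypothetical data comparing the answer's \d+ Yrs pattern with a narrower, hand-written alternative that only accepts whole numbers from 2 to 30 ([2-9], [12]\d, or 30). The narrower pattern is an illustration added here, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; Col mirrors the column name used in the question.
df_x = pd.DataFrame(
    {"Col": ["Exp 0 Yrs", "Exp 2 Yrs", "Exp 30 Yrs", "Exp 1000 Yrs", "Entry level"]}
)

# The answer's pattern: one or more digits followed by " Yrs".
broad = df_x[~df_x.Col.str.contains(r"\d+ Yrs")]

# Narrower pattern: a whole number from 2 to 30 (2-9, 10-29, or 30) followed by " Yrs".
narrow = df_x[~df_x.Col.str.contains(r"\b(?:[2-9]|[12]\d|30) Yrs")]

print(broad.Col.tolist())   # ['Entry level']
print(narrow.Col.tolist())  # ['Exp 0 Yrs', 'Exp 1000 Yrs', 'Entry level']
```

The broad pattern is simpler and is enough when any "<number> Yrs" entry should go; the bounded one is only worth the extra complexity when values outside 2 to 30 must be kept.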