user116873 - 3 months ago
Python Question

Pandas: multiple conditions on strings

I am trying to filter my dataframe.
Usually I use something like

df1= df[df.url.str.contains("avito.ru/*/telefony/")]


But what if I have many conditions?
I want to pass more than 100 strings to
contains.
How can I do that?

Dataframe

анонс кинофильмов 2016
анонс кинофильмов 2016
"выборок имеют величину момента сопротивления"
"выборок имеют величину момента сопротивления"
ансамбль 9 человек
ансамбль 9 человек
ансамбль 9 человек
"Времена года в музыке, литературе, живописи"
"Времена года в музыке, литературе, живописи"
"Времена года в музыке, литературе, живописи"
apple iphone
samsumg
facebook
None
None
None


And a list of words

lst = ['iphone', 'sony', 'alcatel', 'galaxy', 'samsumg']


Desired output

apple iphone
samsumg
None
None
None


That is, if a string contains none of the words from the list, I want to drop that row. (But I want to keep the rows with None values.)

Answer

You can create a pattern by joining all your list items with | and pass it to str.contains:

In [31]:
lst = ['iphone', 'sony', 'alcatel', 'galaxy', 'samsumg','None']
pat = '|'.join(lst)
df[df['url'].str.contains(pat)]

Out[31]:
             url
10  apple iphone
11       samsumg
13          None
14          None
15          None
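
With more than 100 strings, some of them may contain characters that are special in regular expressions (for example . or +), and str.contains treats the pattern as a regex by default. A minimal sketch, assuming the words should match literally; the sample rows and the names words and filtered are made up for illustration:

```python
import re

import pandas as pd

# Hypothetical sample data; 'c++' and 'avito.ru' contain regex metacharacters.
df = pd.DataFrame(
    {"url": ["apple iphone", "learn c++ fast", "avito.ru/telefony", "facebook"]}
)
words = ["iphone", "c++", "avito.ru"]

# re.escape makes each word match literally before the words are joined with |.
pat = "|".join(re.escape(w) for w in words)

# Keep only the rows whose url contains at least one of the words.
filtered = df[df["url"].str.contains(pat)]
print(filtered)
```

Without re.escape, a word such as 'c++' would not be a valid regular expression, so escaping is the safer default when the list comes from user data.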

To handle missing values, include pd.isnull(df['url']) in the boolean condition:

In [54]:
lst = ['iphone', 'sony', 'alcatel', 'galaxy', 'samsumg']
pat = '|'.join(lst)
df[pd.isnull(df['url']) | df['url'].str.contains(pat)]

Out[54]:
             url
10  apple iphone
11       samsumg
13           NaN
14           NaN
15           NaN
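
As a possible alternative to the explicit null check, str.contains accepts an na argument that sets the result for missing values. A small self-contained sketch, assuming the None entries are real missing values and not the string 'None'; the shortened sample frame and the name kept are illustrative only:

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the question's data with real missing values.
df = pd.DataFrame(
    {"url": ["анонс кинофильмов 2016", "apple iphone", "samsumg", "facebook", np.nan, None]}
)
lst = ["iphone", "sony", "alcatel", "galaxy", "samsumg"]
pat = "|".join(lst)

# na=True makes missing values count as matches, so those rows are kept.
kept = df[df["url"].str.contains(pat, na=True)]
print(kept)
```

Passing na=False instead would drop the missing rows, which is useful when only real matches are wanted.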