zyc - 13 days ago
Python Question

Elegant way to delete items in a list that contain substrings appearing in another list

Recently I encountered this problem:

Say there is a list of items I want to process:

process_list=["/test/fruit/apple","/test/fruit/pineapple","/test/fruit/banana","/test/tech/apple-pen","/test/animal/python","/test/animal/penguin"]


And I want to exclude something using another list, for instance:

exclude_list=["apple","python"]


The process_list should look like this after I apply the exclude_list to it (any process_list item that contains a substring from exclude_list is removed):

["/test/fruit/banana","/test/animal/penguin"]


or if the exclude_list is:
exclude_list=["pen","banana"]


The process_list should be this after applying the filter:

["/test/fruit/apple","/test/fruit/pineapple","/test/animal/python"]


So what I was trying at first was:

for item in exclude_list:
    for name in (process_list):
        if item in name:
            process_list.remove(name)


Of course this didn't work because removing elements from the list while iterating over it using a for loop is not permitted. The code only removed the first match and then stopped.
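
To illustrate the failure, here is a minimal sketch that runs the loop above on the sample data. For a list, Python raises no error; each removal shifts the remaining items one position to the left, so the iteration skips whichever item follows the removed one.

```python
# Sample data from the question.
process_list = [
    "/test/fruit/apple", "/test/fruit/pineapple", "/test/fruit/banana",
    "/test/tech/apple-pen", "/test/animal/python", "/test/animal/penguin",
]
exclude_list = ["apple", "python"]

for item in exclude_list:
    for name in process_list:
        if item in name:
            # Removing shifts later elements left by one, so the loop's
            # internal index skips the element right after each removal.
            process_list.remove(name)

print(process_list)
# ['/test/fruit/pineapple', '/test/fruit/banana', '/test/animal/penguin']
```

Here "/test/fruit/pineapple" survives even though it contains "apple", because it sat directly after a removed item.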

So then I came up with a way to do this using another list:

deletion_list=[] #Track names that need to be deleted
for item in exclude_list:
    for name in (process_list):
        if item in name:
            deletion_list.append(name)
# A list comprehension
process_list=[ x for x in process_list if x not in deletion_list ]


It works, but my gut tells me there may be a more elegant way. Right now it needs another list to store the names that need to be deleted. Any ideas?

Answer

You can use a list comprehension with all() as the filter:

# Here: `p` is the entry from `process_list`
#       `e` is the entry from `exclude_list`

>>> [p for p in process_list if all(e not in p for e in exclude_list)]                              
['/test/fruit/banana', '/test/animal/penguin']
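
As a self-contained sketch, the same filter can be wrapped in a small function and checked against both exclude lists from the question. The function name `filter_excluded` is hypothetical, and `not any(e in p ...)` is used here as an equivalent spelling of `all(e not in p ...)`.

```python
def filter_excluded(process_list, exclude_list):
    """Return the entries that contain none of the excluded substrings."""
    # `not any(e in p ...)` is logically the same as `all(e not in p ...)`.
    return [p for p in process_list if not any(e in p for e in exclude_list)]


# Sample data from the question.
process_list = [
    "/test/fruit/apple", "/test/fruit/pineapple", "/test/fruit/banana",
    "/test/tech/apple-pen", "/test/animal/python", "/test/animal/penguin",
]

print(filter_excluded(process_list, ["apple", "python"]))
# ['/test/fruit/banana', '/test/animal/penguin']
print(filter_excluded(process_list, ["pen", "banana"]))
# ['/test/fruit/apple', '/test/fruit/pineapple', '/test/animal/python']
```

The function builds a new list and leaves the original untouched, so no helper list of names to delete is needed. With an empty exclude list it returns every entry.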

Regarding your statement:

Of course this didn't work because removing elements from the list while iterating over it using a for loop is not permitted. The code only removed the first match and then stopped.

You could have iterated over a copy of the list being modified:

 for name in list(process_list):  # OR, for name in process_list[:]:
 #           ^-- Creates a new copy ----------------------------^
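
A minimal sketch of the copy-based loop applied to the question's sample data. The copy is taken of process_list, on the assumption that this is the list the inner loop both walks and modifies.

```python
# Sample data from the question.
process_list = [
    "/test/fruit/apple", "/test/fruit/pineapple", "/test/fruit/banana",
    "/test/tech/apple-pen", "/test/animal/python", "/test/animal/penguin",
]
exclude_list = ["apple", "python"]

for item in exclude_list:
    # process_list[:] is a shallow copy, so removing from the original
    # does not disturb the sequence being iterated.
    for name in process_list[:]:
        if item in name:
            process_list.remove(name)

print(process_list)
# ['/test/fruit/banana', '/test/animal/penguin']
```

This modifies the list in place, which may matter if other code holds a reference to it; the list comprehension instead rebinds the name to a new list.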