Werner Schoemaker - 2 years ago
Python Question

Python Check if list item does (not) contain any of other list items

I have this problem where I want to remove a list element if it contains 'illegal' characters. The legal characters are specified in multiple lists. They are formed like this, where alpha stands for the alphabet (a-z + A-Z), digit stands for digits (0-9) and punct stands for punctuation (sort of).

import string

alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)


This way I can specify something as an illegal character if it doesn't appear in one of these lists.

After that I have a list containing elements:

Input = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]


I want to filter out the elements containing illegal characters. So this is the result I want to get (doesn't need to be ordered):

var = ["Amuu2", "Q1BFt", "mgF)`", "Y9^^M", "W0PD7"]


EDIT:

I have tried (and all variants of it):

FilInput = []
for InItem in Input:
    if any(AlItem in InItem for AlItem in alpha+digit+punct):
        FilInput.append(InItem)


where a new list is created with only the filtered elements, but the problem here is that the elements get added when they contain at least one legal character. For example, "ZR°p" got added, because it contains a Z, an R and a p.

I also tried:

for InItem in Input:
    if not any(AlItem in InItem for AlItem in alpha+digit+punct):


but after that, I couldn't figure out how to remove the element.
Oh, and a little tip, to make it extra difficult, it would be nice if it were a little bit fast because it needs to be done millions of times. But it needs to be working first.

Answer Source

Your code

As you mentioned, you append words as soon as any character is a correct one. You need to check that they are all correct:

# words is the input list from the question (named Input there)
filtered_words = []
for word in words:
    if all(char in alpha+digit+punct for char in word):
        filtered_words.append(word)

print(filtered_words)
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']

You could also check that there's not a single character which isn't correct:

filtered_words = []
for word in words:
    if not any(char not in alpha+digit+punct for char in word):
        filtered_words.append(word)

print(filtered_words)

It's much less readable though.
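To make the difference concrete, here is a small check on the word "ZR°p" from the question. It is only a sketch: the names allowed and word are chosen for illustration and do not come from the original code.

```python
import string

# Allowed characters, built once; `allowed` is an illustrative name.
allowed = set(string.ascii_letters + string.digits + string.punctuation)

word = "ZR°p"

# any(): True as soon as one character is legal ('Z' already qualifies).
result_any = any(char in allowed for char in word)
print(result_any)        # True

# all(): True only if every character is legal ('°' is not).
result_all = all(char in allowed for char in word)
print(result_all)        # False

# The "not any(... not in ...)" form is logically the same as all().
result_not_any = not any(char not in allowed for char in word)
print(result_not_any)    # False
```

So any() accepts the word because of its legal letters, while both stricter forms reject it because of the single '°'.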

For efficiency, you shouldn't concatenate the lists with alpha+digit+punct during each iteration. Do it once, before any loop. It's also a good idea to create a set out of those lists: char in set takes roughly constant time on average, whereas char in list scans the list element by element, which matters when there are many allowed characters.

Finally, you could use a list comprehension to avoid the for loop. If you do all this, you end up with @timgeb's solution :)
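Putting those three suggestions together might look like the sketch below. It is not necessarily identical to @timgeb's answer, which isn't reproduced here; the name allowed is an assumption made for this example.

```python
import string

# Build the set of allowed characters once, outside any loop.
allowed = set(string.ascii_letters + string.digits + string.punctuation)

words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

# Keep a word only if every one of its characters is allowed.
filtered_words = [word for word in words if all(char in allowed for char in word)]
print(filtered_words)
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']

# Equivalent variant: set.issuperset() accepts any iterable, including a str.
filtered_words_alt = [word for word in words if allowed.issuperset(word)]
print(filtered_words_alt == filtered_words)
# True
```

One difference from the loop versions: an empty string passes both checks, since it contains no illegal character.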

Alternative with regex

You can create a regex pattern from your lists and see which words match:

# encoding: utf-8
import string
import re

alpha = list(string.ascii_letters)
digit = list(string.digits)
punct = list(string.punctuation)

words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

allowed_pattern = re.compile(
    '^[' +
    ''.join(
        re.escape(char) for char in (
            alpha +
            digit +
            punct)) +
    ']+$')
# On Python < 3.7 the pattern looks like this (since 3.7, re.escape escapes fewer characters):
# ^[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^_\`\{\|\}\~]+$

print([word for word in words if allowed_pattern.match(word)])
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']

You could also write:

print(list(filter(allowed_pattern.match, words)))
# ['Amuu2', 'Q1BFt', 'mgF)`', 'Y9^^M', 'W0PD7']
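One subtlety with the ^...$ form: $ also matches just before a trailing newline, so a word like "W0PD7\n" would pass even though "\n" is not an allowed character. If that matters for your data, a possible variant (assuming Python 3.4 or newer, where fullmatch was added) is to drop the anchors and use fullmatch. The names allowed_chars, anchored and unanchored are made up for this sketch.

```python
import re
import string

allowed_chars = string.ascii_letters + string.digits + string.punctuation

# Same character class as above, once with anchors and once without.
anchored = re.compile('^[' + re.escape(allowed_chars) + ']+$')
unanchored = re.compile('[' + re.escape(allowed_chars) + ']+')

words = ["Amuu2", "ZR°p", "W0PD7\n"]

# match() with ^...$ tolerates one trailing newline.
with_match = [word for word in words if anchored.match(word)]
print(with_match)
# ['Amuu2', 'W0PD7\n']

# fullmatch() requires the whole string to consist of allowed characters.
with_fullmatch = [word for word in words if unanchored.fullmatch(word)]
print(with_fullmatch)
# ['Amuu2']
```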

re.compile will probably take longer than simply initializing a set, but the filtering itself might then be faster.
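Which one wins depends on your words and your Python version, so it is worth measuring on real data. Below is a rough timing sketch using the standard timeit module; the function names filter_with_set and filter_with_regex are invented for this example, and the regex variant uses fullmatch (Python 3.4+) instead of ^...$ anchors.

```python
import re
import string
import timeit

allowed_chars = string.ascii_letters + string.digits + string.punctuation
allowed_set = set(allowed_chars)
allowed_pattern = re.compile('[' + re.escape(allowed_chars) + ']+')

words = ["Amuu2", "Q1BFt", "dUM€n", "o°8o1G", "mgF)`", "ZR°p", "Y9^^M", "W0PD7"]

def filter_with_set(words):
    # Keep words whose characters all belong to the allowed set.
    return [word for word in words if allowed_set.issuperset(word)]

def filter_with_regex(words):
    # Keep words that consist entirely of allowed characters.
    return [word for word in words if allowed_pattern.fullmatch(word)]

# Both approaches should agree before their speed is compared.
assert filter_with_set(words) == filter_with_regex(words)

for func in (filter_with_set, filter_with_regex):
    seconds = timeit.timeit(lambda: func(words), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 runs")
```

The printed timings are machine-dependent, so no expected output is shown. Both patterns and sets are built once here, outside the timed functions, which matches the "millions of times" use case in the question.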
