mgruber - 8 months ago
Python Question

Splitting strings of a list on a separator if it appears

I fetched HTML code from a webpage (within a project of …). The fetching resulted in a text, which I split into a list.

The problem: some results contain Unicode characters that I want to remove from the strings in which they appear.

['Normal String', 'Company\xc2\xae', 'againnormal', '\xc2\xb7']

The result should look like this:

['Normal String', 'Company', 'againnormal', '']

Or, ideally, like this:

['Normal String', 'Company', 'againnormal']


How about

>>> stuff = ['Normal String', 'Company\xc2\xae', 'againnormal', '\xc2\xb7']
>>> filter(None, [x.decode('utf8').encode('ascii', 'ignore') for x in stuff])
['Normal String', 'Company', 'againnormal']
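The snippet above is Python 2: in Python 3, `str` has no `.decode()` method and `filter()` returns a lazy iterator instead of a list. A minimal Python 3 sketch of the same decode-then-encode idea, assuming the fetched items arrive as UTF-8 encoded `bytes` (the helper name `strip_non_ascii` is hypothetical):

```python
# Python 3 sketch. Assumption: each fetched item is UTF-8 encoded bytes.
stuff = [b'Normal String', b'Company\xc2\xae', b'againnormal', b'\xc2\xb7']


def strip_non_ascii(items):
    """Decode each UTF-8 byte string, drop non-ASCII characters, skip empty results."""
    # encode('ascii', 'ignore') silently discards characters outside ASCII,
    # then decode('ascii') turns the remaining bytes back into a str.
    cleaned = (item.decode('utf-8').encode('ascii', 'ignore').decode('ascii')
               for item in items)
    # Keep only non-empty strings (the question's "ideal" output).
    return [s for s in cleaned if s]


print(strip_non_ascii(stuff))  # ['Normal String', 'Company', 'againnormal']
```

Dropping the final `if s` condition keeps the empty string and gives the first form the question asks for, `['Normal String', 'Company', 'againnormal', '']`.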

Alternatively, with a regex:

>>> import re
>>> filter(None, [re.sub(r'[^\x00-\x7F]+', '', x) for x in stuff])
['Normal String', 'Company', 'againnormal']
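In Python 3 the regex approach works the same way on decoded `str` objects, but `filter()` has to be wrapped in `list()` to get a list back. A sketch, assuming the items have already been decoded to `str`, so that the two bytes `\xc2\xae` have become the single character `\u00ae` (®):

```python
import re

# Assumption: the items are already decoded str objects (Python 3).
stuff = ['Normal String', 'Company\u00ae', 'againnormal', '\u00b7']

# One or more consecutive characters outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r'[^\x00-\x7F]+')

# filter(None, ...) drops empty strings; in Python 3 it is lazy, hence list().
result = list(filter(None, (NON_ASCII.sub('', s) for s in stuff)))
print(result)  # ['Normal String', 'Company', 'againnormal']
```

Precompiling the pattern is optional; `re.sub` caches compiled patterns internally, but a named constant makes the intent easier to read.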

Without list comprehensions:

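keep = []
for item in stuff:
    # decode the UTF-8 bytes, then drop every character that isn't ASCII
    item = item.decode('utf8').encode('ascii', 'ignore')
    if item:  # skip strings that became empty
        keep.append(item)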
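The loop above relies on Python 2 byte strings. A Python 3 sketch of the same explicit-loop style, assuming `str` input, that filters characters by code point instead of round-tripping through an encoding (`keep_ascii_only` is a hypothetical name):

```python
def keep_ascii_only(items):
    """Explicit-loop version: strip non-ASCII characters, drop empty results."""
    keep = []
    for item in items:
        chars = []
        for ch in item:
            # ASCII code points are 0-127; everything else is discarded.
            if ord(ch) < 128:
                chars.append(ch)
        ascii_only = ''.join(chars)
        if ascii_only:  # skip strings that became empty
            keep.append(ascii_only)
    return keep


# Assumption: the items are already decoded str objects (Python 3).
stuff = ['Normal String', 'Company\u00ae', 'againnormal', '\u00b7']
print(keep_ascii_only(stuff))  # ['Normal String', 'Company', 'againnormal']
```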