Lenwood Lenwood - 3 months ago 10
Python Question

Python Remove List of Strings from List of Strings

I'm trying to remove several strings from a list of URLs. I have more than 300k URLs, and I'm trying to find which are variations of the original. Here's a toy example that I've been working with.

URLs = ['example.com/page.html',
        'www.example.com/in/page.html',
        'example.com/ca/fr/page.html',
        'm.example.com/de/page.html',
        'example.com/fr/page.html']

locs = ['/in', '/ca', '/de', '/fr', 'm.', 'www.']


What I'd like to end up with is a list of the pages without the language or locations:

desired_output = ['example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html']


I've tried list comprehensions and nested for loops, but nothing has worked yet. Can anyone help?

# doesn't remove anything
for item in URLs:
    for string in locs:
        re.sub(string, '', item)

# doesn't remove anything
for item in URLs:
    for string in locs:
        item.strip(string)

# only removes the last string in locs
clean = []
for item in URLs:
    for string in locs:
        new = item.replace(string, '')
    clean.append(new)

Answer

Strings are immutable, so replace returns a new string instead of modifying item. You have to assign the result back to item on each pass:

clean = []
for item in URLs:
    for loc in locs:
        item = item.replace(loc, '')
    clean.append(item)

or in short (in Python 3, reduce has to be imported from functools):

from functools import reduce

clean = [
    reduce(lambda item, loc: item.replace(loc, ''), [item] + locs)
    for item in URLs
]
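
One caveat with plain replace is that it removes a substring wherever it occurs, so '/in' would also be cut out of something like 'example.com/index.html'. If that matters for the real 300k URLs, a single compiled regex may be safer. The following is only a sketch, under the assumption that entries in locs starting with '/' are path segments and all others are host prefixes; the names prefixes, segments, pattern and strip_locs are made up for this example and are not part of the question.

```python
import re

URLs = ['example.com/page.html',
        'www.example.com/in/page.html',
        'example.com/ca/fr/page.html',
        'm.example.com/de/page.html',
        'example.com/fr/page.html']

locs = ['/in', '/ca', '/de', '/fr', 'm.', 'www.']

# Assumption: entries starting with '/' are path segments,
# everything else is a host prefix such as 'www.' or 'm.'.
prefixes = [loc for loc in locs if not loc.startswith('/')]
segments = [loc for loc in locs if loc.startswith('/')]

# re.escape keeps the '.' in 'm.' and 'www.' literal.
# Prefixes match only at the start of the URL; segments match
# only when immediately followed by another '/'.
pattern = re.compile(
    r'^(?:{})|(?:{})(?=/)'.format(
        '|'.join(map(re.escape, prefixes)),
        '|'.join(map(re.escape, segments)),
    )
)


def strip_locs(url):
    """Return url with every match of pattern removed."""
    return pattern.sub('', url)


clean = [strip_locs(url) for url in URLs]
```

Compiling the pattern once also means each URL is scanned a single time rather than once per entry in locs, which may help with a large list.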