zihan meng - 1 month ago
Python Question

How to remove duplicate short phrases from a list and keep only the longest phrase

I have a list of phrases, and most of them are shorter pieces of a longer phrase in the same list.

buffer not
buffer not available
code 000001
error pxa_no_shared_memory
error pxa_no_shared_memory occurred
error pxa_no_shared_memory occurred short
error pxa_no_shared_memory occurred short dump
failed return
failed return code
failed return code 000001
for pxa
for pxa buffer
for pxa buffer not
for pxa buffer not available
initialization runt
initialization runt failed
initialization runt failed return
initialization runt failed return code
initialization runt failed return code 000001
memory for
memory for pxa
memory for pxa buffer
memory for pxa buffer not
memory for pxa buffer not available
not available
occurred short
occurred short dump


If a short phrase occurs inside a longer phrase, I want to keep only the longest one. For example, "buffer not" also occurs in "buffer not available" and "memory for pxa buffer not available", so I want to keep only "memory for pxa buffer not available".

I want my output to look like this:

error pxa_no_shared_memory occurred short dump
initialization runt failed return code 000001
memory for pxa buffer not available


Thank you in advance, I really appreciate your help!

mkj
Answer

I'm not sure about efficiency (this compares every phrase against every other phrase), but:

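with open('lines.txt') as f:
    original = f.read().splitlines()

# Start with every distinct phrase, then drop any phrase contained in another.
results = set(original)
for o in original:
    for r in set(results):  # iterate over a copy so results can be modified
        if o == r:
            continue
        if o in r:
            results.discard(o)  # o is a substring of the longer phrase r
        elif r in o:
            results.discard(r)  # r is a substring of the longer phrase o

# Sets are unordered, so sort to get a stable output order.
print('\n'.join(sorted(results)))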
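
If efficiency matters, one possible variation is to process the phrases from longest to shortest and compare each one only against the phrases already kept. This is just a sketch, not a tuned solution: the function name `keep_longest` and the small sample list are made up for illustration, and it uses the same plain substring test (`in`) as the code above, so it matches on characters rather than whole words.

```python
def keep_longest(phrases):
    """Return the phrases that are not substrings of any other phrase."""
    kept = []
    # Longest first: anything that could contain the current phrase
    # has already been examined by the time we reach it.
    for phrase in sorted(set(phrases), key=len, reverse=True):
        if not any(phrase in longer for longer in kept):
            kept.append(phrase)
    return sorted(kept)


# Hypothetical sample taken from the list in the question.
phrases = [
    "buffer not",
    "buffer not available",
    "memory for pxa buffer not available",
    "occurred short dump",
    "error pxa_no_shared_memory occurred short dump",
]
print("\n".join(keep_longest(phrases)))
```

With this sample the two longest phrases survive. The worst case is still quadratic, but each phrase is checked only against the kept ones, which is usually a much smaller set.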