Menixm - 1 month ago
Python Question

Removing elements from a list preserving order and one copy of duplicates

I have two large lists, L1 and L2. L2 is a subset of L1. L1 and L2 both can contain duplicate terms, but I can fairly easily detect/remove/save those if I need to.

I want to write a function that removes all elements from L1 that are also in L2. HOWEVER, if an element in L1 repeats itself (is a duplicate in L1) and is also present in L2, I want to retain one copy of it in the resulting list.

For example:

l1 = [1, 2, 2, 3, 4]
l2 = [2, 4]
l3 = question_function(l1, l2)

l3 should be:

[1, 2, 3]

I also want to preserve order from L1 to L3. (The remaining "copy" in l3 of the duplicates in l1 must be in a similar location to the duplicates in l1). The actual elements in the lists I am working with are strings, if that is relevant for ideas about sorting and such.

I've tried getting a list of all duplicates in L1, then removing all elements in L2 from L1, then appending the list of all duplicates back onto L1, but this does not preserve order. l3 ends up looking like:

[1, 3, 2]

I'd like to avoid looping over each list if possible, but is that the only way I can solve this? Any insight into how to approach this would be great.

Answer Source

First of all, do not alter l1 as you iterate over it: that will throw off your iteration indexing and give undesirable results.
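
To illustrate the pitfall, here is a small sketch (the function names are made up for this example) that removes items from a list while looping over it, next to a version that builds a new list instead:

```python
def remove_while_iterating(items, target):
    """Buggy on purpose: mutates the list it is looping over."""
    for item in items:
        if item == target:
            # Removing shifts later elements left, so the loop
            # skips the element that slides into this position.
            items.remove(item)
    return items


def remove_with_new_list(items, target):
    """Safe alternative: leave the input alone and build a new list."""
    return [item for item in items if item != target]


print(remove_while_iterating([1, 2, 2, 3, 4], 2))  # [1, 2, 3, 4]
print(remove_with_new_list([1, 2, 2, 3, 4], 2))    # [1, 3, 4]
```

The first version leaves a 2 behind because the second 2 slides into the slot the loop has already passed.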

Looking at the logic another way, l3 is composed of

  • l1 elements that do not appear in l2
  • l1 elements that do appear in l2, but are in l1 more than once (one copy of each is kept)

You can attack this in one of two ways: (1) iterate over l1 and check these conditions for each element; (2) iterate over l2, identifying elements to remove; then build l3 from l1, removing elements and reducing remaining duplicates as needed.
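
As one possible sketch of approach (1), not the only way to do it: the function name remove_keep_one is made up here, and the code assumes the elements are hashable (strings are), that the first occurrence of a duplicate is the copy to keep, and that duplicates not present in l2 are left untouched.

```python
from collections import Counter


def remove_keep_one(l1, l2):
    """Return a new list built from l1, preserving order.

    Items not in l2 are kept as they are. Items in l2 are dropped,
    unless they occur more than once in l1, in which case only the
    first occurrence is kept.
    """
    to_remove = set(l2)    # fast membership tests
    counts = Counter(l1)   # occurrences of each item in l1
    kept = set()           # l2 items already kept once
    result = []
    for item in l1:
        if item not in to_remove:
            result.append(item)
        elif counts[item] > 1 and item not in kept:
            result.append(item)
            kept.add(item)
    return result


print(remove_keep_one([1, 2, 2, 3, 4], [2, 4]))  # [1, 2, 3]
```

This still loops over l1 once, but the set and Counter lookups keep each step cheap, which matters for large lists.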

You can use the count method to determine whether an item appears more than once, as in

if l1.count(item) > 1:
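
One caveat: count scans the whole list on every call, so calling it for each element of a large list gets slow. A sketch of an alternative, assuming hashable elements (duplicated_items is a made-up helper name), tallies everything once with collections.Counter:

```python
from collections import Counter


def duplicated_items(items):
    """Return the set of items that occur more than once in items."""
    counts = Counter(items)  # one pass over the list
    return {item for item, n in counts.items() if n > 1}


l1 = ["a", "b", "b", "c", "d"]

print(l1.count("b") > 1)     # True, but rescans l1 on each call
print(duplicated_items(l1))  # {'b'}
```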

Detailed design and coding are left as an exercise for the student. :-)