Tanmaya Meher - 2 months ago
Python Question

finding non-unique elements in list not working

I want to find the non-unique elements in a list, but I can't figure out why the code below doesn't do that.

>>> d = [1, 2, 1, 2, 4, 4, 5, 'a', 'b', 'a', 'b', 'c', 6,'f',3]
>>> for i in d:
...     if d.count(i) == 1:
...         d.remove(i)
...
>>> d
[1, 2, 1, 2, 4, 4, 'a', 'b', 'a', 'b', 6, 3]


6 and 3 should have been removed.
Whereas if I use

d = [1, 2, 1, 2, 4, 4, 5, 'a', 'b', 'a', 'b', 'c']


I get the correct answer. Please explain what is happening; I am confused!

I am using python 2.7.5.

Answer

Removing elements of a list while iterating over it is never a good idea. The appropriate way to do this would be to use a collections.Counter with a list comprehension:

>>> from collections import Counter
>>> d = [1, 2, 1, 2, 4, 4, 5, 'a', 'b', 'a', 'b', 'c', 6, 'f', 3]
>>> [k for (k,v) in Counter(d).iteritems() if v > 1]
['a', 1, 2, 'b', 4]

If you want to keep the duplicate elements in the order in which they appear in your list:

>>> keep = {k for (k,v) in Counter(d).iteritems() if v > 1}
>>> [x for x in d if x in keep]
[1, 2, 1, 2, 4, 4, 'a', 'b', 'a', 'b']
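
The order-preserving version can also be wrapped in a small helper. This is only a sketch: the name non_unique is made up for illustration, and it looks counts up directly in the Counter instead of calling iteritems(), so it should run unchanged on both Python 2.7 and Python 3.

```python
from collections import Counter

def non_unique(items):
    """Return the elements that occur more than once, in their original order.

    `non_unique` is a hypothetical helper name, not part of any library.
    """
    counts = Counter(items)  # one pass to count every element
    # Build a new list instead of removing from the one being looped over.
    return [x for x in items if counts[x] > 1]

d = [1, 2, 1, 2, 4, 4, 5, 'a', 'b', 'a', 'b', 'c', 6, 'f', 3]
result = non_unique(d)
print(result)  # [1, 2, 1, 2, 4, 4, 'a', 'b', 'a', 'b']
```

Because it builds a new list, the original d is left untouched, and counting once up front avoids calling d.count() for every element.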

I'll try to explain why your approach doesn't work. To understand why some elements aren't removed as they should be, imagine that we want to remove all bs from the list [a, b, b, c] while looping over it. It'll look something like this:

+-----------------------+
|  a  |  b  |  b  |  c  |
+-----------------------+
   ^ (first iteration)

+-----------------------+
|  a  |  b  |  b  |  c  |
+-----------------------+
         ^ (next iteration: we found a 'b' -- remove it)

+-----------------------+
|  a  |     |  b  |  c  |
+-----------------------+
         ^ (removed b)

+-----------------+
|  a  |  b  |  c  |
+-----------------+
         ^ (shift subsequent elements down to fill vacancy)

+-----------------+
|  a  |  b  |  c  |
+-----------------+
               ^ (next iteration)

Notice that we skipped the second b! Once we removed the first b, elements were shifted down and our for-loop consequently failed to touch every element of the list. The same thing happens in your code.
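
To see the skipping on the list from the question, one option is to record which elements the loop actually visits. The sketch below is illustrative only: remove_unique_in_place and visited are names invented here, and the second half shows one possible fix (looping over a copy made with d2[:]), which is an alternative to the Counter approach rather than a replacement for it.

```python
def remove_unique_in_place(items):
    """Deliberately buggy: mutates the list it is looping over.

    Returns the elements the loop actually visited (hypothetical helper).
    """
    visited = []
    for x in items:
        visited.append(x)
        if items.count(x) == 1:
            items.remove(x)  # shifts later elements down by one
    return visited

d = [1, 2, 1, 2, 4, 4, 5, 'a', 'b', 'a', 'b', 'c', 6, 'f', 3]
visited = remove_unique_in_place(d)
print(visited)  # [1, 2, 1, 2, 4, 4, 5, 'b', 'a', 'b', 'c', 'f']
print(d)        # [1, 2, 1, 2, 4, 4, 'a', 'b', 'a', 'b', 6, 3]

# One possible fix: loop over a shallow copy, remove from the original.
d2 = [1, 2, 1, 2, 4, 4, 5, 'a', 'b', 'a', 'b', 'c', 6, 'f', 3]
for x in d2[:]:
    if d2.count(x) == 1:
        d2.remove(x)
print(d2)       # [1, 2, 1, 2, 4, 4, 'a', 'b', 'a', 'b']
```

The loop never visits 6 or 3: each removal shifts the following element into the slot the loop has just left, so 6 is skipped after 'c' is removed, and 3 is skipped after 'f' is removed. The shorter list in the question only works by luck, because the element skipped after each removal happens to be a duplicate.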
