user8264 - 2 years ago
Python Question

How to find duplicate elements and remove them from a list?

I have three lists as follows:

name  = ['A', 'B', 'C', 'D', 'E', 'F']
cls   = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]


It means A belongs to class 1 and its score is 0.1, B belongs to class 2 and its score is 0.2, and so on.
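Since the i-th entries of the three lists describe one object, it can help to look at them as records. A minimal sketch, assuming the names are meant to be strings ('A', 'B', ...): zip pairs up the lists position by position.

```python
# The three parallel lists from the question (names written as strings)
name = ['A', 'B', 'C', 'D', 'E', 'F']
cls = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

# zip yields one (name, class, score) tuple per object
records = list(zip(name, cls, score))

for n, c, s in records:
    print(f"{n} belongs to class {c} with score {s}")
# first line printed: A belongs to class 1 with score 0.1
```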

I am looking for a way to find objects that have the same class (cls) and remove an object if its score is smaller than that of another object in the same class. So my expected result is:

name  = ['C', 'D', 'E', 'F']
cls   = [3, 2, 4, 1]
score = [0.5, 0.3, 1, 0.8]


The name, cls and score variables are all lists. How can I implement this in Python? Thanks

This is what I did:

name_clean=[]
cls_clean=[]
score_clean=[]
for i in range(len(cls)-1):
    cls_i=cls[i]
    max_index = -1
    for j in range(i+1,len(cls)):
        cls_j = cls[j]
        if (cls_i==cls_j):
            if (score[i]<=score[j]):
                max_index=j
            else:
                max_index=i
    if (max_index>=0):
        name_clean.append(name[max_index])
        cls_clean.append(cls[max_index])
        score_clean.append(score[max_index])
    else:
        name_clean.append(name[i])
        cls_clean.append(cls[i])
        score_clean.append(score[i])
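Tracing the attempt on the sample data helps show why it does not give the expected result. Below, the loop is wrapped in a hypothetical function clean_attempt with its logic unchanged. On the sample data it returns D twice (once as B's replacement and again when the outer loop reaches D itself), and F only appears as A's replacement, because range(len(cls)-1) never visits the last index.

```python
def clean_attempt(name, cls, score):
    # Hypothetical wrapper around the loop from the question, logic kept as-is
    name_clean, cls_clean, score_clean = [], [], []
    for i in range(len(cls) - 1):  # stops one short: the last index is never i
        cls_i = cls[i]
        max_index = -1
        for j in range(i + 1, len(cls)):
            if cls_i == cls[j]:
                # same class found later: remember whichever of i, j scores higher (j on a tie)
                if score[i] <= score[j]:
                    max_index = j
                else:
                    max_index = i
        if max_index >= 0:
            name_clean.append(name[max_index])
            cls_clean.append(cls[max_index])
            score_clean.append(score[max_index])
        else:
            # no later item shares the class, so item i itself is appended
            name_clean.append(name[i])
            cls_clean.append(cls[i])
            score_clean.append(score[i])
    return name_clean, cls_clean, score_clean


name = ['A', 'B', 'C', 'D', 'E', 'F']
cls = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

print(clean_attempt(name, cls, score)[0])  # ['F', 'D', 'C', 'D', 'E']
```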

Answer

Note that you cannot use class as a variable name because it's a reserved keyword in Python.
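The standard library's keyword module can confirm this. A small sketch: iskeyword reports whether a string is reserved, and compiling an assignment to class fails with a SyntaxError (the exact message text varies between Python versions, so it is not shown here).

```python
import keyword

# True for reserved words such as "class"; "cls" is an ordinary name
print(keyword.iskeyword("class"))  # True
print(keyword.iskeyword("cls"))    # False

# Assigning to a keyword is rejected when the code is compiled
try:
    compile("class = [1, 2, 3]", "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```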

Instead of using three separate lists, I would consider using one list of namedtuples or a table, e.g. a pandas.DataFrame.
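A minimal sketch of the namedtuple variant, assuming a hypothetical record type Item with fields name, cls and score. Because each record keeps its three values together, filtering cannot get the lists out of sync. On a tie the first item seen is kept, since the comparison is strict.

```python
from collections import namedtuple

# Hypothetical record type bundling the three values of one object
Item = namedtuple("Item", ["name", "cls", "score"])

items = [Item(n, c, s) for n, c, s in zip(
    ['A', 'B', 'C', 'D', 'E', 'F'],
    [1, 2, 3, 2, 4, 1],
    [0.1, 0.2, 0.5, 0.3, 1, 0.8],
)]

# Best-scoring Item per class; strict ">" keeps the first one on a tie
best = {}
for item in items:
    current = best.get(item.cls)
    if current is None or item.score > current.score:
        best[item.cls] = item

# Keep the winners in their original order (identity check, not equality)
result = [item for item in items if best[item.cls] is item]
print([item.name for item in result])  # ['C', 'D', 'E', 'F']
```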

However, since you have the data as three lists, I would do it like this:

First, get the highest score for each class and store it in a dictionary:

highest_scores = {}
for c, s in zip(cls, score):
    current_max = highest_scores.get(c, None)
    if current_max is None or current_max < s:  # not present or smaller
        highest_scores[c] = s
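As a self-contained check, here is that first step run on the question's data. The update line is a variant of the if-statement above that uses max() with dict.get; for numeric scores it should produce the same dictionary, with one entry per class in first-seen order.

```python
cls = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

highest_scores = {}
for c, s in zip(cls, score):
    # For a new class, get() falls back to s itself; otherwise keep the larger score
    highest_scores[c] = max(s, highest_scores.get(c, s))

print(highest_scores)  # {1: 0.8, 2: 0.3, 3: 0.5, 4: 1}
```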

Then iterate over the lists again and keep only the items whose score equals the stored highest score for their class:

new_name = []
new_cls = []
new_score = []
for n, c, s in zip(name, cls, score):
    if s == highest_scores[c]:
        new_name.append(n)
        new_cls.append(c)
        new_score.append(s)
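The second step can also be written more compactly with a list comprehension: keep the matching (name, class, score) triples, then transpose them back into three lists with zip(*...). A sketch under one assumption: at least one triple is kept (always true for non-empty input, since every class has a maximum), otherwise the three-way unpacking raises ValueError.

```python
name = ['A', 'B', 'C', 'D', 'E', 'F']
cls = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

# Step 1, as in the answer: highest score per class
highest_scores = {}
for c, s in zip(cls, score):
    current_max = highest_scores.get(c)
    if current_max is None or current_max < s:
        highest_scores[c] = s

# Step 2: keep the triples whose score is the maximum of their class
kept = [(n, c, s) for n, c, s in zip(name, cls, score) if s == highest_scores[c]]

# zip(*kept) turns the kept triples back into three columns
new_name, new_cls, new_score = (list(col) for col in zip(*kept))

print(new_name)   # ['C', 'D', 'E', 'F']
print(new_cls)    # [3, 2, 4, 1]
print(new_score)  # [0.5, 0.3, 1, 0.8]
```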

Which gives:

>>> new_name
['C', 'D', 'E', 'F']
>>> new_cls
[3, 2, 4, 1]
>>> new_score
[0.5, 0.3, 1, 0.8]

Note that this keeps every item that ties for the highest score in its class: if two items have the same class and the same score, both are kept. To keep only the first one, you can remove the key from the dictionary as soon as you have found it:

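new_name, new_cls, new_score = [], [], []  # start from empty result lists
for n, c, s in zip(name, cls, score):
    if c in highest_scores and s == highest_scores[c]:
        new_name.append(n)
        new_cls.append(c)
        new_score.append(s)
        del highest_scores[c]  # later items with the same class and score are skipped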
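To check the tie-breaking behaviour, here is a sketch with made-up data (hypothetical, not from the question) in which A and B tie for the top score in class 1. It combines the first step with the key-deleting loop; only the first of the tied items survives.

```python
# Hypothetical data: A and B share class 1 and the same top score
name = ['A', 'B', 'C']
cls = [1, 1, 2]
score = [0.8, 0.8, 0.5]

highest_scores = {}
for c, s in zip(cls, score):
    current_max = highest_scores.get(c)
    if current_max is None or current_max < s:
        highest_scores[c] = s

new_name, new_cls, new_score = [], [], []
for n, c, s in zip(name, cls, score):
    # After the first match the class key is gone, so later ties are skipped
    if c in highest_scores and s == highest_scores[c]:
        new_name.append(n)
        new_cls.append(c)
        new_score.append(s)
        del highest_scores[c]

print(new_name)  # ['A', 'C']
```

Note that the loop empties highest_scores as it goes, so the dictionary has to be rebuilt before it can be reused.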