Maxline - 3 months ago
Python Question

Matching elements of two lists which are almost the same

Suppose I have two lists of strings. I want to reorder the second list so that each position holds the element that most resembles the corresponding element of the first list.

This is what I do currently:

import difflib

list1 = ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee', 'ffff', 'gggg', 'hhhh', 'iiii', 'jjjj']
list2 = ['eeez', 'fffz', 'dddz', 'cccz', 'iiiz', 'jjjz', 'aaaz', 'gggz', 'hhhz', 'bbbz']

n = len(list1)
i = 0
while i < n:
    j = 0
    while j < n:
        # take the first element of list2 whose similarity to list1[i] exceeds 0.5
        if difflib.SequenceMatcher(None, list1[i], list2[j]).ratio() > 0.5:
            eltMove = list2.pop(j)
            list2.insert(i, eltMove)
            break
        j += 1
    i += 1

print(list2)


Output:

['aaaz', 'bbbz', 'cccz', 'dddz', 'eeez', 'fffz', 'gggz', 'hhhz', 'iiiz', 'jjjz']


But it fails when an element of list2 only partially matches an element of list1: that first partial match passes the 0.5 threshold and breaks the inner loop, so the remaining elements are skipped even if one of them matches better.
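A minimal sketch of this failure, using a hypothetical target string and two made-up candidates (not the lists above). The helper name `ratio` is only for illustration:

```python
import difflib

def ratio(a, b):
    # similarity score between 0.0 and 1.0, as in the loop above
    return difflib.SequenceMatcher(None, a, b).ratio()

target = 'abcd'
candidates = ['abcz', 'abcd']  # hypothetical data: a partial match listed before the exact one

# Same rule as the loop above: stop at the first candidate scoring over 0.5.
first_over_threshold = next(c for c in candidates if ratio(target, c) > 0.5)

print(ratio(target, 'abcz'))  # 0.75
print(ratio(target, 'abcd'))  # 1.0
print(first_over_threshold)   # abcz, although 'abcd' scores higher
```

Because 'abcz' already clears the threshold, the exact match 'abcd' is never examined.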

Answer
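n = len(list1)
i = 0
while i < n:
    j = 0
    new_l = []
    # score list1[i] against every element of list2
    while j < n:
        new_l.append(difflib.SequenceMatcher(None, list1[i], list2[j]).ratio())
        j += 1
    # move the best-scoring element of list2 to position i
    ind = new_l.index(max(new_l))
    eltMove = list2.pop(ind)
    list2.insert(i, eltMove)
    i += 1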

It stores the similarity ratio of list1[i] against every element of list2, finds the index of the highest ratio, then pops that element and inserts it at position i.

Hope this is what you needed

for i, a in enumerate(list1):
    new_l = [difflib.SequenceMatcher(None, a, b).ratio() for b in list2]
    ind = new_l.index(max(new_l))
    eltMove = list2.pop(ind)
    list2.insert(i, eltMove)

This is the same logic in a shorter form.

Considering @Jose Raul Barreras's reply, the appropriate modification of the above would be:

tmp = []
for i, a in enumerate(list1):
    new_l = [difflib.SequenceMatcher(None, a, b).ratio() for b in list2]
    ind = new_l.index(max(new_l))
    eltMove = list2.pop(ind)
    tmp.append(eltMove)

>>> tmp
['aaaz', 'bbbz', 'cccz', 'dddz', 'eeez', 'fffz', 'gggz', 'hhhz', 'iiiz', 'jjjz']
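The same idea can be wrapped in a function that leaves list2 untouched. This is only a sketch: the name `reorder_by_similarity` and its parameters are hypothetical, and it assumes that greedy matching (each element of the first list picks its best remaining candidate in turn) is acceptable.

```python
import difflib

def reorder_by_similarity(reference, candidates):
    """Greedily pair each reference string with its closest remaining candidate."""
    remaining = list(candidates)  # work on a copy so the caller's list is not modified
    ordered = []
    for ref in reference:
        if not remaining:
            break  # fewer candidates than reference items: stop early
        # on ties, max() keeps the first best candidate, like new_l.index(max(new_l))
        best = max(remaining,
                   key=lambda c: difflib.SequenceMatcher(None, ref, c).ratio())
        remaining.remove(best)  # a candidate can be used only once
        ordered.append(best)
    return ordered

list1 = ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee', 'ffff', 'gggg', 'hhhh', 'iiii', 'jjjj']
list2 = ['eeez', 'fffz', 'dddz', 'cccz', 'iiiz', 'jjjz', 'aaaz', 'gggz', 'hhhz', 'bbbz']

print(reorder_by_similarity(list1, list2))
```

Since earlier elements of the first list choose first, the result is not guaranteed to be the best overall pairing when several candidates resemble the same element.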