Dexters - 1 year ago
Python Question

How does the Python difflib.get_close_matches() function work?

The following are two arrays:

import difflib
import scipy
import numpy

a1 = numpy.array(['', '', '', '', ''], dtype='|S15')
b1 = numpy.array(['', '', '', '', '', ''], dtype='|S15')



['', '']

be the closest match for

I looked at the documentation, which mentions some floating-point weights but says nothing about the algorithm used.

I need to find out whether the absolute difference between the last octets of two addresses is 1 (provided the first three octets are the same).

So I first find the closest string and then check it against the condition above.

Is there any other function or way to achieve this? Also how does

doesn't seem to have such a manipulation for IPs.

Answer Source

Well, there is this part in the docs explaining your issue:

This does not yield minimal edit sequences, but does tend to yield matches that “look right” to people.
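For context, `difflib.get_close_matches()` ranks candidates by `SequenceMatcher.ratio()`, which is 2*M/T, where M is the number of matched characters and T is the combined length of both strings; candidates scoring below `cutoff` (0.6 by default) are dropped. The sketch below computes that score directly. The helper name `similarity` and the addresses (taken from the 192.0.2.0/24 documentation range) are illustrative assumptions, not the values from the question.

```python
import difflib

def similarity(a, b):
    """Return the SequenceMatcher ratio, the score get_close_matches ranks by."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Hypothetical addresses, chosen only for illustration.
target = "192.0.2.10"
candidates = ["192.0.2.11", "192.0.2.110", "192.0.2.200"]

for ip in candidates:
    print(ip, round(similarity(target, ip), 3))

# The best match is picked by character similarity, not numeric distance.
closest = difflib.get_close_matches(target, candidates, n=1)
print(closest)
```

With these inputs, `192.0.2.110` scores higher than `192.0.2.11` because it shares more characters with the target, even though `192.0.2.11` is numerically nearer. That is the kind of "looks right as text" behaviour the docs describe.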

To get the results you expect, you could use the Levenshtein distance.
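The standard library has no Levenshtein function, so here is a minimal pure-Python sketch of the edit distance using the usual dynamic-programming recurrence. The function name `levenshtein` and the sample addresses are assumptions for illustration only.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))          # 3
print(levenshtein("192.0.2.10", "192.0.2.11"))   # 1
print(levenshtein("192.0.2.10", "192.0.2.110"))  # 1
```

Both hypothetical addresses are one edit away from the target, so edit distance still treats the addresses as text and cannot tell which one is numerically adjacent.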

But for comparing IPs I would suggest using integer comparison:

>>> parts = [int(s) for s in ''.split('.')]
>>> parts2 = [int(s) for s in ''.split('.')]
>>> from operator import sub
>>> diff = sum(d * 10**(3-pos) for pos,d in enumerate(map(sub, parts, parts2)))
>>> diff

You can use this style to create a compare function:

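from functools import cmp_to_key, partial
from operator import sub

def compare_ips(base, ip1, ip2):
    # Compare ip1 and ip2 by their weighted octet distance from base.
    base = [int(s) for s in base.split('.')]
    parts1 = (int(s) for s in ip1.split('.'))
    parts2 = (int(s) for s in ip2.split('.'))
    test1 = sum(abs(d * 10**(3-pos)) for pos,d in enumerate(map(sub, base, parts1)))
    test2 = sum(abs(d * 10**(3-pos)) for pos,d in enumerate(map(sub, base, parts2)))
    # Python 3 has no built-in cmp(); this expression returns -1, 0 or 1.
    return (test1 > test2) - (test1 < test2)

base = ''
test_list = ['','','',
             '','', '']
# sorted() takes no cmp= argument in Python 3, so wrap the comparator with cmp_to_key.
sorted(test_list, key=cmp_to_key(partial(compare_ips, base)))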
# yields:
# ['', '', '', '', 
#  '', '']
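
As an alternative sketch for the check described in the question (first three octets equal, last octets differing by 1), the standard-library `ipaddress` module (Python 3.3+) can convert each address to its exact 32-bit integer or to its four raw octets. The function names `ip_distance` and `last_octet_adjacent` and the sample addresses are hypothetical, chosen only for illustration.

```python
import ipaddress

def ip_distance(ip1, ip2):
    """Absolute difference between two IPv4 addresses as 32-bit integers."""
    return abs(int(ipaddress.IPv4Address(ip1)) - int(ipaddress.IPv4Address(ip2)))

def last_octet_adjacent(ip1, ip2):
    """True if the first three octets match and the last octets differ by 1."""
    o1 = ipaddress.IPv4Address(ip1).packed  # four raw bytes, one per octet
    o2 = ipaddress.IPv4Address(ip2).packed
    return o1[:3] == o2[:3] and abs(o1[3] - o2[3]) == 1

print(last_octet_adjacent("192.0.2.10", "192.0.2.11"))   # True
print(last_octet_adjacent("192.0.2.10", "192.0.2.110"))  # False
print(last_octet_adjacent("192.0.2.255", "192.0.3.0"))   # False: third octet differs

# Sorting by numeric distance from a base address needs only a key function.
base_ip = "192.0.2.10"
others = ["192.0.2.200", "192.0.2.11", "192.0.2.110"]
print(sorted(others, key=lambda ip: ip_distance(base_ip, ip)))
```

Checking the octets directly avoids the closest-string step altogether, and `ipaddress.IPv4Address` raises `ValueError` on malformed input instead of silently comparing bad strings.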