Nicolas NOEL Nicolas NOEL - 4 months ago 25
Python Question

Longest common substring from more than two strings - Python

I'm looking for a python library for finding the longest common substring from a set of python strings.

I'have read that it exist to way to solve this problem :
- one using suffix trees
- the other using dynamic programming.

The method implemented is not important. Otherwise, it is important to have a implementation that can be use for a set of strings and not only two strings

Answer

These paired functions will find the longest common string in any arbitrary array of strings:

def long_substr(data):
    substr = ''
    if len(data) > 1 and len(data[0]) > 0:
        for i in range(len(data[0])):
            for j in range(len(data[0])-i+1):
                if j > len(substr) and is_substr(data[0][i:i+j], data):
                    substr = data[0][i:i+j]
    return substr

def is_substr(find, data):
    if len(data) < 1 and len(find) < 1:
        return False
    for i in range(len(data)):
        if find not in data[i]:
            return False
    return True


print long_substr(['Oh, hello, my friend.',
                   'I prefer Jelly Belly beans.',
                   'When hell freezes over!'])

No doubt the algorithm could be improved and I've not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.

EDIT: in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.

def long_substr(data):
    substr = ''
    if len(data) > 1 and len(data[0]) > 0:
        for i in range(len(data[0])):
            for j in range(len(data[0])-i+1):
                if j > len(substr) and all(data[0][i:i+j] in x for x in data):
                    substr = data[0][i:i+j]
    return substr

Hope this helps,

Jason.

Comments