neversaint - 7 months ago 29

Python Question

I have two examples of pairs of strings:

    YHFLSPYVY      # answer
       LSPYVYSPR   # prediction
    +++******ooo

      YHFLSPYVS    # answer
    VEYHFLSPY      # prediction
    oo*******++

As stated above, I'd like to find the overlapping region (`*`) and the non-overlapping regions in the answer (`+`) and in the prediction (`o`).

How can I do it in Python?

I'm stuck with this:

    import re

    # This is example 1
    ans = "YHFLSPYVY"
    pred = "LSPYVYSPR"

    matches = re.finditer(r'(?=(%s))' % re.escape(pred), ans)
    print([m.start(1) for m in matches])
    # []

The answer I hope to get for example 1 is:

    plus_len = 3
    star_len = 6
    ooo_len = 3

Answer

It's easy with `difflib.SequenceMatcher.find_longest_match`:

```
from difflib import SequenceMatcher


def f(answer, prediction):
    sm = SequenceMatcher(a=answer, b=prediction)
    # Longest block of characters common to both strings
    match = sm.find_longest_match(0, len(answer), 0, len(prediction))
    star_len = match.size
    # Whatever lies outside the common block in each string
    plus_len = len(answer[:match.a] + answer[match.a + match.size:])
    ooo_len = len(prediction[:match.b] + prediction[match.b + match.size:])
    return (plus_len, star_len, ooo_len)


f('YHFLSPYVY', 'LSPYVYSPR')  # (3, 6, 3)
f('YHFLSPYVS', 'VEYHFLSPY')  # (2, 7, 2)
```
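If you also want the marker line from the question (`+++******ooo`), the same match can probably be turned into it directly. This is only a rough sketch: `overlap_markers` is a name I made up for illustration, and it assumes the two strings overlap end to end as in both examples. `find_longest_match` returns the longest common block wherever it sits, so a shared block in the middle of both strings would give a marker line that does not correspond to a real alignment. `autojunk=False` is passed so that long inputs are not affected by difflib's popular-element heuristic.

```python
from difflib import SequenceMatcher


def overlap_markers(answer, prediction):
    """Build the +/*/o marker line (hypothetical helper, end-to-end overlap assumed)."""
    sm = SequenceMatcher(a=answer, b=prediction, autojunk=False)
    m = sm.find_longest_match(0, len(answer), 0, len(prediction))
    # Characters to the left of the common block: answer-only are '+', prediction-only are 'o'
    left = '+' * m.a + 'o' * m.b
    # Characters to the right of the common block, marked the same way
    right = '+' * (len(answer) - m.a - m.size) + 'o' * (len(prediction) - m.b - m.size)
    return left + '*' * m.size + right


print(overlap_markers('YHFLSPYVY', 'LSPYVYSPR'))  # +++******ooo
print(overlap_markers('YHFLSPYVS', 'VEYHFLSPY'))  # oo*******++
```

Counting the `+`, `*` and `o` characters in the result gives the same three lengths as `f` above.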