user3218971 - 3 months ago 6x

Python Question

I have two strings, say:

`s_1 = "This is a bat"`

`s_2 = "This is a bag"`

Qualitatively, they are either similar (1) or not (0); in the case above they are not similar because of the "g". Quantitatively, though, I can see that only a small amount of dissimilarity is there. How can I calculate this dissimilarity of one letter, "g", between s_1 and s_2 using Python?

I wrote one simple line of code:

`Per_deff = (float(Number_of_mutated_sites) / len(s_1)) * 100`

This code gives the `Per_deff` between two strings of identical length. What if they are not of identical length? How can I solve my problem?
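For context, the equal-length case could look like the sketch below. It assumes `Number_of_mutated_sites` means the count of positions at which the two strings differ (a Hamming distance); the function name `percent_mismatch` is made up for illustration.

```python
def percent_mismatch(a, b):
    """Percentage of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # This position-by-position comparison is only defined for equal lengths.
        raise ValueError("strings must have the same length")
    if not a:
        return 0.0
    # Assumed meaning of Number_of_mutated_sites: mismatching positions.
    number_of_mutated_sites = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * number_of_mutated_sites / len(a)


s_1 = "This is a bat"
s_2 = "This is a bag"
print(percent_mismatch(s_1, s_2))  # 1 of 13 positions differ, about 7.69
```

The explicit length check makes the limitation visible: with strings of different lengths there is no single obvious way to line the positions up.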

Answer

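What you want is similar to the **Levenshtein distance**. It is the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other, so it gives you a distance between two strings even if their lengths are not equal.

If two strings are exactly the same, the distance is 0, and the more similar they are, the smaller the distance.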
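As a rough illustration in Python, the distance can be computed row by row with the usual dynamic-programming approach and then turned into a percentage. The names `levenshtein` and `percent_difference`, and the choice to divide by the length of the longer string, are assumptions made for this sketch; other normalizations are possible.

```python
def levenshtein(s, t):
    """Edit distance between s and t (insertions, deletions, substitutions)."""
    # previous[j] is the distance between the prefix of s handled so far
    # (one character shorter than the current row) and the first j chars of t.
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]  # turning the first i chars of s into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # delete from s
                               current[j - 1] + 1,       # insert into s
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def percent_difference(s, t):
    """Distance as a percentage of the longer string's length (assumed choice)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0
    return 100.0 * levenshtein(s, t) / longest


s_1 = "This is a bat"
s_2 = "This is a bag"
print(levenshtein(s_1, s_2))             # 1
print(percent_difference(s_1, s_2))      # 1 edit over 13 characters, about 7.69
print(levenshtein("kitten", "sitting"))  # 3
```

For two strings of equal length that differ only by substitutions at a few positions, as in the example from the question, this gives the same percentage as counting mismatched positions, and it keeps working when the lengths differ.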

Sample Code from Wikipedia:

```
/* smallest of three ints (helper assumed by the sample) */
static int minimum(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

// len_s and len_t are the number of characters in string s and t respectively
int LevenshteinDistance(const char *s, int len_s, const char *t, int len_t)
{
    int cost;

    /* base case: empty strings */
    if (len_s == 0) return len_t;
    if (len_t == 0) return len_s;

    /* test if last characters of the strings match */
    if (s[len_s - 1] == t[len_t - 1])
        cost = 0;
    else
        cost = 1;

    /* return minimum of delete char from s, delete char from t,
       and delete char from both (a substitution when cost is 1) */
    return minimum(LevenshteinDistance(s, len_s - 1, t, len_t    ) + 1,
                   LevenshteinDistance(s, len_s    , t, len_t - 1) + 1,
                   LevenshteinDistance(s, len_s - 1, t, len_t - 1) + cost);
}
```
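The plain recursion recomputes the same prefix pairs many times, so it gets very slow as the strings grow. A possible Python rendering of the same recursion is sketched below; the caching through `functools.lru_cache` and the name `levenshtein_recursive` are additions for illustration and are not part of the Wikipedia sample.

```python
from functools import lru_cache


def levenshtein_recursive(s, t):
    """Recursive Levenshtein distance with cached subproblems."""

    @lru_cache(maxsize=None)
    def distance(len_s, len_t):
        # Base case: one prefix is empty, so only insertions/deletions remain.
        if len_s == 0:
            return len_t
        if len_t == 0:
            return len_s
        # Test whether the last characters of the two prefixes match.
        cost = 0 if s[len_s - 1] == t[len_t - 1] else 1
        return min(distance(len_s - 1, len_t) + 1,         # drop char from s
                   distance(len_s, len_t - 1) + 1,         # drop char from t
                   distance(len_s - 1, len_t - 1) + cost)  # drop from both

    return distance(len(s), len(t))


print(levenshtein_recursive("This is a bat", "This is a bag"))    # 1
print(levenshtein_recursive("This is a bat", "This is a bagel"))  # 3
```

With the cache, each pair of prefix lengths is evaluated once. The recursion depth still grows with the combined length of the strings, so for very long inputs an iterative version is the safer choice.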

Source (Stackoverflow)
