user3218971 user3218971 - 6 months ago 13
Python Question

Is there any way to quantitatively compare the similarity of two strings?

I have two strings, say:

s_1 = "This is a bat"
s_2 = "This is a bag"

Qualitatively, they are either similar (1) or not (0); in the case above they are not similar because of the "g". Quantitatively, though, I can see that there is only a small amount of dissimilarity. How can I calculate this dissimilarity of the one letter "g" between s_1 and s_2 using Python?

I wrote this simple line of code:

Per_deff = (float(Number_of_mutated_sites) / len(s_1)) * 100  # convert before dividing so Python 2 does not truncate to 0

This code gives the percentage difference ("Per_deff") between two strings of identical length. What if they are not of identical length? How can I solve my problem?
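
For reference, the equal-length formula above could be sketched as a small function. This is only an illustration: the function name percent_difference is made up here, and counting the "mutated sites" position by position is an assumption about what Number_of_mutated_sites means.

```python
def percent_difference(s_1, s_2):
    """Percentage of positions at which two equal-length strings differ."""
    if len(s_1) != len(s_2):
        raise ValueError("strings must have the same length")
    if not s_1:
        return 0.0  # two empty strings: nothing differs
    # Assumption: a "mutated site" is a position where the characters differ.
    number_of_mutated_sites = sum(a != b for a, b in zip(s_1, s_2))
    return 100.0 * number_of_mutated_sites / len(s_1)


print(percent_difference("This is a bat", "This is a bag"))
```

It raises an error for strings of different lengths, which is exactly the case the question is about.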


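What you want is something like the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. It gives you a distance between two strings even if their lengths are not equal.

If two strings are exactly the same, the distance is 0; the more similar they are, the smaller the distance.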

Sample Code from Wikipedia:

// len_s and len_t are the number of characters in string s and t respectively
int LevenshteinDistance(string s, int len_s, string t, int len_t)
{ int cost;

  /* base case: empty strings */
  if (len_s == 0) return len_t;
  if (len_t == 0) return len_s;

  /* test if last characters of the strings match */
  if (s[len_s-1] == t[len_t-1])
      cost = 0;
  else
      cost = 1;

  /* return minimum of delete char from s, delete char from t, and delete char from both */
  return minimum(LevenshteinDistance(s, len_s - 1, t, len_t    ) + 1,
                 LevenshteinDistance(s, len_s    , t, len_t - 1) + 1,
                 LevenshteinDistance(s, len_s - 1, t, len_t - 1) + cost);
}
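
Since the question is about Python, here is a rough Python sketch of the same distance. It swaps the plain recursion above for the usual dynamic-programming table (kept two rows at a time), because the recursive version recomputes the same subproblems and gets very slow on longer strings. The names levenshtein and percent_dissimilarity are just illustrative, and dividing by the length of the longer string is one possible way to turn the distance into a percentage, not the only one.

```python
def levenshtein(s, t):
    """Edit distance between s and t (insertions, deletions, substitutions)."""
    # previous[j] holds the distance between the first i-1 chars of s
    # and the first j chars of t.
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to the empty string
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            current.append(min(previous[j] + 1,          # delete char from s
                               current[j - 1] + 1,       # insert char into s
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def percent_dissimilarity(s, t):
    """Distance as a percentage of the longer string's length (an assumption)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return 100.0 * levenshtein(s, t) / longest


s_1 = "This is a bat"
s_2 = "This is a bag"
print(levenshtein(s_1, s_2))
print(percent_dissimilarity(s_1, s_2))
```

For s_1 and s_2 the distance is 1 (one substitution, "t" to "g"), and the same functions work unchanged when the two strings have different lengths.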