Shan Shan -4 years ago 86
C# Question

How to compare two strings and find the percentage of similarity

The code below does the job, but it takes a lot of time. I am comparing the contents of two HTML files, which I have already saved as strings in MongoDB. Each string is around 30K+ characters long, and I have around 250K+ records to compare, so the job takes quite a long time.

Is there an easier way, or a plugin I can use, that is also fast?

private int ComputeCost(string input1, string input2)
{
    // If one string is empty, the distance is the length of the other.
    if (string.IsNullOrEmpty(input1))
        return string.IsNullOrEmpty(input2) ? 0 : input2.Length;

    // input1 is known to be non-empty at this point.
    if (string.IsNullOrEmpty(input2))
        return input1.Length;

    int input1Length = input1.Length;
    int input2Length = input2.Length;

    // distance[i, j] = edit distance between the first i characters
    // of input1 and the first j characters of input2.
    int[,] distance = new int[input1Length + 1, input2Length + 1];

    // Turning a prefix into the empty string costs one deletion per character.
    for (int i = 0; i <= input1Length; i++)
        distance[i, 0] = i;
    for (int j = 0; j <= input2Length; j++)
        distance[0, j] = j;

    for (int i = 1; i <= input1Length; i++)
    {
        for (int j = 1; j <= input2Length; j++)
        {
            // Substitution is free when the characters already match.
            int cost = (input2[j - 1] == input1[i - 1]) ? 0 : 1;

            distance[i, j] = Math.Min(
                Math.Min(distance[i - 1, j] + 1, distance[i, j - 1] + 1),
                distance[i - 1, j - 1] + cost);
        }
    }

    return distance[input1Length, input2Length];
}
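
The function above returns an edit distance rather than a percentage, and it allocates a full (n + 1) x (m + 1) table. For two strings of about 30K characters that is roughly 900 million int cells, or about 3.6 GB per comparison. Below is a minimal sketch of one possible variant that keeps only two rows of the table and converts the distance into a percentage. The names StringSimilarity, LevenshteinDistance and SimilarityPercent are hypothetical, and normalizing by the longer string's length is an assumption about what "percentage of similarity" should mean here. The running time is still proportional to n x m; only the memory use changes.

```csharp
using System;

public static class StringSimilarity
{
    // Levenshtein distance using two rows instead of the full table.
    public static int LevenshteinDistance(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        // The distance is symmetric, so put the shorter string on the
        // column axis to keep the two rows as small as possible.
        if (b.Length > a.Length) { string t = a; a = b; b = t; }

        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
            }
            // The row just filled becomes the "previous" row of the next pass.
            int[] swap = previous; previous = current; current = swap;
        }
        return previous[b.Length];
    }

    // Assumed normalization: 100 * (1 - distance / length of the longer string).
    public static double SimilarityPercent(string a, string b)
    {
        int maxLength = Math.Max((a ?? string.Empty).Length, (b ?? string.Empty).Length);
        if (maxLength == 0) return 100.0; // two empty strings are treated as identical
        return (1.0 - (double)LevenshteinDistance(a, b) / maxLength) * 100.0;
    }
}
```

For example, StringSimilarity.SimilarityPercent("kitten", "sitting") has an edit distance of 3 over a longer length of 7, which gives roughly 57.1.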

Answer Source

As @Kay Lee suggested, I made the function static and used HTML Agility Pack to strip unnecessary data from the HTML. This gave a good performance improvement.
