Giacomo Tecya Pigani Giacomo Tecya Pigani - 5 months ago 6x
PHP Question

Redirect mispelled search php

I have a website where I index information about movies. To find a movie the user can input the title of the movie and this query will be sent to the database:

SELECT IMDB, Name, Year, Views
FROM Movies
WHERE Name LIKE '%$search%'

I used the "similar_text" function to solve little mistakes. For example if the name is "Pulp Fiction" and the user types: "Pulp Foction", since I don't get any results I run this little code for every movie.

similar_text($search, $Name, $percent);

if ($percent > $ValMax) {
$ValMax = $percent;
$PosMax = $i;

I'm using this code since I only have a couple hundred movies in the database and in the future it's going to be at max about twenty thousand. Also every movie as a title of an average of 20 characters, without considering movies like Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb

The real problem starts with a query used by an user: "Capitan America" (italian for Captain America) where the name of the movie was "Captain America – Il primo vendicatore"

Using the Levenshtein function I got horrible results where with the similar_text they were at least decent. You can see the full results of that query here. I made up this page to see for each search what the levenshtein and the similar_text would be.

The movie that would have been selected if I didn't place a 60% blockage would have been "C'era una volta in America" (Once upon a time in America).

However "Capitan America" and "Captain America" are pretty similar so I was wondering if there was a way to maybe check for each word. I've also heard about Lucene but I didn't know how to start from and If it was suited for my needs.

Thanks :)



You could use SOUNDS LIKE (available as a MySQL string function):

SELECT IMDB, Name, Year, Views FROM Movies WHERE Name SOUNDS LIKE '%$search%'

You might want to benchmark the performance of this on large tables as the preceding wildcard in '%$search%' means that your statement will not be able to avail of any indexes.


Another possible solution involves creating a custom levenshtein function for MySQL. Here, you can find an example of one:

CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) ) 
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; 
    DECLARE s1_char CHAR; 
    -- max strlen=255 
    DECLARE cv0, cv1 VARBINARY(256); 
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; 
    IF s1 = s2 THEN 
      RETURN 0;