PrestonGarvey22 PrestonGarvey22 - 2 months ago 17
PHP Question

Google like "Did you mean ?" search suggestion using Levenshtein Distance

I'm building a simple google like "did you mean ?" search suggestion using Levenshtein Distance algorithm. I've managed to get it working but I have problem with returning the result.

For example, if the keywords are "How to use fasebuk and twitster ?" I want to replace only the incorrect words so the suggestion result would be "How to use facebook and twitter ?"

How to achieve that kind of suggestion ?

Below are what I've done so far :

public function searchsuggestion(Request $request)
{
$allwords = array('facebook','orkut','twitter','yahoo', 'microsoft','paypal','google','bing','msn');
$shortest = -1;
$keywords = trim($request->input('q'));
if (! empty($keywords)) {
foreach ($allwords as $word) {
$lev = $this->LevenshteinDistance($keywords, $word);
if ($lev == 0) {
$closest = $word;
$shortest = 0;
break;
}
if ($lev <= $shortest || $shortest < 0) {
$closest = $word;
$shortest = $lev;
}
}
if ($shortest == 0) {
$result = '';
} else {
$result = $closest;
}
return view('posts/search', compact('result'));
}
return redirect()->route('latestposts');
}


I'm a newbie developer, any help appreciated. Thanks!

Answer Source

This is a good question. To do this you will need to process each word in the input string individually. That way you can replace individual words in the input string with the words from your dictionary.

Depending on your dictionary, you may need to do further processing on each hit to prevent false positives. When I started experimenting with your code, most of the words in the input were replaced with "msn" when using the Levenshtein distance alone.

I put together a test case that you can run from the command line to test the routine. In addition to processing each word individually so we can do the replacements, I am getting metaphone keys for both the original word and the dictionary word, then using similar_text to get a percentage of similarity between the two. There are threshold variables that allow you to set a ceiling threshold for Levenshtein distance, set a length limit on the metaphone keys, and set a minimum percentage for similarity of metphone keys.

<?php
class Request
{
    private $_data = [];

    public function __construct(array $data = [])
    {
        $this->_data = $data;
    }

    public function input($key, $default = null)
    {
        return (array_key_exists($key, $this->_data)) ? $this->_data[$key] : $default;
    }
}

function searchsuggestion(Request $request)
{
    $allwords = array('facebook', 'orkut', 'twitter', 'yahoo', 'microsoft', 'paypal', 'google', 'bing', 'msn');
    $keywords = trim($request->input('q'));

    $levThreshold       = 3;
    $metaphoneLength    = 5;
    $metaphoneThreshold = 60;

    if (!empty($keywords))
    {
        $keywordArr = explode(' ', $keywords);
        $matchArr   = [];

        foreach ($keywordArr as $currKeyword)
        {
            $shortest = -1;
            $closest  = null;
            foreach ($allwords as $word)
            {
                $lev = levenshtein($currKeyword, $word);

                if ($lev == 0)
                {
                    break;
                }

                if ($lev <= $shortest || $shortest < 0)
                {
                    $closest  = $word;
                    $shortest = $lev;
                }
            }

            if (!empty($closest) && $shortest <= $levThreshold)
            {
                $origSoundex  = metaphone($currKeyword, $metaphoneLength);
                $foundSoundex = metaphone($closest, $metaphoneLength);

                similar_text($origSoundex, $foundSoundex, $percentage);

                if ($percentage > $metaphoneThreshold)
                {
                    $matchArr[$currKeyword] = $closest;
                }
            }
        }

        if (!empty($matchArr))
        {
            $origWords    = array_keys($matchArr);
            $replaceWords = array_values($matchArr);

            return str_replace($origWords, $replaceWords, $keywords);
        }

        return null;
        //return view('posts/search', compact('result'));
    }
    //return redirect()->route('latestposts');
}

$request = new Request(['q' => 'How to use fasebuk and twitster?']);

$response = searchsuggestion($request);

print $response . "\n";