Traxstar - 6 months ago
PHP Question

PHP: Filter an Array for Duplicate Elements

I've got a question.
I have an array that is filled with external links, like:

www.google.de
www.google.com/test


and so on.
Now I would like to filter the array.
If there are links in the array like these:

www.google.de
www.google.de/test
www.google.de/fuuuu


I only want to keep the www.google.de link and filter out the rest.
I started with array_diff_key, but it does not work the way it should.
Here is my snippet:

$d_array = array_diff_key($externalArray, array_unique($externalArray));
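For context on why the snippet returns nothing here: array_unique() keeps only the first occurrence of each exact value (with its original key), so array_diff_key($arr, array_unique($arr)) returns the entries whose value repeats verbatim. Since www.google.de and www.google.de/test are different strings, no entry is returned. A small demonstration using the links from the question ($links, $withRepeat and the other variable names are illustrative):

```php
<?php
// The links from the question: three different strings on the same domain.
$links = ['www.google.de', 'www.google.de/test', 'www.google.de/fuuuu'];

// array_unique() keeps the first occurrence of each exact value (with its key);
// array_diff_key() then returns the entries whose keys were dropped, i.e. verbatim repeats.
$exactDuplicates = array_diff_key($links, array_unique($links));
var_dump(count($exactDuplicates)); // int(0): no string occurs twice

// Only an identical string is detected as a duplicate:
$withRepeat = ['www.google.de', 'www.google.de/test', 'www.google.de'];
$repeated = array_diff_key($withRepeat, array_unique($withRepeat));
print_r($repeated); // key 2 => www.google.de
```

So the snippet finds exact duplicate strings, not links that share a domain; matching on the domain requires extracting it from each link first.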


Thanks for any help.
Greetings,
Traxstar

Answer

Finally, I got it working:

$arr = [
    'www.google.de',
    'http://www.google.de/test',
    'www.google.de/fufufufu',
    'www.google.com/cctvvmb',
    'https://www.google.com/',
    'google.co.uk/hello',
];


// based on http://stackoverflow.com/questions/1201194/php-getting-domain-name-from-subdomain
function get_domain($url)
{
    if(preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $url, $regs))
    {
        return $regs['domain'];
    }

    return false;
}

function get_duplicated_domains($arr)
{
    $domains = [];

    // looping the array, processing all of it
    foreach($arr as $url)
    {
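        // lowercasing, so that e.g. "HTTP://www.Google.de" and "http://www.google.de" match
        $url = strtolower($url);

        // removing an eventual http:// or https:// prefix
        $url = str_replace(['http://', 'https://'], '', $url);

        // keeping only the host, i.e. the part before the first slash
        $url = explode('/', $url);

        // extracting the domain name (e.g. "google.de" from "www.google.de")
        $url = get_domain($url[0]);

        // skipping entries from which get_domain() could not extract a domain
        if($url === false)
        {
            continue;
        }

        // registering the domain in $domains (first occurrence = 0) or incrementing its counter
        if(array_key_exists($url, $domains))
        {
            $domains[$url]++;
        }
        else
        {
            $domains[$url] = 0;
        }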
    }

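    // array_filter() drops the domains still at 0 (seen only once);
    // the remaining keys are the duplicated domain names
    return array_keys(array_filter($domains));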
}

$res = get_duplicated_domains($arr);

Result:

Array
(
    [0] => google.de
    [1] => google.com
)
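To see what the normalization and get_domain() produce for each entry, here is a quick check. It copies get_domain() from the answer; normalize_domain() is a hypothetical helper that bundles the same lowercasing, scheme removal and "before the first slash" steps. It also shows one limit of the regex heuristic: the name before the dot must be at least two characters long, so a host like x.io (used purely as an illustration) is not matched.

```php
<?php
// Copied from the answer: returns the trailing "name.suffix" part of a host, or false.
function get_domain($url)
{
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $url, $regs)) {
        return $regs['domain'];
    }
    return false;
}

// Hypothetical helper: lowercase, drop the scheme, keep the part before the
// first slash, then extract the domain (same steps as get_duplicated_domains()).
function normalize_domain($url)
{
    $host = str_replace(['http://', 'https://'], '', strtolower($url));
    $parts = explode('/', $host);
    return get_domain($parts[0]);
}

$arr = [
    'www.google.de',
    'http://www.google.de/test',
    'www.google.de/fufufufu',
    'www.google.com/cctvvmb',
    'https://www.google.com/',
    'google.co.uk/hello',
];

// One extracted domain per input URL, in the same order.
$mapped = array_map('normalize_domain', $arr);
print_r($mapped);

// A one-character name is not matched by the regex.
var_dump(get_domain('x.io')); // bool(false)
```

This is why a guard for a false return value is useful before using the result as an array key.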

What is the script doing?

1 - Looping the array

  • 1.1 - Lowercasing the URL string, to prevent a mismatch between e.g. http and Http

  • 1.2 - Removing http:// and https:// from the strings, so that they all have the same format

  • 1.3 - Extracting the domain name (e.g. google.de from www.google.de)

  • 1.4 - Registering the extracted domain name in the $domains array, or incrementing its counter if it is already there

2 - Filtering the array with array_filter(): without a callback it removes every falsy value (0, null, false, empty strings), which is why a domain is registered with 0 and not 1. Only the duplicated domains (the ones present more than once in the array) are kept.

3 - Then getting the keys of the array (because the keys are the domain names)
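Steps 2 and 3 could also be written with array_count_values(), which counts every value in one call. This is a swapped-in equivalent, not the answer's code: because it stores real counts (1 means seen once), an explicit "> 1" callback replaces the trick of starting at 0. The sketch assumes the domains have already been extracted as in steps 1.1 to 1.3 and that false values were skipped, since array_count_values() only counts strings and integers.

```php
<?php
// Domains as produced by steps 1.1 to 1.3 for the answer's example array (assumed input).
$domainsFound = ['google.de', 'google.de', 'google.de', 'google.com', 'google.com', 'google.co.uk'];

// domain => number of occurrences, e.g. 'google.de' => 3
$counts = array_count_values($domainsFound);

// Keep only the domains seen more than once, then take the keys (the domain names).
$duplicated = array_keys(array_filter($counts, function ($n) {
    return $n > 1;
}));

print_r($duplicated);
```

The result matches the answer's output (google.de and google.com).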

Btw I'm running PHP 5.6.2
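The question itself asks for something slightly different from a list of duplicated domains: keep www.google.de and drop the other links on the same domain. A minimal sketch, assuming "same domain" means the same value returned by the answer's get_domain(), and that the first link seen for a domain is the one to keep. keep_first_per_domain() is a hypothetical helper name, and keeping links without a recognizable domain is also an assumption.

```php
<?php
// get_domain() copied from the answer above.
function get_domain($url)
{
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $url, $regs)) {
        return $regs['domain'];
    }
    return false;
}

// Hypothetical helper: keeps the first link seen for each domain and drops later ones.
function keep_first_per_domain(array $links)
{
    $seen = [];  // domain => true once a link for it has been kept
    $kept = [];
    foreach ($links as $link) {
        // Same normalization as the answer: lowercase, drop scheme, host before first slash.
        $host = explode('/', str_replace(['http://', 'https://'], '', strtolower($link)));
        $domain = get_domain($host[0]);
        if ($domain === false) {
            // No recognizable domain: keep the link unchanged (an assumption).
            $kept[] = $link;
            continue;
        }
        if (!isset($seen[$domain])) {
            $seen[$domain] = true;
            $kept[] = $link;
        }
    }
    return $kept;
}

$links = ['www.google.de', 'www.google.de/test', 'www.google.de/fuuuu', 'www.google.com/test'];
print_r(keep_first_per_domain($links));
```

Note that this keeps whichever link comes first; if www.google.de/test appeared before www.google.de, the former would be kept, so the input order matters.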