rpsep2 - 6 months ago
PHP Question

de-dupe array of documents

I run a Solr query, which returns a result set of jobs.

Some of the jobs are duplicates (but from different sources). Two jobs count as duplicates when their title, description and location are the same.

I want to loop through my result set and combine any duplicates into one job, with that one job having multiple sources, something like:

Original result:

$jobs = array(
    array(
        'id' => 'job1',
        'title' => 'test',
        'description' => 'test',
        'location' => 'test',
        'source' => 'source1',
    ),
    array(
        'id' => 'job2',
        'title' => 'test',
        'description' => 'test',
        'location' => 'test',
        'source' => 'source2',
    ),
    array(
        'id' => 'job3',
        'title' => 'test',
        'description' => 'test',
        'location' => 'test',
        'source' => 'source3',
    ),
    array(
        'id' => 'job4',
        'title' => 'testing',
        'description' => 'testing',
        'location' => 'testing',
        'source' => 'source1',
    ),
);


would become:

$jobs = array(
    array(
        'id' => 'job1',
        'title' => 'test',
        'description' => 'test',
        'location' => 'test',
        'source' => 'source1',
        'other_sources' => array(
            array(
                'id' => 'job2',
                'title' => 'test',
                'description' => 'test',
                'location' => 'test',
                'source' => 'source2',
            ),
            array(
                'id' => 'job3',
                'title' => 'test',
                'description' => 'test',
                'location' => 'test',
                'source' => 'source3',
            ),
        ),
    ),
    array(
        'id' => 'job4',
        'title' => 'testing',
        'description' => 'testing',
        'location' => 'testing',
        'source' => 'source1',
    ),
);


How can I achieve this? Either in PHP or perhaps in the Solr query itself (I'm using Solarium to do my Solr querying).

Answer

How about something like this?

<?php

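// Jobs collected so far, keyed by job title.
$result = array();
foreach ($jobs as $job) {

    if (isset($result[$job['title']])) {
        // Title already seen: attach this job to the first one found.
        $result[$job['title']]['other_sources'][] = $job;
    }
    else {
        // First job with this title: store it as the primary entry.
        $result[$job['title']] = $job;
    }

}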

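It initializes an empty array ($result) and then loops through the jobs array. The $result array stores the jobs, with the job title used as the key. If the job title does not yet exist in the result array, the job is added under that title. If the job title already exists in the result array, the job is appended to an array inside the existing job (under the key 'other_sources'). Note that this compares the title only, so jobs that share a title but differ in description or location would also be merged, and that $result ends up keyed by title rather than numerically indexed.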
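
If duplicates have to match on title, description and location, as the question describes, the same idea could be extended with a composite key. The following is only a sketch under that assumption: the function name dedupeJobs and the md5/json_encode key are my own illustrative choices, not something from the question or the answer above. It also calls array_values at the end so the result is a plain list like the desired output.

```php
<?php

/**
 * Group jobs that share the same title, description and location.
 * The first job seen for each combination is kept as the primary entry;
 * later matches are appended under its 'other_sources' key.
 * (dedupeJobs is a hypothetical helper name used for illustration.)
 */
function dedupeJobs(array $jobs): array
{
    $result = array();

    foreach ($jobs as $job) {
        // Assumed composite key: a hash of the three fields that define a duplicate.
        $key = md5(json_encode(array(
            $job['title'],
            $job['description'],
            $job['location'],
        )));

        if (isset($result[$key])) {
            // Combination already seen: record this job as another source.
            $result[$key]['other_sources'][] = $job;
        } else {
            // First job with this combination becomes the primary entry.
            $result[$key] = $job;
        }
    }

    // Drop the hash keys so the caller gets a numerically indexed list.
    return array_values($result);
}

// Sample data mirroring the question.
$jobs = array(
    array('id' => 'job1', 'title' => 'test', 'description' => 'test', 'location' => 'test', 'source' => 'source1'),
    array('id' => 'job2', 'title' => 'test', 'description' => 'test', 'location' => 'test', 'source' => 'source2'),
    array('id' => 'job3', 'title' => 'test', 'description' => 'test', 'location' => 'test', 'source' => 'source3'),
    array('id' => 'job4', 'title' => 'testing', 'description' => 'testing', 'location' => 'testing', 'source' => 'source1'),
);

$deduped = dedupeJobs($jobs);
```

With the sample data, job2 and job3 should end up under job1's 'other_sources', while job4 stays on its own. If differences in case or surrounding whitespace should not prevent a match, the three fields could be normalized (for example with trim and strtolower) before building the key.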
