mattltm - 5 months ago 19
PHP Question

Remove duplicate address lines from string in PHP

I'm interacting with a JSON API that provides an address in response to a query. I'm then putting the returned address elements into a MySQL database table.

The data is returned as AddressLine1, AddressLine2, Region, Postcode. The problem I have is that the quality of the data is pretty low, and a lot of the AddressLine1 data is duplicated within the element. For example, a typical return may be:

123 My House 123 My House, My Road

I'm trying to work out how I can remove the second occurrence of "123 My House" without removing the "My" from the "My Road" part.

I have tried all sorts of regex, but my regex-fu is weak! I've also tried implode, but all I can manage is to remove every duplicate word apart from its first instance, which is no help to me.

I guess I need some way of keeping the first occurrence of each word and removing the others, treating the comma as the separator between parts, so that I end up with:

123 My House, My Road

Can anyone point me in the right direction? I guess I need to split the string into an array at the comma then check each part of the array for duplicates and remove them then reassemble the array back into a string? Maybe?

I've managed to do it like this...

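$string = "123 My House 123 My House, My Road";

// Split the address into its comma-separated parts
$split = explode(',', $string);
$result = '';

foreach ($split as $section) {
    // Remove repeated words within this part, keeping the first occurrence
    $cleaned = implode(' ', array_unique(explode(' ', trim($section))));
    if (!empty($result)) {
        $result = $result . ", ";
    }
    $result = $result . $cleaned;
}

echo $result; // 123 My House, My Road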

Can anyone provide a more elegant solution?


Your question is pretty specific, and I don't know how well an answer to it will serve your project in the long term. However, here is a string manipulation solution for this particular case.

You should try to make your code stricter so that it doesn't store those duplicates in the first place.

Anyway, the code you should use for the replacements is as follows:

$str = '123 My House 123 My House, My Road';
$arr = explode(', ', $str);
$arr[0] = implode(' ', array_unique(explode(' ', $arr[0])));

echo $str.'<br>'; // 123 My House 123 My House, My Road
echo implode(', ', $arr); // 123 My House, My Road

Step by step explanation:

  1. explode(', ', $str) splits the address into parts at each ", " (two parts in this example).
  2. explode(' ', $arr[0]) splits the first part into words at the spaces.
  3. array_unique removes the duplicate words, keeping the first occurrence of each.
  4. implode(' ', ...) glues the first part back together with spaces.
  5. implode(', ', $arr) glues everything back together with ", ".
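
The steps above only clean the first part of the address. As a rough sketch, they could be wrapped in a small helper that applies the same cleanup to every comma-separated part. The function name dedupe_address_words is hypothetical, and trimming whitespace and dropping empty parts are assumptions about what the cleaned data should look like.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Removes repeated words inside each comma-separated part of an address.
function dedupe_address_words(string $address): string
{
    // Split on commas and trim the whitespace around each part
    $parts = array_map('trim', explode(',', $address));

    foreach ($parts as $i => $part) {
        // Split the part into words, ignoring runs of whitespace
        $words = preg_split('/\s+/', $part, -1, PREG_SPLIT_NO_EMPTY);
        // Keep only the first occurrence of each word
        $parts[$i] = implode(' ', array_unique($words));
    }

    // Assumption: parts that end up empty (e.g. from ",,") are dropped
    $parts = array_filter($parts, fn ($p) => $p !== '');

    return implode(', ', $parts);
}

echo dedupe_address_words('123 My House 123 My House, My Road') . PHP_EOL;
// 123 My House, My Road
```

Because each part is cleaned on its own, the "My" in "My Road" survives even though "My" also appears in the first part.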

I hope this helps.
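
One caveat: array_unique drops every repeated word within a part, so a genuine address such as "1 Church Lane Church End" would lose its second "Church". If that matters, a possible alternative is a regex with a backreference that only collapses a phrase repeated immediately after itself. This is a sketch under that assumption, not a tested fix for every address format, and the function name collapse_repeated_phrase is made up for illustration.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Collapses a word or phrase that is immediately followed by an exact
// copy of itself, e.g. "123 My House 123 My House" -> "123 My House".
function collapse_repeated_phrase(string $address): string
{
    // (?<!\S)            start at the beginning of a word
    // (\S+(?:\s+\S+)*?)  the shortest run of one or more words
    // \s+\1              the same run again, directly after it
    // (?=[\s,]|$)        the copy must end at whitespace, a comma, or the end
    $pattern = '/(?<!\S)(\S+(?:\s+\S+)*?)\s+\1(?=[\s,]|$)/u';

    // Repeat until nothing changes, so "a a a" also collapses to "a"
    do {
        $previous = $address;
        $address = preg_replace($pattern, '$1', $previous) ?? $previous;
    } while ($address !== $previous);

    return $address;
}

echo collapse_repeated_phrase('123 My House 123 My House, My Road') . PHP_EOL;
// 123 My House, My Road
```

Unlike the array_unique approach, this leaves "1 Church Lane Church End" unchanged, because no phrase there is repeated back to back.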