view raw
Nic Nic - 9 months ago 23
PHP Question

Merge MySQl rows as result where data is the newest

I have a problem where I have multiple rows in a database that contain the same email. The duplicated rows are due to contacts wishing to update their information and instead of updating it would insert as a new row.

What I would like to do is merge all duplicate rows given an email and merge the data together as a result returned in PHP.

There is a specific:

  • For x amount of rows the value that gets merged from a given column should be the row with the highest
    where the column is not empty.

For example, if I have these rows:

id email prefix first_name
1 Mr. Bob
2 Bob
3 Bobby
4 Mr Bobby
5 Bob

I want the merged row to become:

email prefix first_name Mr Bob

Since the row with the highest
where the prefix column is not empty is
id = 4
so the value of prefix in that row is chosen to be merged to the final result.

Likewise, the contact changed his name from Bob, to Bobby, and back to Bob, however; since the highest
row contains
that is the value that is merged.

Note that there are more columns and these are just a brief example.

Here is my SQL statement:

$this->db->select('company, title, address_line1, address_line2, address_line3, city, state/prov, country, postal_code');
$this->db->where('email', $email);

If anybody could help me accomplish this it would be much appreciated. If this is possible in SQL that would be awesome but if not a PHP solution could be used too.


In MySQL, one method uses group_concat():

select email
       substring_index(group_concat(prefix order by (prefix is not null) desc, id desc separator '|'
                                   ), '|', 1) as prefix,
       substring_index(group_concat(first_name order by (first_name is not null) desc, id desc separator '|'
                                   ), '|', 1) as first_name,
       . . .
from t
group by email;

The approach is that you are concatenating the values together and then extracting the first element.

Some notes:

  • There is a (configurable) maximum length for the intermediate string for group_concat(). You might need to increase its size.
  • All data is converted to strings, although you can convert back to the appropriate data type.
  • The separator character ('|' in this case) should not appear in any value.

An alternative method uses correlated subqueries:

       (select t2.prefix
        from t t2
        where = and t2.prefix is not null
        order by id desc
        limit 1
       ) as prefix,
       (select t2.first_name
        from t t2
        where = and t2.first_name is not null
        order by id desc
        limit 1
       ) as first_name,
       . . . 
from (select distinct email from t) e;

This method has the advantage of preserving the original types.