Ahsan Mukhtar Ahsan Mukhtar - 2 months ago 19
MySQL Question

Deleting repeating values in database

I have a MySql table which is populating daily with price values. Each day it records an entry even if the price hasn't changed. I want to delete some rows that repeat too much. I want to keep the first price and the last price before there is a price change.

Example 1)

id name price date
1 Product1 $6 13/07/2017
2 Product1 $6 14/07/2017
3 Product1 $6 15/07/2017
4 Product1 $7 16/07/2017
5 Product1 $6 17/07/2017
6 Product1 $6 18/07/2017
7 Product1 $6 19/07/2017


From that list records with id 2 and 6 should be deleted with the following result:

id name price date
1 Product1 $6 13/07/2017
3 Product1 $6 15/07/2017
4 Product1 $7 16/07/2017
5 Product1 $6 17/07/2017
7 Product1 $6 19/07/2017


Example 2)

id name price date
1 Product1 $6 13/07/2017
2 Product1 $6 14/07/2017
3 Product1 $6 15/07/2017
4 Product1 $6 16/07/2017
5 Product1 $6 17/07/2017
6 Product1 $6 18/07/2017
7 Product1 $6 19/07/2017


Here there is no price change so I can delete all records from 2 to 6:

id name price date
1 Product1 $6 13/07/2017
7 Product1 $6 19/07/2017


Id is not supposed to be one incremental and also date is not daily on daily basis.

Answer Source

This is the second answer I've submitted for this question, but I think I've finally got it this time:

DELETE FROM products WHERE id IN (
    SELECT id_to_delete
    FROM (
        SELECT
            t0.id AS id_to_delete,
            t0.price,
            (
                SELECT t1.price
                FROM products AS t1
                WHERE (t0.date < t1.date)
                    AND (t0.name = t1.name)
                ORDER BY t1.date ASC
                LIMIT 1
            ) AS next_price,
            (
                SELECT t2.price
                FROM products AS t2
                WHERE (t0.date > t2.date)
                    AND (t0.name = t2.name)
                ORDER BY t2.date DESC
                LIMIT 1
            ) AS prev_price
        FROM products AS t0
        HAVING (price = next_price) AND (price = prev_price)
    ) AS t
)

This is a modified version of the answer from @vadim_hr.

Edit: Below is a different query that filters on JOIN instead of subquery. JOIN may be faster than the former query (above) for large datasets, but I'll leave the performance testing up to you.

http://sqlfiddle.com/#!9/ee0655/8

SELECT M.id as id_to_delete
FROM
(
    SELECT
        *,
        (@j := @j + 1) AS j
    FROM
    (SELECT * FROM products ORDER BY name ASC, date ASC) AS mmm
    JOIN
    (SELECT @j := 1) AS mm
) AS M     -- the middle table
JOIN
(
    SELECT
        *,
        (@i := @i + 1) AS i
    FROM
    (SELECT * FROM products ORDER BY name ASC, date ASC) AS lll
    JOIN
    (SELECT @i := 0) AS ll
) AS L     -- the left table
ON M.j = L.i
    AND M.name = L.name
    AND M.price = L.price
JOIN
(
    SELECT
        *,
        (@k := @k + 1) AS k
    FROM
    (SELECT * FROM products ORDER BY name ASC, date ASC) AS rrr
    JOIN
    (SELECT @k := 2) AS rr
) AS R     -- the right table
ON M.j = R.k
    AND M.name = R.name
    AND M.price = R.price

Both queries accomplish the same end, and they both assume that rows are unique per name and date (as explained in the comment below).