timakro timakro - 1 month ago 9
MySQL Question

From every set of rows with the same value in column A delete all rows but the row with the highest value in column B

I'm searching for a SQL command that does the following:

From every set of rows with the same value in column A delete all rows but the row with the highest value in column B. If multiple rows in the same set have the same value in column B choose one of them at random.

Additional notes




  • The column format should not be modified nor additional tables should be required to achieve the desired result.

  • The table has only two columns from which none is primary, unique or multiple occurrences key.

  • The query should work well with bigger datasets i.e. the running time should be proportional to the number of rows in the table (not quadratic/exponential).



Example:



Initial state:

+---+---+
| A | B |
+---+---+
| x | 1 |
| x | 2 |
| x | 2 |
| y | 3 |
+---+---+


Desired result:

+---+---+
| A | B |
+---+---+
| x | 2 |
| y | 3 |
+---+---+

Answer

Use a JOIN with a subquery that gets the maximum B for each A, and make the JOIN condition match the rows where B doesn't match this.

DELETE t1
FROM Table AS t1
JOIN (
    SELECT A, MAX(B) AS maxB
    FROM Table
    GROUP BY A) AS t2
ON t1.A = t2.A AND t1.B != maxB