valenzio valenzio - 28 days ago 18
SQL Question

SQL: Use distinct on groups of similar data

Hello Mates I have the following problem in a Vertica database: I have a large Table

+------+------+------+
| Date | Col1 | Col2 |
+------+------+------+
| 1 | A | B |
| 2 | A | B |
| 3 | D | E |
| 2 | C | D |
| 1 | C | D |
+------+------+------+


As you can see I have redundant data, just taken on different dates (row 1 & 2 and row 4 & 5). So I would like a table that removes that redundant data by deleting the rows with the lower date, giving me a result like that:

+------+------+------+
| Date | Col1 | Col2 |
+------+------+------+
| 2 | A | B |
| 2 | C | D |
| 3 | D | E |
+------+------+------+


Using distinct would not work since it will delete rows randomly not considering the date, so I might end up with a table like this:

SELECT DISTINCT Col2, Col3 from Table


+------+------+------+
| Date | Col1 | Col2 |
+------+------+------+
| 2 | A | B |
| 1 | C | D |
| 3 | D | E |
+------+------+------+


which is not desired.

Is there anyway to accomplish that?
Thanks mates

Answer

Do a GROUP BY on your 2 columns and aggregate on the highest date:

SELECT MAX(Date), col1, col2
FROM table
GROUP BY Col1, Col2