hyeyoung hyeyoung - 1 month ago 15
SQL Question

SQL - Redshift remove duplicate rows without primary key

I have a big redshift database.
the records do not have unique key.

I'd like to convert the below sql to postgresql.

DELETE FROM (
SELECT *,ROW_NUMBER() OVER (partition BY column1, column2, column3) AS rnum
FROM tablename ) t1
WHERE t1.rnum > 1;


perform the above sql,the following errors appear :


Amazon Invalid operation: syntax error at or near "(";


please :'( .

Answer

In Postgres, you can do this using ctid. This is a system "column" that physically identifies each row.

The idea is:

delete from tablename
    where ctid not in (select min(t2.ctid)
                       from tablename t2
                       group by column1, column2, column3
                      );

I am not sure if Redshift supports ctid. But then again, despite the tags, your question is explicitly about Postgres.