hyeyoung hyeyoung - 1 year ago 346
SQL Question

SQL - Redshift remove duplicate rows without primary key

I have a big redshift database.
the records do not have unique key.

I'd like to convert the below sql to postgresql.

SELECT *,ROW_NUMBER() OVER (partition BY column1, column2, column3) AS rnum
FROM tablename ) t1
WHERE t1.rnum > 1;

perform the above sql,the following errors appear :

Amazon Invalid operation: syntax error at or near "(";

please :'( .

Answer Source

In Postgres, you can do this using ctid. This is a system "column" that physically identifies each row.

The idea is:

delete from tablename
    where ctid not in (select min(t2.ctid)
                       from tablename t2
                       group by column1, column2, column3

I am not sure if Redshift supports ctid. But then again, despite the tags, your question is explicitly about Postgres.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download