L Xandor L Xandor - 6 months ago 9
SQL Question

Oracle SQL - removing duplicates from a messy query

I have a long complex (for me at least) query which returns the exact data I need. Only problem is some ids are being duplicated due to one join in the query having a many to one relationship. What I'd like to do is just take the output from the query, and select one row per unique ID. It really doesn't matter how this row is selected from the duplicate IDs, I just need it to be reduced to one row per ID in the simplest way possible.

In other words I have a long messy query that gives me the data I need with neatly named columns etc BUT duplicated IDs:

ID column1 column2 column2... column50
111 data data data ... data
111 data data data ... data
112 data data data ... data
113 data data data ... data
113 data data data ... data
113 data data data ... data


And I want to take this output from the query and reduce it to

ID column1 column2 column2... column50
111 data data data ... data
112 data data data ... data
113 data data data ... data


The challenge I'm having is that I don't know how to take the entire output from a query and do stuff with it (like remove duplicates) without first writing it to a table, and I'm not able to write to a table.

MT0 MT0
Answer

If the data is identical then the simplest way is to wrap your messy query in another query where you just use SELECT DISTINCT. like this:

SELECT DISTINCT *
FROM   (
  your_messy_query
)

If it is not identical then you can use:

SELECT *
FROM   (
  SELECT t.*,
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY something ) AS rn
  FROM   (
    your_messy_query
  ) t
)
WHERE rn = 1;

The ORDER BY something could be:

  • a specific set of columns if you how to retrieve the most appropriate data;
  • or you could use ORDER BY ROWNUM if you just want to take the first row of each group;
  • or ORDER BY NULL if you want a non-deterministic order - however, this typically seems to be implemented with the same effect as ORDER BY ROWNUM.