user7002207 user7002207 - 1 month ago 7
SQL Question

Selecting a record based on a series of criteria

I would like to run a query that will allow me to chose the best record from a particular username based on certain criteria. I have 2 columns (col01, col02) that are my criteria that I am looking at.

• If one record (username a in the example below) has both columns as yes, I would like that one to take precedence.

• If one record has col01 as a yes, that takes next 2nd rank precenence (username c in the example below)

• If one record has col01, and the other has col02 as yes, than col01 takes precedence(username d in the example below).

• If one record has col02 as yes, and the other records as no, than column two takes 3rd precedence (username g in the example below).

• If both records are the same, than neither should be returned as these records need to be investigated further (usernames b, e, f)

Below is example sample and output. How it can be done using sql query?

+----------+-----+-------+-------+
| username | id | col01 | col02 |
+----------+-----+-------+-------+
| a | 1 | yes | yes |
| a | 2 | yes | no |
| b | 3 | no | no |
| b | 4 | no | no |
| c | 5 | yes | no |
| c | 6 | no | no |
| d | 7 | yes | no |
| d | 8 | no | yes |
| e | 9 | no | yes |
| e | 10 | no | yes |
| f | 11 | yes | yes |
| f | 12 | yes | yes |
| g | 13 | no | no |
| g | 14 | no | yes |
+----------+----+--------+-------+


output

+----------+-----+-------+------+
| username | id | col01 | col02|
+----------+-----+-------+------+
| a | 1 | yes | yes |
| c | 5 | yes | no |
| d | 7 | yes | no |
| g | 14 | no | yes |
+----------+----+--------+------+


Edit: I was asked to explain the conditions. Basically the records come from the same area (username); The col01 is the most recently updated information we have, while col02 is older. Both columns are important to us, so that is why it is better if both are yes; col01 being more recent is where the more dependable data is. Where all the records are exactly the same, we have to dig a little deeper to understand out data.

Answer

Use multiple outer self joins, one for records with both yes, one for records with only col01 = yes and one for records with only col02 = yes. Then add predicates to only select records where the id is the id of the first record in that set (id of row with same name that has both yes, id of row with same name that has only col01 = yes, etc.)
to get rid of rows that are dupes, filter out any row where there's another row, (with different id) that has same value for username, col01, and col02.

Select distinct a.username, a.id,
    a.col01, a.col02
From table a
   left join table b   -- <- this is rows with both cols = yes
       on b.username=a.username 
          and b.col01='yes'
          and b.col02='yes' 
   left join table c1  -- <- this is rows with col1 = yes
       on c1.username=a.username 
          and c1.col01='yes'
          and c1.col02='no' 
   left join table c2  -- <- this is rows with col2 = yes         
       on c2.username=a.username 
          and c2.col01='no' 
          and c2.col02='yes'
Where a.id = coalesce(b.id, c1.Id, c2.Id)
   and not exists  -- <- This gets rid of f
      (select * from table 
       where username = a.username
          and id != a.id
          and col01 = a.col01
          and col02 = a.col02)

if col02 is in another table, then in each place you use the table and need col02, you will need to add another join to this other table.

Select distinct a.username, a.id,
    a.col01, ot.col02
From (table a join other table ot 
          on ot.id = a.Id)
   left join (table b join otherTable ob -- <- this rows with both cols yes
                  on ob.id= b.id)
       on b.username=a.username 
          and b.col01='yes'
          and ob.col02='yes' 
   left join (table c1 join otherTable oc1  -- <- this rows with col1 yes
                  on oc1.id= c1.id)
       on c1.username=a.username 
          and c1.col01='yes'
          and oc1.col02='no' 
   left join (table c2 join otherTable oc2 -- <- this rows with col2 yes
                  on oc2.id= c2.id)         
       on c2.username=a.username 
          and c2.col01='no' 
          and oc2.col02='yes'
Where a.id = coalesce(b.id, c1.Id, c2.Id)
   and not exists  -- <- This gets rid of f
      (select * from table e
           join otherTable oe
              on oe.id= e.id
       where e.username = a.username
          and e.id != a.id
          and e.col01 = a.col01
          and oe.col02 = a.col02)
Comments