Y Ming Y Ming - 4 months ago 8
SQL Question

SQL: Discard rows with conflicting records in one column

To better illustrate my problem, I came up with a sample scenario shown below:

The following is a list of bank account history for customer A and B. The Open_Year for some of the accounts are either missing or conflicting with other records.

For example, account A-3-UB represents a single account with two records in the list, however, it has two conflicting Open_Year 1990 & 2012; account A-1-BA has two records, where one of the records is missing the Open_Year.

Customer Account_id Bank_id Open_Year Gender
A 1 BA 2000 F
A 1 BA . F

A 2 UB . F

A 3 UB 1990 F
A 3 UB 2012 F

A 4 UB 2013 F
A 4 UB 2013 .

A 5 UB . F

B 1 WF 2014 M
B 1 WF 2014 .

B 6 WF . .


What I would like to have is through a single SELECT query, those accounts with either missing/conflicting Open_Year will be discarded completed, i.e., the return result will be:

Customer Account_id Bank_id Open_Year Gender
A 4 UB 2013 F
A 4 UB 2013 .
B 1 WF 2014 M
B 1 WF 2014 .


Additional Question:

Now we have another record been added at the end, and we'd also like to discard a row when Gender is conflicting or missing at least one value, same requirement as for Open_Year:

Customer Account_id Bank_id Open_Year Gender
A 1 BA 2000 F
A 1 BA . F

A 2 UB . F

A 3 UB 1990 F
A 3 UB 2012 F

A 4 UB 2013 F
A 4 UB 2013 .

A 5 UB . F

B 1 WF 2014 M
B 1 WF 2014 .

B 6 WF . .

C 7 WB 2015 F


The expected output would be:

Customer Account_id Bank_id Open_Year Gender
C 7 WB 2015 F

Answer

You can do this using window functions. Here is one method:

select t.*
from (select t.*,
             count(distinct open_year) over (partition by account_id, bank_id) as cntd_oy,
             count(*) over (partition by account_id, bank_id) as cnt,
             count(open_year) over (partition by account_id, bank_id) as cnt_oy
      from t
     ) t
where cntd_oy = 1 and cnt = cnt_oy
Comments