DC2 DC2 - 2 months ago 7
SQL Question

Multiple Rows of Data to Single Row by Conditions

I have data that looks like this (the last column added):

ID Var 1 Date What I Want
aa11 Stage I 1980 Delete
aa11 Stage 2 1980 Keep
aa22 Stage 1 1980 Keep
aa22 Stage 2 1990 Delete
aa33 Stage 3 1992 Keep


But I want it to look like this:

ID Var 1 Date
aa11 Stage 2 1980
aa22 Stage 1 1980
aa33 Stage 3 1992


I want a single row of data per id on these conditions: 1. The entry with the earliest data is taken Else 2. If there are two entries in the same year, take the entry with higher stage (var 1) Else 3. Take the only entry given. How would you go about writing a piece of SQL code or SAS Data-step for this succinctly?

Answer

In SAS this is very simple with a data step. Just sort the data in the required order, then use first.id in a data step to extract the first id. I've assumed that 'Stage I' in your post is a typo and should say 'Stage 1'

/* create original data */
data have;
infile datalines dsd;
input ID $ Var_1 $ Date; 
datalines;
aa11,Stage 1,1980
aa11,Stage 2,1980
aa22,Stage 1,1980
aa22,Stage 2,1990
aa33,Stage 3,1992
;
run;

/* sort dataset */
proc sort data=have;
by id date descending var_1;
run;

/* extract first id only */
data want;
set have;
by id;
if first.id;
run;