ammosperros ammosperros -4 years ago 96
SQL Question

SAS : Make a crosswalk from values that change over time

I have a problem. I need to create a crosswalk for another table when an id changes itself in course of time. I need to have first year's id regardless of change in id.

data have;
input ORIG_ID $ CHANGE_ID $ YEAR
AAA BBB 1990
BBB AAA 1991
PPP ZZZ 1993
ZZZ YYY 1994
YYY ZZZ 1996
TTT MMM 1990
;

**What I want :**

/****OUTPUT****/

CHANGE_ID ORIG_ID
BBB AAA
ZZZ PPP
YYY PPP
MMM TTT
/*My logic so far*/
proc sql;
create table temp as
select CHANGE_ID, ORIG_ID
case
when (CHANGE_ID<ORIG_ID) then cat(CHANGE_ID,ORIG_ID)
when (ORIG_ID<CHANGE_ID) then cat(ORIG_ID,CHANGE_ID)
end as key, year
from dat
order by key,year;
quit;

data final;
retain CHANGE_ID ORIG_ID
set temp;
by key;
if first.key;
run;
/*But this works for id changing for AAA to BBB, may be not*/


Let me know if you have any confusions :

Answer Source

Here is a hash object approach - it should work provided that

  • You have enough memory to hold the whole input table
  • You only have 1 change per ID per year, or
  • Some way of ordering changes if there are multiple changes per ID per year


data have;
input ORIG_ID $ CHANGE_ID $ YEAR;
cards;
AAA  BBB  1990
BBB  AAA  1991
PPP  ZZZ  1993
ZZZ  YYY  1994
YYY  ZZZ  1996
TTT  MMM  1990
;
run;

proc sort data = have;
by CHANGE_ID descending YEAR;
run;

data v_have /view = v_have;
    set have;
    by CHANGE_ID;
    if first.CHANGE_ID then GRP = 0;
    GRP + 1;    
run;


data want;
  informat CHANGE_ID ORIG_ID;
  set v_have;
  /*Although we need the whole table in the hash, we only need to process each CHANGE_ID once*/
  by CHANGE_ID;
  if first.CHANGE_ID;

  /*Create a hash object to hold the table*/
  if _n_ = 1 then do;
    dcl hash h(dataset:'v_have');
    rc = h.definekey('CHANGE_ID','GRP');
    rc = h.definedata('ORIG_ID','YEAR');
    rc = h.definedone();
  end;

  /*Make a temp copy of the starting ID*/
  T_CHANGE_ID = CHANGE_ID;
  PREV_YEAR = 2000;
  rc=0;

  /*Follow chains of IDs backwards through the table, making sure we only step backwards in time to avoid looping*/
  do while(rc = 0 and YEAR < PREV_YEAR);
    PREV_YEAR = YEAR;
    CHANGE_ID = ORIG_ID;
    /*If there are multiple records per CHANGE_ID, take the most recent one that's younger than the current record*/
    do GRP = 1 by 1 while(rc=0 and YEAR >= PREV_YEAR);
        rc = h.find();
    end;
  end;

  /*Output the last CHANGE_ID we got to plus the starting ID */
  ORIG_ID   = CHANGE_ID;  
  CHANGE_ID = T_CHANGE_ID;

  /*Ignore trivial rows resulting from cycles*/
  if CHANGE_ID ne ORIG_ID;
  keep CHANGE_ID ORIG_ID;
run;

Now a bit more complex but working properly. I might have a go at doing this using a multidata hash object as I think it's possible to eliminate the initial sort that way.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download