Anjeg - 6 months ago
R Question

Compare a list against multiple vectors, break 'loop', and populate new column

I am new to R and have searched for answers. I have learned a lot in the last 2 weeks from finding answers I could modify, but this time I'm really stuck.

I wish to populate a new variable, Abuse, depending on the values across 20+ columns. The values I look for are prioritized, such that I wish

  • to 'break' the search if a value is found,

  • populate Abuse with a string,

  • and restart the search with the next 'row'.

As a SAS programmer I've coded this with a do while loop - and am trying very hard to learn the advantage of vectors in R.

This is what I've tried. Although the data frame is listed as gaining only 1 variable, head() shows 20+ new variables with the "ABUSE." prefix, where only 1 new column, ABUSE, is desired. Rows are also getting multiple new strings where only 1 is desired.

There are 20+ diag_codes; I have included only a few here.

diag_codes <- c("admitting_diagnosis", "princ_diag_code",
"oth_diag_code_2" )

non_fall2_flag <- read.table(header=TRUE, text=
"admitting_diagnosis princ_diag_code poa_princ_diag_code oth_diag_code_1 poa_oth_diag_code_1 oth_diag_code_2

27651 73026 Y 99559 Y 80703
99550 99550 Y 85220 Y 591
78609 486 Y 99559 Y 1320
78039 78609 Y 7707 Y 99550
78065 99559 Y 9916 Y 3379
99550 99554 Y 3158 Y 1330
9941 9941 Y 99559 Y 2760
78039 99559 Y 51889 Y V1505")

non_fall2_flag$abuse<- sapply(non_fall2_flag[,diag_codes],
ifelse (x=='99559', abuse<-"other abuse",
ifelse (x=='99550', abuse<-"unspec.",
abuse<-'' )))

Thank you from this first-time poster.

42-

You've got a few things to unlearn from your SAS days, but first here's a solution:

 non_fall2_flag$abuse <-  apply( non_fall2_flag[diag_codes], 1, 
       function(x) if('99559' %in% x) {"other abuse"} else 
                        if ('99550' %in% x) {"unspec."} else {""} )

There are two things to unlearn. The first is that R does not have an implicit row-oriented looping mechanism like the one you are familiar with from SAS data steps. The second is that ifelse is designed to return vectors, so you should not use <- inside the consequent and alternative expressions. Instead, you provide two vectors and the ifelse machinery does the choosing; any assignment belongs outside the ifelse call. If you had been working with a single column rather than testing multiple columns at once, you could have used ifelse.
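As a rough illustration of both points, here is a minimal sketch. The small data frame `df` and the vector `princ` are made up for the example (only the codes and labels come from the question), so treat it as one possible way to see the behavior rather than a fix for the original code.

```r
# Hypothetical two-column data frame, used only to show how sapply() iterates.
df <- data.frame(a = c("99559", "1"),
                 b = c("99550", "99559"),
                 stringsAsFactors = FALSE)

# sapply() walks over the COLUMNS of a data frame, so the result has one
# column per input column rather than one value per row.
per_col <- sapply(df, function(x) ifelse(x == "99559", "other abuse", ""))
print(dim(per_col))

# Single-column case: ifelse() takes a logical test plus two value vectors
# and returns a vector. The only assignment is outside the call.
princ <- c("73026", "99550", "486", "78609", "99559")
abuse <- ifelse(princ == "99559", "other abuse",
         ifelse(princ == "99550", "unspec.", ""))
print(abuse)
```

Assigning a multi-column result like `per_col` to a single data frame column is presumably where the extra prefixed columns seen with head() came from.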

My code used %in% to apply the membership test across an entire row at a time. When apply is used with a second argument of 1, an entire row is passed to the formal argument of the function in the third position. Another approach to processing several columns at once might have been mapply, but then you would have needed to extract the columns separately, and that would have made for much bulkier code.
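To make the comparison concrete, here is a small sketch that runs the row-wise apply version next to an mapply version on four of the question's rows. The helper name `classify` and the `colClasses = "character"` argument are my additions, not part of the original code; reading the codes as character is an assumption that keeps numeric-looking codes and codes such as V1505 comparable as strings.

```r
# Four rows and three columns taken from the question's sample data.
# colClasses = "character" is an assumption added for this sketch.
non_fall2_flag <- read.table(header = TRUE, colClasses = "character", text = "
admitting_diagnosis princ_diag_code oth_diag_code_2
27651 73026 80703
99550 99550 591
78065 99559 3379
78039 99559 V1505
")
diag_codes <- c("admitting_diagnosis", "princ_diag_code", "oth_diag_code_2")

# Hypothetical helper: the first matching code in priority order wins.
classify <- function(x) {
  if ("99559" %in% x) "other abuse" else if ("99550" %in% x) "unspec." else ""
}

# Row-wise: apply(..., 1, f) passes each row to f as a character vector.
by_row <- unname(apply(non_fall2_flag[diag_codes], 1, classify))

# Column-wise: mapply() needs every column spelled out as its own argument.
by_col <- mapply(function(d1, d2, d3) classify(c(d1, d2, d3)),
                 non_fall2_flag$admitting_diagnosis,
                 non_fall2_flag$princ_diag_code,
                 non_fall2_flag$oth_diag_code_2,
                 USE.NAMES = FALSE)

print(by_row)
print(identical(by_row, by_col))
```

With 20+ diagnosis columns the mapply call would need 20+ explicit arguments, whereas the apply call only needs the diag_codes vector to grow.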