nimz713 nimz713 - 1 year ago 44
R Question

Binary variable to multiple records of same key based on characters in another field in R

I have a table of doctor visits wherein there are sometimes multiple records for the same encounter key (

Enc_Key
) if there are multiple diagnoses, such as:

Enc_Key | Patient_Key | Enc_Date | Diag
123 789 20160512 asthma
123 789 20160512 fever
123 789 20160515 coughing
546 013 20160226 flu
564 444 20160707 laceration
789 226 20160707 asthma
789 226 20160707 fever


I am trying to create an indicator variable
Diag_Ind
based on the value of the character variable
Diag
, but I need to apply it for the entire encounter. In other words, if I get a value of "asthma" for
Diag
for a record, then I want to apply a "1" for the
Diag_Ind
to every record that has the same
Enc_Key
, such as below:

Enc_Key | Patient_Key | Enc_Date | Diag | Diag_Ind
123 789 20160512 asthma 1
123 789 20160512 fever 1
123 789 20160515 coughing 1
546 013 20160226 flu 0
564 444 20160707 laceration 0
789 226 20160707 asthma attack 1
789 226 20160707 fever 1


I can't seem to figure out a way to apply this binary indicator to multiple records. I have been using a line of code that resembles this:

tbl$Diag_Ind <- ifelse(grepl('asthma',tolower(tbl$Diag)),1,0)


but this would only assign a value of "1" to the single record with that Diag value, such as this:

Enc_Key | Patient_Key | Enc_Date | Diag | Diag_Ind
123 789 20160512 asthma 1
123 789 20160512 fever 0
123 789 20160515 coughing 0
546 013 20160226 flu 0
564 444 20160707 laceration 0
789 226 20160707 asthma attack 1
789 226 20160707 fever 0


I'm unsure of how to apply it to the rest of the records with the same
Enc_Key
value

Answer Source

We can use base R ave to check if any value in every group of Enc_Key has asthma in it

df$Diag_Ind<- ave(df$Diag, df$Enc_Key,FUN=function(x) as.integer(any(grep("asthma", x))))

df
#  Enc_Key Patient_Key Enc_Date       Diag Diag_Ind
#1     123         789 20160512     asthma        1
#2     123         789 20160512      fever        1
#3     123         789 20160515   coughing        1
#4     546          13 20160226        flu        0
#5     564         444 20160707 laceration        0
#6     789         226 20160707     asthma        1
#7     789         226 20160707      fever        1

Similar solution with dplyr

library(dplyr)
df %>%
 group_by(Enc_Key) %>%
 mutate(Diag_Ind = as.numeric(any(grep("asthma", Diag))))

#   Enc_Key Patient_Key Enc_Date    Diag    Diag_Ind
#    (int)       (int)    (int)     (fctr)    (dbl)
#1     123         789 20160512     asthma        1
#2     123         789 20160512      fever        1
#3     123         789 20160515   coughing        1
#4     546          13 20160226        flu        0
#5     564         444 20160707 laceration        0
#6     789         226 20160707     asthma        1
#7     789         226 20160707      fever        1
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download