albit paoli - 9 months ago
R Question

Filter rows based on variables "beginning with" strings specified by vector

I'm trying to filter a patient database based on specific ICD9 (diagnosis) codes. I would like to use a vector containing the first 3 characters of the ICD9 codes of interest.

The example database contains 3 character variables holding the ICD9 codes for each patient visit (var1 to var3).

Below is an example of the data

var1 <- c("8661", "865", "8651")

  patient var1 var2 var3
1       a 8661 8651 2430
2       b  865 8674 3456
3       c 8651 2866 9089

# diagnoses of interest: all codes beginning with "866" or "867"
dx <- c("866", "867")

filtered_data <- filter(observations, var1 %like% dx | var2 %like% dx | var3 %like% dx)

I have tried several approaches including the grep and the %like% functions as you can see above but I haven’t been able to get it working for my case. I would appreciate any help you can provide.

Happy Thanksgiving


Answer

This looks close to what you're looking for, but requires a bit more manipulation:


library(dplyr)
library(tidyr)
library(stringr)
obs2 <- observations %>%
  gather(vars, value, -patient) %>%
  filter(str_sub(value, 1, 3) %in% dx)

# A tibble: 2 × 3
  patient  vars value
    <chr> <chr> <chr>
1       a  var1  8661
2       b  var2  8674
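
If you want the original wide rows back rather than one row per matching code, something like the following might work. This is only a sketch and not part of the answer above: it assumes dplyr 1.0.4 or later (for `if_any`), and the `observations` and `dx` objects are rebuilt here from the example in the question.

```r
library(dplyr)
library(stringr)

# Example data rebuilt from the question (assumed layout)
observations <- tibble(
  patient = c("a", "b", "c"),
  var1 = c("8661", "865", "8651"),
  var2 = c("8651", "8674", "2866"),
  var3 = c("2430", "3456", "9089")
)

# Three-character prefixes of the diagnoses of interest
dx <- c("866", "867")

# Keep a row if the first 3 characters of any of var1..var3 appear in dx
filtered_data <- observations %>%
  filter(if_any(var1:var3, ~ str_sub(.x, 1, 3) %in% dx))

print(filtered_data)
```

With the example data this should keep patients a and b and drop c, since none of c's codes start with "866" or "867". Comparing the three-character prefix with `%in%` avoids building a regular expression, and it stays correct if `dx` grows to many prefixes. Note also that `gather()` is superseded in current tidyr releases; `pivot_longer()` is the recommended replacement for the long-format approach.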