albit paoli albit paoli - 14 days ago 4
R Question

Filter rows based on variables "beginning with" strings specified by vector

I'm trying to filter a patient database based on specific ICD9 (diagnosis) codes. I would like to use a vector indicating the first 3 strings of the ICD9 codes.

The example database contains 3 character variables for IC9 codes for each patient visit (var1 to var3).

Below is an example of the data

patient<-c("a","b","c")
var1<-c("8661", "865","8651")
var2<-c("8651","8674","2866")
var3<-c("2430","3456","9089")

observations<-data_frame(patient,var1,var2,var3)

patient var1 var2 var3
1 a 8661 8651 2430
2 b 865 8674 3456
3 c 8651 2866 9089

#diagnosis of interest: all beginning with "866" and "867"
dx<-c("866","867")

filtered_data<- filter(observations, var1 %like% dx | var2 %like% dx | var3 %like% dx)


I have tried several approaches including the grep and the %like% functions as you can see above but I haven’t been able to get it working for my case. I would appreciate any help you can provide.

Happy thanksgivings

Albit

Answer

This looks close to what you're looking for, but requires a bit more manipulation:

library(dplyr)
library(stringr)
library(tidyr)

obs2 <- observations %>%
  gather(vars, value, -patient) %>%
  filter(str_sub(value, 1, 3) %in% dx)

# A tibble: 2 × 3
  patient  vars value
    <chr> <chr> <chr>
1       a  var1  8661
2       b  var2  8674