reuss reuss - 1 month ago 9
R Question

R: find most frequent combinations within same id

I have an problem in counting the number of combinations of drugs. My data is organized in two data frames.
df1 contains id and found drugs, as such:

ID | drug
-----------
1 | drug1
1 | drug2
1 | drug3
2 | drug3
2 | drug5
3 | drug1
3 | drug3
3 | drug4
3 | drug5


df2 shows all possible drug combination of 2 different drugs, as such:

combi1 | combi2
-----------------
drug1 | drug2
drug1 | drug3
drug1 | drug4
drug2 | drug3
drug2 | drug4
drug2 | drug5


With 7140 possible combinations in total. What I want is to find out how many IDs have combination
drug1-drug2
,
drug1-drug3
and so forth.

I have been trying a double
for
loop:

counter=0
for(com in 1:nrow(df2)){
for(id in 1:unique(df1$ID)){
if(df2$combi1[com] %in% df1$drug[id] & df2$combi2[com] %in% df1$drug[id]) {
counter=counter+1
}
}
df2$count=counter
counter=0
}


But it doesn't work, because it is only able to check one row at a time. I have also tried the solution in Find Most Frequent Combination within a Vector by Group, but without any luck.

Furthermore, I need to do the same with combinations of three drugs

EDIT:
I like the output to be like this in df2, where I can see, how many times drug1 and drug2 has occurred as combination within ID. For example, only one ID had both drug1 and drug2, whereas 2 IDs had drug1 and drug3

combi1 | combi2 | count
-----------------------
drug1 | drug2 | 1
drug1 | drug3 | 2
drug1 | drug4 | 0
drug2 | drug3 | 1
drug2 | drug4 | 0
drug2 | drug5 | 0

Answer

For this one, I reached for data.table, but you could use tidyr just as easily.

library(data.table)
set.seed(213) # set seed
d <- data.table(ID = rep(1:3, each = 3), drug = paste0("drug", sample(1:5, 9, rep = T))) 

get_combs <- function(x){
  uniq_x <- unique(x)
  if(length(uniq_x) == 1L){
    return(NULL)
  } else {
    return(as.data.frame(t(combn(uniq_x, 2)), stringsAsFactors = FALSE))
  }

}

combi <- d[, get_combs(drug), by = ID][order(V1, V2),]
combi[ , .N, by = .(V1, V2)]

      V1    V2 N
1: drug1 drug2 2
2: drug1 drug4 2
3: drug2 drug4 2
4: drug3 drug5 1
Comments