reuss - 1 year ago 128
R Question

# R: find most frequent combinations within same id

I have an problem in counting the number of combinations of drugs. My data is organized in two data frames.
df1 contains id and found drugs, as such:

``````ID | drug
-----------
1  | drug1
1  | drug2
1  | drug3
2  | drug3
2  | drug5
3  | drug1
3  | drug3
3  | drug4
3  | drug5
``````

df2 shows all possible drug combination of 2 different drugs, as such:

``````combi1 | combi2
-----------------
drug1  | drug2
drug1  | drug3
drug1  | drug4
drug2  | drug3
drug2  | drug4
drug2  | drug5
``````

With 7140 possible combinations in total. What I want is to find out how many IDs have combination
`drug1-drug2`
,
`drug1-drug3`
and so forth.

I have been trying a double
`for`
loop:

``````counter=0
for(com in 1:nrow(df2)){
for(id in 1:unique(df1\$ID)){
if(df2\$combi1[com] %in% df1\$drug[id] & df2\$combi2[com] %in% df1\$drug[id])   {
counter=counter+1
}
}
df2\$count=counter
counter=0
}
``````

But it doesn't work, because it is only able to check one row at a time. I have also tried the solution in Find Most Frequent Combination within a Vector by Group, but without any luck.

Furthermore, I need to do the same with combinations of three drugs

EDIT:
I like the output to be like this in df2, where I can see, how many times drug1 and drug2 has occurred as combination within ID. For example, only one ID had both drug1 and drug2, whereas 2 IDs had drug1 and drug3

``````combi1 | combi2 | count
-----------------------
drug1  | drug2  |   1
drug1  | drug3  |   2
drug1  | drug4  |   0
drug2  | drug3  |   1
drug2  | drug4  |   0
drug2  | drug5  |   0
``````

For this one, I reached for data.table, but you could use `tidyr` just as easily.

``````library(data.table)
set.seed(213) # set seed
d <- data.table(ID = rep(1:3, each = 3), drug = paste0("drug", sample(1:5, 9, rep = T)))

get_combs <- function(x){
uniq_x <- unique(x)
if(length(uniq_x) == 1L){
return(NULL)
} else {
return(as.data.frame(t(combn(uniq_x, 2)), stringsAsFactors = FALSE))
}

}

combi <- d[, get_combs(drug), by = ID][order(V1, V2),]
combi[ , .N, by = .(V1, V2)]

V1    V2 N
1: drug1 drug2 2
2: drug1 drug4 2
3: drug2 drug4 2
4: drug3 drug5 1
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download