Provisional.Modulation - 1 month ago
R Question

R - Input, Process, and Output pairs of data

We have a text file that contains drug administration data. Each line contains a patient ID, an administration date and a drug name, formatted as follows:


Using an input file of this format, produce a list of pairs of drugs that were administered together (i.e. administered to the same patient on the same day) at least twenty-five different times. In the above sample, adderall and tylenol appear together twice, but every other pair appears only once. Output each qualifying pair as a comma-separated tuple, one per line.

Assuming that the adderall-tylenol combination occurred 50 times and another combination occurred 10 times, the output file should look something like this:
drug_used frequency
adderall-tylenol 50

Note that because the second combination occurred fewer than 25 times, it is not included in the final output.

dww

Using library(data.table), we can collapse the drugs given to each patient on each day into a single string:

dt[, paste(drug, collapse = '-'), by = .(id,date)]
#      id       date               V1
# 1: A234 2014-01-01              5FU
# 2: A234 2014-01-02 adderall-tylenol
# 3: B324 1990-06-01 adderall-tylenol

However, this also includes id-date combinations where the drugs do not form a pair, such as days with only a single drug. To keep only the groups with exactly two drugs, add a test on .N:

dt[, if (.N == 2) paste(drug, collapse = '-'), by = .(id,date)]
#      id       date               V1
# 1: A234 2014-01-02 adderall-tylenol
# 2: B324 1990-06-01 adderall-tylenol
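
The .N == 2 test drops any day on which a patient received three or more drugs, even though such a day contains several pairs. One possible way to keep them is to enumerate every pair with combn from base R, sorting the names first so that the same pair always gets the same label. This is a minimal sketch with made-up rows; the ids, dates and the column name pair are illustrative and do not come from the original data.

```r
library(data.table)

# Hypothetical rows: one patient-day with three drugs, one with a single drug
dt <- data.table(
  id   = c("C111", "C111", "C111", "D222"),
  date = c("2020-03-01", "2020-03-01", "2020-03-01", "2020-03-01"),
  drug = c("tylenol", "adderall", "5FU", "5FU")
)

pairs <- dt[, {
  d <- sort(unique(drug))  # canonical order, duplicates removed
  # combn() passes collapse = "-" on to paste(), giving one label per pair;
  # groups with fewer than two drugs return NULL and are dropped
  if (length(d) >= 2) .(pair = combn(d, 2, FUN = paste, collapse = "-"))
}, by = .(id, date)]

print(pairs)
```

The three-drug day yields three rows (one per pair), and the single-drug day yields none.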

To further restrict the two-drug result to patients who received a two-drug combination on more than 25 different days, chain a second test on .N, this time grouped by id:

dt[, if (.N == 2) paste(drug, collapse = '-'), by = .(id,date)][, if (.N>25) .(date,V1), by=id]
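
The question, as worded, asks for pairs that occur at least 25 times across all patients and days, which is a different filter from the per-patient one. A minimal sketch of that reading follows, assuming made-up data in which adderall-tylenol occurs on 50 patient-days and a hypothetical second pair on 10. The column names drug_used and frequency follow the desired output in the question, while min_count and the sample rows are illustrative.

```r
library(data.table)

# Made-up data: 50 patient-days with adderall + tylenol, 10 with 5FU + tylenol
dt <- rbind(
  data.table(id = rep(sprintf("P%02d", 1:50), each = 2), date = "2020-01-01",
             drug = rep(c("tylenol", "adderall"), times = 50)),
  data.table(id = rep(sprintf("Q%02d", 1:10), each = 2), date = "2020-01-02",
             drug = rep(c("tylenol", "5FU"), times = 10))
)

min_count <- 25  # "at least twenty-five" in the question

# One row per patient-day with exactly two drugs; sort() makes the label
# independent of the order of the lines in the input
pairs <- dt[, if (.N == 2) .(drug_used = paste(sort(drug), collapse = "-")),
            by = .(id, date)]

# Count each pair across all patient-days, then keep those meeting the threshold
freq <- pairs[, .(frequency = .N), by = drug_used][frequency >= min_count]

print(freq)
```

Only adderall-tylenol remains in freq, since the second pair falls below the threshold.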

If needed, you can write these results to a new file using write.table.
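
As a rough illustration of that last step, the sketch below writes a small result table in the space-separated layout shown in the question. The table contents and the temporary output path are placeholders; a data.table can be passed to write.table in the same way as the plain data.frame used here.

```r
# Placeholder result in the shape requested by the question
freq <- data.frame(drug_used = "adderall-tylenol", frequency = 50L)

# Hypothetical output location; replace with the real file path
out_file <- tempfile(fileext = ".txt")

# quote = FALSE and row.names = FALSE leave just a header line and the data lines
write.table(freq, file = out_file, sep = " ", quote = FALSE, row.names = FALSE)

written <- readLines(out_file)
print(written)
```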

The data

dt = fread("id, date, drug