Khalil Ardat Khalil Ardat - 25 days ago 14
R Question

Merge two tables on two columns without order of variables playing a role

I have two tables with different row numbers. I would like to merge the tables based on the content of two columns. However, the catch is I don't want the order of the variables to matter when merging. Example:

Table1:



Gene1 Gene2 p-value
TP53 ARID1A 0.001
ATM ATR 0.0005


Table2:



Gene1 Gene2 p-value
ARID1A TP53 0.0007
ATM ATR 0.004


I tried:

merge(Table1, Table2, by = c("Gene1", "Gene2"), all.x = TRUE)


But the problem is that it will only merge 'ATM' and 'ATR' but not 'TP53' and 'ARID1A' because they are not in the same order.

Is there a way to merge the two tables irrespective of the column order?

Answer

Using sqldf:

library(sqldf)

sqldf("
SELECT df1.*, 
       df2.`p.value` 
FROM   df1, df2 
WHERE (df1.Gene1 = df2.Gene1 AND
       df1.Gene2 = df2.Gene2) OR
      (df1.Gene1 = df2.Gene2 AND
       df1.Gene2 = df2.Gene1)")

#   Gene1  Gene2 p.value p.value
# 1  TP53 ARID1A   1e-03   7e-04
# 2   ATM    ATR   5e-04   4e-03
Comments