user971102 - 1 year ago 80
R Question

# Most efficient way to find common values for one column across many data frames

I have many files and I am trying to find the most efficient way of reading the data frames and finding common values in one column.

For now I have:
1. I read a list of files using:

``````files = c("test1.txt", "test2.txt", test3.txt")
``````

Each containing columns e.g.

``````df1 = data.frame(id=c("a", "b", "c"), v = c(1:3), c=c(10:12))
df2 = data.frame(id=c("x", "b", "c"), v = c(2:4), c=c(13:15))
df3 = data.frame(id=c("a", "n", "c"), v = c(4:6), c=c(16:18))

my.data = list(df1, df2, df3)
``````

And now I am trying to subset the list of data frames to return the same list of data frames each containing only the common rows for the first column called "id", e.g.

``````df1, df2, and df3 in this case would be a list containing only "id" common to all read files, i.e. a row with only "c" in this case:
intersect(intersect(df1\$id, df2\$id), df3\$id);
list(df1[3,], df2[3,], df3[3,])
``````

but I can't figure out a way using lists to merge all data frames, maybe this is a longer/more difficult process than reading all files, merging them all first by the common column "id", and then splitting them into a list of data frames? Does anybody have any insight for most efficient ways? Thank you!

To find the common intersection of the `id` columns, you can use

``````common <- Reduce(intersect, Map("[[", my.data, "id"))
``````

Then we can use that to subset the list elements.

``````lapply(my.data, function(x) x[x\$id %in% common, ])
# [[1]]
#   id v  c
# 3  c 3 12
#
# [[2]]
#   id v  c
# 3  c 4 15
#
# [[3]]
#   id v  c
# 3  c 6 18
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download