user3067851 - 4 months ago
R Question

Matching two lists of unequal length

I am trying to match the values in two lists, but only where the variable names are the same in both lists. I would like the result to be a list as long as the longer list, filled with the count of total matches.

jac <- structure(list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5),
.Names = c("s1", "s2", "s3"))

larger <- structure(list(s1 = structure(c(1L, 1L, 1L), .Label = "a", class = "factor"),
s2 = structure(c(2L, 1L, 3L), .Label = c("b", "c", "d"), class = "factor"),
s3 = c(1, 2, 7)), .Names = c("s1", "s2", "s3"), row.names = c(NA, -3L), class = "data.frame")


I am using
mapply(FUN = pmatch, jac, larger)
which gives me the correct totals, but not in the format I would like, shown below:

s1 s2 s3 s1result s2result s3result
a  c  1  1        2        NA
a  b  2  1        1        NA
a  d  7  1        3        NA


However, I don't think pmatch will ensure the name matching in every situation so I wrote a function that I am still having issues with:

prodMatch <- function(jac,larger){
  for(i in 1:nrow(larger)){
    if(names(jac)[i] %in% names(larger[i])){
      r[i] <- jac %in% larger[i]
      r
    }
  }
}


Can anyone help out?

Another dataset, where the length of one is not a multiple of the other:

larger2 <-
structure(list(s1 = structure(c(1L, 1L, 1L), class = "factor", .Label = "a"),
s2 = structure(c(1L, 1L, 1L), class = "factor", .Label = "c"),
s3 = c(1, 2, 7), s4 = c(8, 9, 10)), .Names = c("s1", "s2",
"s3", "s4"), row.names = c(NA, -3L), class = "data.frame")

Answer

Here mapply returns a list of matching indices, which you can convert to a data frame with as.data.frame:

as.data.frame(mapply(match, jac, larger))
#   s1 s2 s3
# 1  1  2 NA
# 2  1  1 NA
# 3  1  3 NA
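
To see why the conversion works, it can help to look at the intermediate result. The sketch below rebuilds jac and larger with plain list() and data.frame() calls (an assumption made for readability; the values are the same as in the question) and inspects what mapply hands back before as.data.frame is applied.

```r
# Same data as in the question, rebuilt without structure()
jac <- list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5)
larger <- data.frame(s1 = factor(c("a", "a", "a")),
                     s2 = factor(c("c", "b", "d")),
                     s3 = c(1, 2, 7))

# match(x, table) returns one index per element of x, so each
# result is as long as the corresponding element of jac
idx <- mapply(match, jac, larger)

is.list(idx)    # TRUE: the results differ in length, so mapply cannot simplify
lengths(idx)    # s1 = 1, s2 = 3, s3 = 1

# as.data.frame recycles the length-1 elements to fill three rows
out <- as.data.frame(idx)
nrow(out)       # 3
```

Note that the recycling only works when the shorter elements divide evenly into the longest one; an element of length 2 next to one of length 3 would make as.data.frame stop with an error about differing numbers of rows.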

Binding that result to larger with cbind gives the format you expected:

cbind(larger, 
      setNames(as.data.frame(mapply(match, jac, larger)), 
               paste(names(jac), "result", sep = "")))

#  s1 s2 s3 s1result s2result s3result
#1  a  c  1        1        2       NA
#2  a  b  2        1        1       NA
#3  a  d  7        1        3       NA
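
One caveat with the call above: mapply pairs its arguments by position, not by name, so it only gives the right answer while jac and larger list their elements in the same order. A minimal sketch of the difference, assuming a reordered copy of larger (the name shuffled is made up for this illustration):

```r
jac <- list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5)
larger <- data.frame(s1 = factor(c("a", "a", "a")),
                     s2 = factor(c("c", "b", "d")),
                     s3 = c(1, 2, 7))

# Hypothetical reordering: same columns, s2 now comes first
shuffled <- larger[c("s2", "s1", "s3")]

# Positional pairing: jac$s1 is looked up in shuffled's first column (s2)
by_position <- mapply(match, jac, shuffled, SIMPLIFY = FALSE)
by_position$s1    # NA, because "a" does not occur in s2

# Name-based pairing: look each element of jac up in the column of the same name
by_name <- lapply(names(jac), function(nm) match(jac[[nm]], shuffled[[nm]]))
names(by_name) <- names(jac)
by_name$s1        # 1
by_name$s2        # 2 1 3
```

Indexing with [[nm]] on both sides is what ties the comparison to the name rather than to the position.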

Update: To handle cases where the names of the two lists don't match, we can loop over the columns of larger and their names simultaneously, and extract the corresponding elements from jac by name:

as.data.frame(
    mapply(function(col, name) { 
        m <- match(jac[[name]], col)
        if(length(m) == 0) NA else m  # if the name doesn't exist in jac return NA as well
        }, larger, names(larger)))

#  s1 s2 s3
#1  1  2 NA
#2  1  1 NA
#3  1  3 NA
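
As a check, here is the same approach applied to the larger2 data from the question, which has an extra column s4 with no counterpart in jac. This is only a sketch: the data is rebuilt with data.frame() for brevity, and SIMPLIFY = FALSE is my own addition so that mapply always returns a list, even if every result happened to have the same length.

```r
jac <- list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5)
larger2 <- data.frame(s1 = factor(c("a", "a", "a")),
                      s2 = factor(c("c", "c", "c")),
                      s3 = c(1, 2, 7),
                      s4 = c(8, 9, 10))

res <- as.data.frame(
    mapply(function(col, name) {
        # jac[[name]] is NULL when jac has no such element,
        # in which case match() returns a zero-length vector
        m <- match(jac[[name]], col)
        if (length(m) == 0) NA else m
    }, larger2, names(larger2), SIMPLIFY = FALSE))

res
#   s1 s2 s3 s4
# 1  1 NA NA NA
# 2  1  1 NA NA
# 3  1 NA NA NA
```

Because the loop runs over larger2 rather than over jac, the result always has one column per column of the larger data frame, and the "not a multiple" warning from pairing a 3-element list with a 4-column data frame does not come up.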