Miha Miha - 1 month ago 10
R Question

Match multiple columns in two data frames and print unequal

So I have two large (in some part different) data frames, first is named sifrant and consist of:

IDP PREDMET ID NAME
1 120_1 ustanove _18 Barbara
2 120_1 ustanove _301 Mojca
3 200_1 zakonod _954 Klara


And other data frame is dalled baza

predmet sifpredm izvajalec1 sifizv1 izvajalec2 sifizv2 izvajalec3 sifizv3 izvajalec4 sifizv4 izvajalec5 sifizv5
ustanove 120_1 Barbara _18 Mojca _301 Klara _954 -2 -2 -2 -2
ustanove 120_1 Barbara _18 Mojca _301 -2 -2 -2 -2 -2 -2


Boath data frames should have equal values in next colums:
IDP==sifpredm
and
ID==sifizv1, siizv2, sifizv3, and so on
.

Now I would like to match this columns
IDP with sifpredm
and
ID with sifizv1 and sifizv2 and sifizv3
and so on.

Based on match, I would like to print
IDP, ID and NAME
that do not have match in sifrant (i.e., is missing in sifrant and is present in baza)

Comparing two data frames I can see that in data frame
sifrant
lecturer Klara is missing IDP 120_1. So my output should print which sifpredm, sifizv and izvajalec are in
baza
but not in
sifrant
. In my case sifrant is missing class with ID number 120_1, lecturer ID number _954, lecturer name Klara

So far I have tried

merge(sifrant, baza, by.x = "IDP",by.y="sifpredm" , all = T)


and

require(sqldf)
sifrantNotInbaza <- sqldf('SELECT * FROM baza EXCEPT SELECT * FROM sifrant')


and

library(diffobj)
install.packages("diffobj")
diffPrint(sifrant, baza)
diffObj(sifrant, baza)


I've also looked here

Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2

Here is also dput of both data frames:

structure(list(IDP = c("120_1", "120_1"), PREDMET = c("ustanove",
" ustanove"), ID = c("_18", "_301"), NAME = c("Barbara", "Mojca "
)), .Names = c("IDP", "PREDMET", "ID", "NAME"), class = "data.frame", row.names = c(NA,
-2L))


structure(list(predmet = c("ustanove", "ustanove"), sifpredm = c("120_1",
"120_1"), izvajalec1 = c("Barbara ", "Barbara "), sifizv1 = c("_18",
"_18"), izvajalec2 = c("Mojca ", "Mojca "), sifizv2 = c("_301",
"_301"), izvajalec3 = c("Klara ", "-2"), sifizv3 = c("_954",
"-2"), izvajalec4 = c(-2L, -2L), sifizv4 = c(-2L, -2L), izvajalec5 = c(-2L,
-2L), sifizv5 = c(-2L, -2L)), .Names = c("predmet", "sifpredm",
"izvajalec1", "sifizv1", "izvajalec2", "sifizv2", "izvajalec3",
"sifizv3", "izvajalec4", "sifizv4", "izvajalec5", "sifizv5"), class = "data.frame", row.names = c(NA,
-2L))

Answer

I think you need to correct your dput for sifrant. I copied in the table you've got in your description instead. Then I would do something like:

library(tidyverse)
baza2 <- baza %>% 
           gather(contains("sifizv"), 
                key = "sifizv", 
                value = "ID"
                ) %>% 
           select(1:2,ID) %>% 
           unique()
output <- anti_join(baza2, sifrant, by = c("sifpredm" = "IDP", "ID"))

This tells you the ID of the missing person. You can then look up their name by using match or:

left_join(output, sifrant) %>% select(sifpredm, predmet, ID, NAME)
Comments