Nils Nils - 2 months ago 16
R Question

R filter table with mapping table to reduce NA values

I would like to filter a table bsp1

g <- factor(c("Company 1", "Company 2", "Company 3", "Company 4", "Company 5"))
w <- factor(c("a", "b", "c", "a", "c"))
x <- c(28, 18, 25, NA, 21)
y <- c(80, NA, 74, 101, NA)
z <- c(170, 174, 183, NA, 185)
bsp1 <- data.frame(g, w, x, y, z)
colnames(bsp1) <- c("Company", "Sector", "Item 1", "Item 2", "Item 3")
rm(w, x, y, z)
bsp1

# Company Sector Item 1 Item 2 Item 3
# 1 Company 1 a 28 80 170
# 2 Company 2 b 18 NA 174
# 3 Company 3 c 25 74 183
# 4 Company 4 a NA 101 NA
# 5 Company 5 c 21 NA 185


based on a mapping table bsp2

sector <- factor(c("a", "b", "c"))
a <- c(1, 1, 1)
b <- c(NA, 1, NA)
c <- c(NA, NA, 1)
bsp2 <- data.frame(sector, a, b, c)
colnames(bsp2) <- c("Sector", "Item 1", "Item 2", "Item 3")
bsp2

# Sector Item 1 Item 2 Item 3
# 1 a 1 NA NA
# 2 b 1 1 NA
# 3 c 1 NA 1


the filter rule should be:
For each sector in bsp2 for which the item is not NA, the rows in bsp1 from the respective sector with NA in the item should be removed.

The desired outcome looks therefore like bsp3

bsp3 <- bsp1[c(1, 3, 5),]
bsp3

# Company Sector Item 1 Item 2 Item 3
# 1 Company 1 a 28 80 170
# 3 Company 3 c 25 74 183
# 5 Company 5 c 21 NA 185


Company 2 was dropped, as item 2 is required for b. Line 5 was not dropped as item 2 is not required for sector 3.

I thought about an apply function for all sectors over bsp1 with a vector of the relevant columns derived with following function for the vector.

a <- !is.na(bsp2[1,])
a <- which(a==c("TRUE"))


Unfortunately in the filter test of one vector I have a problem that b is interpreted as a Matrix.

b <- is.na(bsp1[, a])
c <- which(b==c("TRUE"))
reduced2016b <- data2013[-c,]


Even if I set up the vector manually, the filter result is not how I expect it.
Would be great if anyone has an idea to solve this problem. Also partial steps would help me already.

Thank you in advance!

Answer

One way is to melt both data frames, merge by Sector and variable, and find which companies have NA for value.x and not NA for value.y, i.e.

library(reshape2)
new_df <- merge(melt(bsp1), melt(bsp2), by = c('Sector', 'variable'))
ind <- as.character(new_df$Company[is.na(new_df$value.x) & !is.na(new_df$value.y)])
bsp1[!bsp1$Company %in% ind,]

#    Company Sector Item 1 Item 2 Item 3
#1 Company 1      a     28     80    170
#3 Company 3      c     25     74    183
#5 Company 5      c     21     NA    185