Leo Leo - 2 months ago 9
R Question

Match Observation from One Table to Another Table Variable Consisting of String in R 3.4.1

I have two datasets called A and B.

library(data.table)
Farm.Type <- c("Fruits","Vegetables","Livestock")
Produce.All <- c("Apple, Orange, Pears, Strawberries","Broccoli, Cabbage, Spinach","Cow, Pig, Chicken")

Store <- c("Convenience","Wholesale","Grocery","Market")
Produce <- c("Oranges","Watermelon","Cabbage","Pig")
Farm <- c("Fruits","","Vegetables","Livestock")

A <- data.table(Farm.Type, Produce.All)
B <- data.table(Store, Produce)


I am trying to identify what Farm.Type the Produce in table B falls into in table A, without changing the format of the two tables in order to pull the Farm.Type field into table B. Such that the data frame looks like

C <- data.table(Store, Produce, Farm)


I have tried using %in% in the following way:

B$Farm[B$Produce %in% A$Produce.All] <- A$Farm.Type


but because the A$Produce.All field is a string with commas, It does not match.

Is there a way to search through the string (A$Produce.All) to find the match for B$Produce?

Any help is appreciated.

Thanks.

Answer
Farm.Type <- c("Fruits","Vegetables","Livestock")
Produce.All <- c("Apple, Oranges, Pears, Strawberries","Broccoli, Cabbage, Spinach","Cow, Pig, Chicken")

Store <- c("Convenience","Wholesale","Grocery","Market")
Produce <- c("Orange","Watermelon","Cabbage","Pig")
Farm <- c("Fruits","","Vegetables","Livestock")

There's no need for data.table here, so I'm foregoing the use of that. You'd be far better off transforming the data since you have to do gyrations like this:

library(dplyr)
library(purrr)
library(stringi)

A <- data_frame(Farm.Type, Produce.All)
B <- data_frame(Store, Produce)

map(B$Produce, ~stri_detect_regex(A$Produce.All, sprintf("[[:space:],]*%s[[:space:],]*", .))) %>% 
  map(which) %>% 
  map_chr(~A$Farm.Type[ifelse(length(.)==0, NA, .)][1]) 

otherwise. (You still have to add that to the B data frame)

versus:

library(purrr)
library(dplyr)
library(tidyr)

mutate(A, Produce.All=stri_split_regex(Produce.All, ", ")) %>% 
  unnest(Produce.All) -> A_long

left_join(B, A_long, by=c("Produce"="Produce.All"))

And, I sure hope this wasn't homework.

Comments