Edu Edu - 2 months ago 6
R Question

R Extract duplicate words in string

I have strings a and b that compose my data. My purpose is to obtain a new variable that contains the repeated words.

a = c("the red house av", "the blue sky", "the green grass")
b = c("the house built", " the sky of the city", "the grass in the garden")

data = data.frame(a, b)


Based on this answer I can get a logical vector marking the repeated words with duplicated():


library(dplyr)
data = data %>% mutate(c = paste(a, b, sep = " "),
                       d = vapply(lapply(strsplit(c, " "), duplicated), paste, character(1L), collapse = " "))


Yet I am not able to obtain the words themselves. My desired data should look something like this:

> data.1
                 a                       b         d
1 the red house av         the house built the house
2     the blue sky     the sky of the city   the sky
3  the green grass the grass in the garden the grass


Any help on the function above would be highly appreciated.

Answer
a = c("the red house av", "the blue sky", "the green grass")
b = c("the house built", " the sky of the city", "the grass in the garden")

data <-  data.frame(a, b, stringsAsFactors = FALSE)

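# Return the row with a new column c holding the words common to a and b
func <- function(dta) {
    # Split both strings on spaces and keep the words that occur in both
    words <- intersect( unlist(strsplit(dta$a, " ")), unlist(strsplit(dta$b, " ")) )
    dta$c <- paste(words, collapse = " ")
    return( as.data.frame(dta, stringsAsFactors = FALSE) )
}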

library(dplyr)
data %>% rowwise() %>% do( func(.) )

Result:

#Source: local data frame [3 x 3]
#Groups: <by row>
#
## A tibble: 3 x 3
#                 a                       b         c
#*            <chr>                   <chr>     <chr>
#1 the red house av         the house built the house
#2     the blue sky     the sky of the city   the sky
#3  the green grass the grass in the garden the grass
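
For anyone who prefers to avoid rowwise() and do(), the same idea can be sketched in base R with mapply(). This is only an illustration, not the answer's method: the helper names split_words, common_words and repeated_words are made up here, and the sketch assumes words are separated by whitespace only. The last helper shows how the duplicated() mask from the question could index the words rather than be pasted as TRUE/FALSE.

```r
# Hypothetical helper names; base R only, no dplyr required.
a <- c("the red house av", "the blue sky", "the green grass")
b <- c("the house built", " the sky of the city", "the grass in the garden")
data <- data.frame(a, b, stringsAsFactors = FALSE)

# Split a string on runs of whitespace and drop empty tokens
# (b[2] starts with a space, which would otherwise yield "").
split_words <- function(x) {
  w <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  w[nzchar(w)]
}

# Words that occur in both strings, in order of first appearance in x.
common_words <- function(x, y) {
  paste(intersect(split_words(x), split_words(y)), collapse = " ")
}

# Variant closer to the question's attempt: use the duplicated() mask
# to index the words instead of pasting the TRUE/FALSE values.
# Note that this also counts a word repeated within a single string.
repeated_words <- function(x, y) {
  w <- c(split_words(x), split_words(y))
  paste(unique(w[duplicated(w)]), collapse = " ")
}

# Apply the helper pairwise to the two columns.
data$d <- mapply(common_words, data$a, data$b, USE.NAMES = FALSE)
data$d
# "the house" "the sky" "the grass"
```

The two helpers agree on this data but differ in general: common_words("blue", "the sky of the city") gives an empty string, while repeated_words() on the same input gives "the", because "the" occurs twice inside the second string. Which behaviour is wanted depends on whether "repeated" means shared between a and b or repeated anywhere in the combined text.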