Daniel - 2 months ago
R Question

Combine the first two letters of each word in a sentence string and a numeric variable

I have a data frame with 8 variables and I need to create a new column that represents a combination of two columns for use as an ID for each observation. The two columns that I need to combine look like this:

Aut <- c("Robert Lucas", "Finn Kydland & Edward Prescott", "Alan Blinder & Ben Bernanke",
"Lars Svensson & Lawrence Christiano & Robert Lucas", "Ben Bernanke")
Year <- c(1976, 1989, 1983, 1985, 1983)
df <- data.frame(Aut, Year)


The resulting ID variable I expect is:

Aut                                                  Year  ID
Robert Lucas                                         1976  RoLu1976
Finn Kydland & Edward Prescott                       1989  FiKyEdPr1989
Alan Blinder & Ben Bernanke                          1983  AlBlBeBe1983
Lars Svensson & Lawrence Christiano & Robert Lucas   1985  LaSvLaChRoLu1985
Ben Bernanke                                         1983  BeBe1983

Answer

You can try:

library(stringr)
# First split each entry into individual names, using "&" as the pattern.
a <- str_split(df$Aut, "&")
# Then trim each name, split it into first and last name with str_split,
# take the first two letters of each part with str_sub, and paste them together.
a1 <- lapply(a, function(x){
  x1 <- str_split(str_trim(x), " ")
  paste0(unlist(lapply(x1, str_sub, 1, 2)), collapse="")
})
# Finally append the years. The resulting vector can be saved in df.
df$ID <- paste0(unlist(a1), df$Year)

And everything together in one function:

foo <- function(a, b){
  a <- str_split(a, "&")
  a1 <- lapply(a, function(x){
    x1 <- str_split(str_trim(x), " ")
    paste0(unlist(lapply(x1, str_sub, 1, 2)), collapse="")
  })
  paste0(unlist(a1), b)
}

foo(df$Aut, df$Year)
[1] "RoLu1976"         "FiKyEdPr1989"     "AlBlBeBe1983"     "LaSvLaChRoLu1985" "BeBe1983"
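
If you would rather not depend on stringr, the same idea can probably be sketched in base R. Treat this as a rough alternative, not a drop-in replacement: make_id is a made-up name, and the sketch assumes that words are separated by spaces and authors by "&", exactly as in the example data.

```r
# Example data from the question
Aut <- c("Robert Lucas", "Finn Kydland & Edward Prescott", "Alan Blinder & Ben Bernanke",
         "Lars Svensson & Lawrence Christiano & Robert Lucas", "Ben Bernanke")
Year <- c(1976, 1989, 1983, 1985, 1983)

# make_id is a hypothetical helper name, not part of any package.
make_id <- function(authors, year) {
  # Split every entry into words; any run of spaces and "&" counts as one separator.
  # as.character() guards against the column being stored as a factor.
  words <- strsplit(as.character(authors), "[ &]+")
  # Keep the first two letters of each word and collapse them into one string per entry.
  prefix <- vapply(words, function(w) paste(substr(w, 1, 2), collapse = ""), character(1))
  # Append the year to each prefix.
  paste0(prefix, year)
}

make_id(Aut, Year)
```

Because the separator pattern swallows " & " in one go, there is no need for a separate trimming step. Names with extra punctuation (hyphens, initials with dots) are not handled specially here.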