Programmer Man Programmer Man - 2 months ago 15
R Question

Split a column of a table to multiple columns given the input of separators

Maybe what I want to do it's impossible, so I made this question to know if there is a way or not.
After reading this question here in stackoverflow I saw that there is a way to split a column in different columns, but it's not what I wanted, I have an app in shiny where I can have tables with values like:

Phones price
Nokia 1234D - J298 6732 - LM 2 103$
Samsung 3342L - J2YY 4372 - YU 3 130$
Samsung 3042X - IKAA 3221 - GN 4 102$


So the user comes and says I want to divide those values in column Phones as I want, so the idea that came to my mind was to make the user write something like (" ", " - ", " ", " - ") because I mean separate nokia, 1234D, J298, 6732, LM 2 in 5 columns given the mentioned separators.

Here is the example code:

library(stringr)
c=c(" "," - "," "," - ")
mytable <-data.table(Phones=c("Nokia 1234D - J298 6732 - LM 2",
"Samsung 3342L - J2YY 4372 - YU 3",
"Samsung 3042X - IKAA 3221 - GN 4"),price= c("103$", "130$", "102$") )
aux = str_split_fixed(mytable$Phones, c, 5)
mytable<-data.table( aux, mytable$price)


But I get the following result which is not what I want it separates as it wants , duplicates the first row.:

V1 V2 V3 V4 V5 V2
1: Nokia 1234D - J298 6732 - LM 2 103$
2: Samsung 3342L J2YY 4372 YU 3 130$
3: Samsung 3042X - IKAA 3221 - GN 4 102$
4: Nokia 1234D J298 6732 LM 2 103$


If you have a better solution it would be really helpful.

Answer Source

We could separate the 'Phones' column into 5 columns with extra= "merge" to keep the last column with strings "LM 2", "YU 3" etc

library(tidyr)
library(dplyr)
mytable %>% 
  separate(Phones, into = paste0("V", 1:5), remove = FALSE, extra = "merge")
#                             Phones      V1    V2   V3   V4   V5 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$

If we need a custom split, then use extract

mytable %>%
   extract(Phones, into = paste0("V", 1:4), remove = FALSE, 
     "^(\\w+\\s+\\w+)\\s*-\\s*(\\w+)\\s+(\\w+)\\s*-\\s*(\\w+\\s+\\w+)")
#                             Phones            V1   V2   V3   V4 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$

The ^ implies the start of the string followed by word (\\w+) followed by one or more spaces (\\s+) and the next word (\\w+) which we capture as a group ((...)) similarly the characters are matched based on the word, space characters

NOTE: The first approach gives the expected output as described in the post and the second one from the one requested in the comments


If we need to split by the custom splits in "c"

library(stringr)
c <- c(" "," - "," "," - ") #it is better to avoid function names for object names
fsplit <- function(str1, splt) {
       lst <- str_split(str1, splt, n = 2)
       v1 <- sapply(lst, `[`, 1)
       v2 <- sapply(lst, `[`, 2)
       list(v1, v2)
    }  

mytable[, V5 := Phones]
nm1 <- paste0("V", seq_along(c))
for(i in seq_along(c)){
  tmp <- fsplit(mytable$V5, c[i])
  mytable[, (nm1[i]) := tmp[[1]]]
  mytable[, V5 := tmp[[2]]][]
}
setcolorder(mytable,  c("Phones", nm1, "V5", "price"))
mytable
#                             Phones      V1    V2   V3   V4   V5 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$