Clatty Cake - 8 days ago
R Question

Split String In R Based On Character Location

I'm trying to split these strings in R (column entries) into three separate columns:

João Moutinho Monaco, 30, M(C)
Clinton N'Jie Marseille, 23, FW
Frederic Sammaritano Dijon, 30, AM(LR)


to become

Player                Team       Pos
João Moutinho         Monaco     30, M(C)
Clinton N'Jie         Marseille  23, FW
Frederic Sammaritano  Dijon      30, AM(LR)


I can find the location of the characters using gregexpr and nchar, but I'm not sure how to use strsplit for it. Or maybe another package would be easier?

Answer

We can read the vector into a data.frame with read.csv after inserting a delimiter between the fields using gsub:

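# Insert ";" between the three fields, then let read.csv split on it:
#   \\1 = first two words (Player), \\2 = next word before the comma (Team), \\3 = the rest (Pos)
read.csv(text = gsub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)",
                     "\\1;\\2;\\3", v1), sep = ";", header = FALSE,
         col.names = c("Player", "Team", "Pos"), stringsAsFactors = FALSE)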
#                Player      Team         Pos
#1        João Moutinho    Monaco   30,  M(C)
#2        Clinton N'Jie Marseille     23,  FW
#3 Frederic Sammaritano     Dijon 30,  AM(LR)
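
To see what read.csv actually receives, it can help to look at the intermediate strings that gsub produces. The snippet below is only an illustrative sketch that reuses the same pattern and the sample vector v1 from the data section; the variable name delimited is introduced here for illustration.

```r
# Sample vector, copied from the "data" section of this answer
v1 <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW",
        "Frederic Sammaritano Dijon, 30,  AM(LR)")

# Group 1: first two words (player), group 2: next word before the comma (team),
# group 3: everything after the comma and the whitespace that follows it
delimited <- gsub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)", "\\1;\\2;\\3", v1)

print(delimited)
# [1] "João Moutinho;Monaco;30,  M(C)"
# [2] "Clinton N'Jie;Marseille;23,  FW"
# [3] "Frederic Sammaritano;Dijon;30,  AM(LR)"
```

Each element now contains exactly two ";" characters, which is why sep = ";" yields three columns. Note that this pattern assumes every player name has exactly two words.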

Update

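If the player names can have more than two words but the "Team" name is always a single word (i.e. the word right before the first ','), we can match from the team onwards instead. Note that this version drops the comma after the number in "Pos":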

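# Match " Team, number," and rewrite it as ";Team;number" so that ";" delimits the fields.
# The captured fields keep their leading space; pass strip.white = TRUE to read.csv to trim it.
read.csv(text = sub("(\\s+[A-Za-z]+),(\\s+\\d+),(.*)", ";\\1;\\2\\3", v2),
         header = FALSE, sep = ";", col.names = c("Player", "Team", "Pos"),
         stringsAsFactors = FALSE)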
#                Player       Team         Pos
#1        João Moutinho     Monaco    30  M(C)
#2        Clinton N'Jie  Marseille      23  FW
#3 Frederic Sammaritano      Dijon  30  AM(LR)
#4       Angel Di María        PSG   28 M(CLR)
#5    Jean Michael Seri       Nice     25 M(C)
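
Since the question asks about splitting by character location, here is a rough base-R sketch of that idea, offered as an alternative and not as the method used above. It assumes, like the update, that the team is the single word right before the first comma. The helper name split_at_team is made up for illustration.

```r
# Sample vector, copied from the "data" section of this answer
v2 <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW",
        "Frederic Sammaritano Dijon, 30,  AM(LR)",
        "Angel Di María PSG, 28, M(CLR)", "Jean Michael Seri Nice, 25, M(C)")

# Hypothetical helper: split each string at positions found with regexpr
split_at_team <- function(x) {
  comma <- as.integer(regexpr(",", x, fixed = TRUE))   # position of the first comma
  head  <- substr(x, 1, comma - 1)                     # "Player Team" part
  space <- as.integer(regexpr("\\s\\S+$", head))       # last whitespace in that part
  data.frame(
    Player = substr(head, 1, space - 1),
    Team   = substr(head, space + 1, nchar(head)),
    Pos    = trimws(substr(x, comma + 1, nchar(x))),   # everything after the first comma
    stringsAsFactors = FALSE
  )
}

res <- split_at_team(v2)
print(res)
```

Unlike the sub version, this keeps the comma inside "Pos" (for example "28, M(CLR)"), which matches the layout requested in the question. It would misplace the split for a team name containing a space.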

data

v1 <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW",
        "Frederic Sammaritano Dijon, 30,  AM(LR)")
v2 <- c(v1, "Angel Di María PSG, 28, M(CLR)", "Jean Michael Seri Nice, 25, M(C)")