Denis Denis - 2 months ago
R Question

Split string by last two characters in R

My data frame looks like:

b <- data.frame(height = c(190,165,174,176), name = c('John Smith 34','Mr.Turner 54', 'Antonio P. 23', 'John Brown 31'))

#   height          name
# 1    190 John Smith 34
# 2    165  Mr.Turner 54
# 3    174 Antonio P. 23
# 4    176 John Brown 31


As you can see, the name and the age are stored in the same value. I want to split them apart, taking the last two characters of the string as the age:

  height       name age
1    190 John Smith  34
2    165  Mr.Turner  54
3    174 Antonio P.  23
4    176 John Brown  31


How can I do that?

Answer

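tidyr::separate makes this simple: its sep argument accepts either a regular expression or an integer position to split at, and negative positions count from the end of the string. (In tidyr 1.3.0 and later, separate() is superseded by separate_wider_position() and separate_wider_regex(), but it still works.)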

library(tidyr)

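# sep = -3 splits around the final space; depending on the tidyr version the
# space stays on the name or on the age, and convert = TRUE makes age an integer either way
b %>% separate(name, into = c('name', 'age'), sep = -3, convert = TRUE)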
##   height        name age
## 1    190 John Smith   34
## 2    165  Mr.Turner   54
## 3    174 Antonio P.   23
## 4    176 John Brown   31
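If you'd rather not depend on tidyr, the same positional split can be sketched in base R with nchar() and substr(). This assumes, as in the question's data, that every age is exactly the last two characters of name; the data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data with name as character (not factor)
b <- data.frame(height = c(190, 165, 174, 176),
                name = c('John Smith 34', 'Mr.Turner 54',
                         'Antonio P. 23', 'John Brown 31'),
                stringsAsFactors = FALSE)

n <- nchar(b$name)
# Assumption: the age is always the last two characters; convert it to integer
b$age <- as.integer(substr(b$name, n - 1, n))
# Everything before those two characters is the name; trimws() drops the separating space
b$name <- trimws(substr(b$name, 1, n - 2))

b
```

Note that age is extracted before name is overwritten, since both are computed from the original string.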

or separate by the final space:

b %>% separate(name, into = c('name', 'age'), sep = '\\s(?=\\S*?$)', convert = TRUE)

which returns the same columns and, unlike the positional split, does not assume the age is exactly two digits.
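Another base R way to split on the final space is utils::strcapture() (available since R 3.4.0), which matches a regular expression with capture groups and converts each group to the type of the matching column in a prototype data frame. A sketch, assuming each name ends in whitespace followed by digits:

```r
b <- data.frame(height = c(190, 165, 174, 176),
                name = c('John Smith 34', 'Mr.Turner 54',
                         'Antonio P. 23', 'John Brown 31'),
                stringsAsFactors = FALSE)

# Group 1: everything before the final whitespace (the name)
# Group 2: the trailing digits (the age)
# proto supplies the output column names and types (character, integer)
parts <- utils::strcapture("^(.*)\\s+(\\d+)$", b$name,
                           proto = data.frame(name = character(), age = integer(),
                                              stringsAsFactors = FALSE))

# Keep height and append the two captured columns
b <- cbind(b["height"], parts)
b
```

Rows whose name does not end in digits come back as NA in both new columns rather than raising an error, and ages of any length are captured.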

In base R, it's a bit more work:

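# Make sure name is character (data.frame() created factors by default before R 4.0)
b$name <- as.character(b$name)
# Split on the last whitespace character; the lookahead requires perl = TRUE
split_name <- strsplit(b$name, '\\s(?=\\S*?$)', perl = TRUE)
# Stack the list of c(name, age) pairs into a two-column character matrix
split_name <- do.call(rbind, split_name)
colnames(split_name) <- c('name', 'age')
# Replace the original name column with the two new columns
b <- data.frame(b[-2], split_name, stringsAsFactors = FALSE)
b$age <- type.convert(b$age, as.is = TRUE)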

b
##   height       name age
## 1    190 John Smith  34
## 2    165  Mr.Turner  54
## 3    174 Antonio P.  23
## 4    176 John Brown  31