Work Work -3 years ago 118
R Question

Separating multiple value numbers (with characters) and text

I have an Excel file whose cells contain text such as "4.56/505AB". Both parts vary in length: the text can be one or more characters, and the numeric part can contain characters such as a decimal point or a slash.

The ideal, separated format for this example would be: column 1 = 4.56/505, column 2 = AB.

What I've tried:
"Split_Text" in Excel, which removed the special characters from the number, and resulted in the following output: column 1 = 456505, column 2 = ./AB

R with the "gsub" function, which resulted in: [1] " 4 . 56 / 505 AB"

Is there a way to take these methods further, or will this be a manual fix? Thank you!

Answer

Assuming the first uppercase letter marks the beginning of the second column:

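df <- data.frame(c1 = c("4.56/505AB", "1.23/202CD"))

library(stringr)
# c2: first run of non-uppercase characters (the numeric part)
df$c2 <- str_extract(df$c1, "[^A-Z]+")
# c3: first run of uppercase letters (the text part)
df$c3 <- str_extract(df$c1, "[A-Z]+")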

df
#           c1       c2 c3
# 1 4.56/505AB 4.56/505 AB
# 2 1.23/202CD 1.23/202 CD
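
If you would rather not depend on stringr, the same split can probably be done in base R with sub() and two capture groups. This is only a sketch under the same assumption as above (the text part starts at the first uppercase letter); the name "pattern" is just for illustration.

```r
# Sample data mirroring the example above
df <- data.frame(c1 = c("4.56/505AB", "1.23/202CD"), stringsAsFactors = FALSE)

# Group 1: everything before the first uppercase letter.
# Group 2: the rest of the string, starting at that letter.
pattern <- "^([^A-Z]*)([A-Z].*)$"

df$c2 <- sub(pattern, "\\1", df$c1)  # keep only group 1
df$c3 <- sub(pattern, "\\2", df$c1)  # keep only group 2

df
```

One difference to keep in mind: when a value contains no uppercase letter, sub() returns the string unchanged in both columns, whereas str_extract() returns NA for the missing part.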