Tac_For - 2 months ago
R Question

Extract all words from a string and create a column with the result

I have a data frame (data3) with a column named "Collector". This column contains alphanumeric strings, for example "Ruiz and Galvis 650". I need to extract the letters and the digits separately and create two new columns: one with the number from that string (ColID) and another with all the words (Col):

INPUT:

Collector           | Times | Sample
Ruiz and Galvis 650 | 9     | SP.1
Smith et al 469     | 8     | SP.1

EXPECTED OUTPUT:

Collector           | Times | Sample | ColID | Col
Ruiz and Galvis 650 | 9     | SP.1   | 650   | Ruiz and Galvis
Smith et al 469     | 8     | SP.1   | 469   | Smith et al


I have tried the following, but when I try to save the file I get this error:

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
  unimplemented type 'list' in 'EncodeElement'

library(stringr)

# First run of digits in each Collector value
regexp <- "[[:digit:]]+"
data3$colID <- str_extract(data3$Collector, regexp)

# Every run of letters in each Collector value
regexp <- "[[:alpha:]]+"
data3$Col <- str_extract_all(data3$Collector, regexp)
write.table(data3, file = "borrar2.csv", quote = TRUE, sep = ",", row.names = FALSE)

Answer

The problem is that str_extract_all doesn't return a single string per input. It returns a list with one character vector of matches for each input string. For example:

> dput(str_extract_all("Ruiz and Galvis 650", "[[:alpha:]]+"))
list(c("Ruiz", "and", "Galvis"))

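Assigning that list to data3$Col turns Col into a list column, and write.table() cannot write list columns to a file. That is what the "unimplemented type 'list' in 'EncodeElement'" error is telling you.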
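If you prefer to keep extracting word by word, another option is to collapse each element of the list back into one string before assigning it. Below is a minimal base-R sketch: it uses regmatches()/gregexpr() as a stand-in for str_extract_all() (both return the same list-of-character-vectors shape), and the sample values are taken from the question.

```r
# Sample Collector values from the question
collector <- c("Ruiz and Galvis 650", "Smith et al 469")

# Like str_extract_all(), this returns a list: one character vector of words per string
words <- regmatches(collector, gregexpr("[[:alpha:]]+", collector))

# Collapse each vector into one space-separated string, giving a plain
# character vector that can be stored as an ordinary data frame column
col <- vapply(words, paste, character(1), collapse = " ")
print(col)
```

With stringr loaded, the same collapse step works directly on the list, e.g. `vapply(str_extract_all(data3$Collector, regexp), paste, character(1), collapse = " ")`.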

If, however, you update the regex pattern to match spaces as well as letters, you can go back to using str_extract instead:

> dput(str_extract("Ruiz and Galvis 650", "[[:alpha:] ]+"))
"Ruiz and Galvis "

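Note the space inside the brackets of the second regex. It makes the pattern match the whole run of letters and spaces as one string, so str_extract returns a plain character vector and you can write the data frame to a file. The match keeps the trailing space before the number; wrap the result in str_trim() (or base R's trimws()) to get exactly "Ruiz and Galvis".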
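Putting it together, here is a sketch of the whole workflow on the example data from the question. To keep it runnable without extra packages it swaps in base R: extract_first() is a hypothetical helper written here to mimic str_extract() (first match per string, NA when there is none), and the output goes to a temporary file rather than borrar2.csv.

```r
# Rebuild the example data frame from the question
data3 <- data.frame(
  Collector = c("Ruiz and Galvis 650", "Smith et al 469"),
  Times = c(9, 8),
  Sample = c("SP.1", "SP.1"),
  stringsAsFactors = FALSE
)

# Hypothetical base-R stand-in for stringr::str_extract():
# first match per string, NA where the pattern does not match
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[which(m != -1)] <- regmatches(x, m)
  out
}

# Digits only
data3$ColID <- extract_first(data3$Collector, "[[:digit:]]+")
# Letters and spaces as one run, then drop the trailing space before the number
data3$Col <- trimws(extract_first(data3$Collector, "[[:alpha:] ]+"))

# Both new columns are plain character vectors, so write.table() succeeds
out_file <- tempfile(fileext = ".csv")
write.table(data3, file = out_file, quote = TRUE, sep = ",", row.names = FALSE)
back <- read.csv(out_file, stringsAsFactors = FALSE)
print(back)
```

Because neither new column is a list, the write step no longer hits the EncodeElement error. Note that read.csv() parses ColID back as integers.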
