TonyGW - 8 months ago
R Question

R: extracting numbers from string

I am trying to use a string-processing package in R to extract the number(s) from strings. The strings look like this:

1 nomination
2 wins
1 win & 3 nominations
2 wins & 1 nomination
won 1 Oscar. Another 5 wins & 2 nominations

I want to extract the number(s) in each string. If a string mentions only wins or only nominations, the single number should be treated as the count for that category.

So far, I have tried the following:

library(stringi)

test <- "6 wins & 3 nominations."

stri_extract(test, regex="\\w*\\d\\w*")

However, this returns only the first number, not the second.

stri_extract(test, regex="\\w*\\d+wins(\\s*+&+\\s*)(\\d)")
gives NA.

The following approach works, but splitting the string first and then calling stri_extract on each piece feels unwieldy:

parts <- strsplit(test, "&")[[1]] # split the string first; strsplit returns a list
win_num <- stri_extract(parts[1], regex="\\d+")
nomination_num <- stri_extract(parts[2], regex="\\d+") # NA when there is no second part

Is there a way to do this with a single regex call? Thanks!


To extract multiple numbers, use str_extract_all from stringr, which returns a list with one element per input string.

library(stringr)
str_extract_all(test, "\\d+")[[1]] # "6" "3"
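
If you also need to know which number is the win count and which is the nomination count, a rough sketch along these lines may help. It uses base R (regmatches with gregexpr/regexec) rather than stringr, so it runs without extra packages. The helper name count_before and the vector name awards are made up for illustration, and the sketch assumes each count is written as digits followed by whitespace and then the word "win" or "nomination".

```r
# Sample strings copied from the question.
awards <- c(
  "1 nomination",
  "2 wins",
  "1 win & 3 nominations",
  "2 wins & 1 nomination",
  "won 1 Oscar. Another 5 wins & 2 nominations"
)

# Base R counterpart of str_extract_all(): a list with one character
# vector of digit runs per input string.
all_numbers <- regmatches(awards, gregexpr("[0-9]+", awards))

# Hypothetical helper: for each string, return the first number that is
# directly followed by whitespace and `word`, or NA if there is none.
count_before <- function(x, word) {
  pattern <- paste0("([0-9]+)[[:space:]]+", word)
  hits <- regmatches(x, regexec(pattern, x))
  vapply(
    hits,
    function(hit) if (length(hit) >= 2) as.integer(hit[2]) else NA_integer_,
    integer(1)
  )
}

wins <- count_before(awards, "win")               # NA 2 1 2 5
nominations <- count_before(awards, "nomination") # 1 NA 3 1 2
```

Matching on the word after the number means "won 1 Oscar" is not counted as a win in the last string, and a string with only one category gives NA for the other. If you want the Oscar counted too, the pattern would need to be extended.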