123apd - 7 months ago
R Question

How do I retrieve all numbers in a string and combine them into one number using regex?

This should be pretty easy, but the results after using suggestions from other SO posts leave me baffled. And, of course, I'd like to avoid using loops.

Reproducible example

library(stringr)
input <- "<77Â 500 miles</dd>"
mynumbers <- str_extract_all(input, "[0-9]")

The variable mynumbers is a list whose single element is a character vector of five digits:

> mynumbers
[[1]]
[1] "7" "7" "5" "0" "0"

But this is what I'm after:

> mynumbers
[1] 77500

This post suggests using paste, and I guess this should work fine given the correct sep and collapse arguments, but I have got to be missing something essential here. I have also tried to use unlist. Here is what I've tried so far:

1 - using paste

> paste(mynumbers)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

2 - using paste

> paste(mynumbers, sep = " ")
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

3 - using paste

> paste (mynumbers, sep = " ", collapse = NULL)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

4 - using paste

> paste (mynumbers, sep = "", collapse = NULL)
[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

5 - using unlist

> as.numeric(unlist(mynumbers))
[1] 7 7 5 0 0
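For reference, here is a minimal sketch of why the paste attempts return a "c(...)" string and what combination of unlist and paste does work. To keep it free of package dependencies it builds the same list shape with base R's regmatches() and gregexpr() instead of stringr::str_extract_all(); that substitution, and writing "Â" as the escape "\u00c2", are assumptions of the sketch, not part of the question.

```r
# Base R stand-in for stringr::str_extract_all(): a list with one element
# per input string, each element a character vector of matched digits
input <- "<77\u00c2 500 miles</dd>"
mynumbers <- regmatches(input, gregexpr("[0-9]", input))

# paste() turns non-character arguments into strings with as.character();
# on a list that deparses each element, hence the "c(...)" result
as.character(mynumbers)
#[1] "c(\"7\", \"7\", \"5\", \"0\", \"0\")"

# unlist() (or mynumbers[[1]]) yields the plain vector "7" "7" "5" "0" "0"
digits <- unlist(mynumbers)

# sep only matters when paste() combines several vectors element-wise;
# collapse = "" is what joins the elements of one vector into one string
joined <- paste(digits, collapse = "")
as.numeric(joined)
#[1] 77500
```

In short, sep never had an effect in attempts 2 to 4 because only one vector was passed; the missing piece was collapse applied to an unlisted vector.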

I'm hoping some of you have a few suggestions.
I guess there's an elegant solution using regex somehow, but I'm also very interested in the paste / unlist problem that is specific to R. Thanks!


The question was marked as a possible duplicate of this post.
The suggested solutions there would certainly solve the problem, and I'm a bit embarrassed to admit that I did not see that post despite numerous attempts at finding an existing solution on SO. However, my post also covered the specifics of stringr::str_extract_all and base::paste, so the answers provided here were very useful, at least to me.


str_extract_all returns a list, with one element per input string. We need to get the vector out of the list and then paste it. To extract a list element we use [[, and as there is only a single element, mynumbers[[1]] gives the vector. Then paste it with collapse = "" and convert the result with as.numeric.

as.numeric(paste(mynumbers[[1]], collapse = ""))
#[1] 77500
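The [[1]] indexing only handles the first input string. If input were a vector of several strings, one possible sketch is to collapse each list element separately with vapply. The helper name combine_digits() is hypothetical, and base R's regmatches()/gregexpr() again stand in for str_extract_all(), returning the same list shape.

```r
# Hypothetical helper: one number per input string
combine_digits <- function(x) {
  # One list element per string, each a character vector of single digits
  pieces <- regmatches(x, gregexpr("[0-9]", x))
  # Collapse each element on its own; a string with no digits gives
  # paste(character(0), collapse = "") == "", which as.numeric() turns into NA
  vapply(pieces, function(d) as.numeric(paste(d, collapse = "")), numeric(1))
}

combine_digits(c("<77\u00c2 500 miles</dd>", "1 200 km", "no digits"))
#[1] 77500  1200    NA
```

vapply with numeric(1) guarantees a plain numeric vector of the same length as the input, rather than a list.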

We can also match one or more non-numeric characters (\\D+), replace them with "" using gsub, and convert the result to numeric.

as.numeric(gsub("\\D+", "", input))
#[1] 77500
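gsub is vectorised, so the same call works on a whole vector of strings without a loop. A short sketch with a second, made-up input string shows one caveat: every run of non-digits is deleted, so digits from separate numbers in the same string are merged as well.

```r
inputs <- c("<77\u00c2 500 miles</dd>", "12 to 34 miles")

# Remove every run of non-digit characters, then convert
as.numeric(gsub("\\D+", "", inputs))
#[1] 77500  1234
```

That merging is exactly what is wanted for "77 500", but not for a string that holds two distinct numbers.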