HNSKD - 1 month ago 6
R Question

# How to use regex for this expression (e.g. "6.81E+10")?

I have a vector of strings and I just want to extract those values that take the form

1. "[digit][.][digit][digit][E][+][digit][digit]" or

2. "[digit][.][digit][digit][E][+][digit][digit][digit]"

An example would be:

1. "6.81E+10" and

2. "5.01E+110"

Let the vector `a` be as follows:

``````
a <- c("1.23E+110",
       "1.77E+12",
       "11.22E+110",
       "1.222E+110",
       "1.22E+1",
       "1.22E+1888",
       "1..72E+18",
       "1.23EE+18",
       "1.27E++18",
       "1.27E+E+18",
       "1.27R+180")
``````

My command is:

`grep("^[[:digit:]]{1}[.]{1}[[:digit:]]{2}[E+]{1}[[:digit:]]{2,3}",a,value=TRUE)`

I would like it to return:

`[1] "1.23E+110" "1.77E+12"`

Instead, it returns:

`character(0)`

Why doesn't it work?

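Your issue arises because of the `[E+]{1}` part. Inside a bracket expression, `+` is a literal character, not the "one or more" quantifier, so `[E+]{1}` matches exactly one character that is either "E" or "+". It consumes the "E", and then `[[:digit:]]{2,3}` fails because the next character is "+" rather than a digit.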

To match the two-character sequence "E+", match each character separately: `E` followed by a literal plus, written either as the escape `\\+` or as the bracket expression `[+]`.
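A small sketch of the difference, using made-up test strings rather than the vector from the question. The variable names here are only illustrative:

```r
tests <- c("E", "+", "E+", "EE")

# Inside brackets, "+" is a literal: [E+] matches ONE character, "E" or "+".
class_hits <- grepl("^[E+]$", tests)
class_hits
# [1]  TRUE  TRUE FALSE FALSE

# To match the sequence "E+", write E followed by a literal plus.
seq_hits <- grepl("^E[+]$", tests)
seq_hits
# [1] FALSE FALSE  TRUE FALSE

# The escaped form is equivalent to the bracketed one.
esc_hits <- grepl("^E\\+$", tests)
identical(seq_hits, esc_hits)
# [1] TRUE
```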

The immediate fix to your suggested solution is

``````
grep("^[[:digit:]]{1}[.]{1}[[:digit:]]{2}[E][+][[:digit:]]{2,3}$", a, value = TRUE)
# [1] "1.23E+110" "1.77E+12"
``````

But, as others have suggested (in particular @thelatemail), a neater approach is

``````
grep("^\\d[.]\\d{2}E[+]\\d{2,3}$", a, value = TRUE)
``````
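Note that a working pattern also needs the trailing `$` anchor, which the original command lacked. As a rough illustration (the names `anchored` and `unanchored` are just for this sketch), dropping the `$` lets a string with four exponent digits slip through, because its first three digits already satisfy `{2,3}`:

```r
a <- c("1.23E+110", "1.77E+12", "11.22E+110", "1.222E+110",
       "1.22E+1", "1.22E+1888", "1..72E+18", "1.23EE+18",
       "1.27E++18", "1.27E+E+18", "1.27R+180")

anchored   <- "^\\d[.]\\d{2}E[+]\\d{2,3}$"  # must end after 2 or 3 digits
unanchored <- "^\\d[.]\\d{2}E[+]\\d{2,3}"   # anything may follow the digits

a[grepl(anchored, a)]
# [1] "1.23E+110" "1.77E+12"

a[grepl(unanchored, a)]
# [1] "1.23E+110"  "1.77E+12"   "1.22E+1888"
```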