HNSKD HNSKD - 1 month ago 6
R Question

How to use regex for this expression (e.g. "6.81E+10")?

I have a vector of strings and I just want to extract those values that take the form


  1. "[digit][.][digit][digit][E][+][digit][digit]" or

  2. "[digit][.][digit][digit][E][+][digit][digit][digit]"



An example would be:


  1. "6.81E+10" and

  2. "5.01E+110"



Let the vector a be as follows:

a<-c("1.23E+110",
"1.77E+12",
"11.22E+110",
"1.222E+110",
"1.22E+1",
"1.22E+1888",
"1..72E+18",
"1.23EE+18",
"1.27E++18",
"1.27E+E+18",
"1.27R+180")


My command is:

grep("^[[:digit:]]{1}[.]{1}[[:digit:]]{2}[E+]{1}[[:digit:]]{2,3}",a,value=TRUE)


I would like it to return:

[1] "1.23E+110" "1.77E+12"


But instead it returns:
character(0)


Why doesn't it work?

Answer

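Your issue arises because of the [E+] part. Inside square brackets, + is not the "one or more" quantifier but a literal character, so [E+]{1} matches exactly one character that is either "E" or "+". In "1.23E+110" it consumes the "E"; the pattern then expects a digit but finds "+", so nothing matches.

To match "E" followed by a literal "+", write them as two separate pieces. Outside a character class the + must be escaped as \\+, or you can put it in a character class of its own, [+].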
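
As a minimal sketch of the difference, the snippet below tests a few short strings against "+" used inside and outside square brackets. The vector x and the result names are made up for illustration and are not part of the question.

```r
# Illustrative test strings (not from the question).
x <- c("E", "+", "E+", "EE")

# Inside brackets "+" is literal: [E+] matches ONE character, "E" or "+".
in_class <- grepl("^[E+]$", x)
in_class
# [1]  TRUE  TRUE FALSE FALSE

# Outside brackets "+" is a quantifier: E+ means one or more "E".
as_quantifier <- grepl("^E+$", x)
as_quantifier
# [1]  TRUE FALSE FALSE  TRUE

# "E" followed by a literal "+": escape it, or give it its own class.
escaped   <- grepl("^E\\+$", x)
own_class <- grepl("^E[+]$", x)
escaped
# [1] FALSE FALSE  TRUE FALSE
own_class
# [1] FALSE FALSE  TRUE FALSE
```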

The immediate fix to your suggested solution is

grep("^[[:digit:]]{1}[.]{1}[[:digit:]]{2}[E][+][[:digit:]]{2,3}$", a, value = TRUE)
# [1] "1.23E+110" "1.77E+12"
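
Note that the pattern above also ends with $, which the original command did not have. The following sketch uses the vector from the question to show why that anchor matters. The names no_anchor and with_anchor are illustrative only.

```r
a <- c("1.23E+110", "1.77E+12", "11.22E+110", "1.222E+110", "1.22E+1",
       "1.22E+1888", "1..72E+18", "1.23EE+18", "1.27E++18", "1.27E+E+18",
       "1.27R+180")

# Same pattern with and without the end-of-string anchor.
no_anchor   <- "^[[:digit:]][.][[:digit:]]{2}E[+][[:digit:]]{2,3}"
with_anchor <- paste0(no_anchor, "$")

# Without "$", the prefix "1.22E+188" of "1.22E+1888" is enough to match.
res_open <- grep(no_anchor, a, value = TRUE)
res_open
# [1] "1.23E+110"  "1.77E+12"   "1.22E+1888"

# With "$", nothing may follow the two or three exponent digits.
res_closed <- grep(with_anchor, a, value = TRUE)
res_closed
# [1] "1.23E+110" "1.77E+12"
```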

But, as others have suggested (in particular @thelatemail), a neater approach is

grep("^\\d[.]\\d{2}E[+]\\d{2,3}$", a, value=TRUE)
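
As a quick check, the sketch below runs the shorter pattern next to an equivalent one that uses backslash escapes instead of single-character classes, and confirms that both select the same elements. The variable names are illustrative. It assumes R's default regex engine, which accepts \\d as a shorthand for a digit.

```r
a <- c("1.23E+110", "1.77E+12", "11.22E+110", "1.222E+110", "1.22E+1",
       "1.22E+1888", "1..72E+18", "1.23EE+18", "1.27E++18", "1.27E+E+18",
       "1.27R+180")

# Literal "." and "+" written as single-character classes.
p_class   <- "^\\d[.]\\d{2}E[+]\\d{2,3}$"
# The same literals written with backslash escapes.
p_escaped <- "^\\d\\.\\d{2}E\\+\\d{2,3}$"

res_class   <- grep(p_class, a, value = TRUE)
res_escaped <- grep(p_escaped, a, value = TRUE)

identical(res_class, res_escaped)
# [1] TRUE

# grepl() gives a logical mask, handy for subsetting or counting matches.
keep <- grepl(p_class, a)
a[keep]
# [1] "1.23E+110" "1.77E+12"
```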