Kalinkin Alexey - 19 days ago
R Question

R: regex for math expression

Good day to everyone!

I have a math expression, for example:

((2-x+3)^2+(x-5+7)^10)^0.5

The key point is that I need to replace the ^ symbol with the pow function of the C language. I think a regex is what I need, but I don't know regex like a pro, so I ended up with this one:
(\([^()]*)*(\s*\([^()]*\)\s*)+([^()]*\))*
and I don't know how to improve it. Can you advise me on how to solve this problem?

The expected output for that example is:

pow(pow(2-x+3,2)+pow(x-5+7,10),0.5)

Many thanks.

Answer

Since a PCRE regex can match nested parentheses, this can be done in R with a single regex applied in a while loop that checks for the presence of ^ in the modified string with grepl("^", v, fixed=TRUE). Once no ^ remains, there is nothing left to substitute.

The regex pattern is

(\(((?:[^()]++|(?1))*)\))\^(\d*\.?\d+)


Details:

  • (\(((?:[^()]++|(?1))*)\)) - Group 1: a (...) substring with balanced parentheses; the text inside the outer parentheses is captured into Group 2 by the ((?:[^()]++|(?1))*) subpattern (an explanation can be found at "How can I match nested brackets using regex?"). In short, \( matches an outer (, then (?:[^()]++|(?1))* matches zero or more occurrences of either 1+ chars other than ( and ) or the whole Group 1 subpattern ((?1) is a subroutine call), and then \) matches the closing ).
  • \^ - a ^ caret
  • (\d*\.?\d+) - Group 3: an int/float number (.5, 1.5, 345)
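As a quick check of what the three groups capture, the pattern can be applied once with base R's regexec() and regmatches(), using perl = TRUE so that the possessive ++ and the (?1) recursion are understood. The input string s is a made-up example with one nested level, not taken from the question.

```r
# The pattern from above as an R string literal (backslashes doubled)
rx <- "(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)"

# Hypothetical input: an outer (...) group that contains a nested one
s <- "((2-x+3)^2+1)^0.5"

# Element 1 is the whole match, elements 2-4 are Groups 1, 2 and 3
m <- regmatches(s, regexec(rx, s, perl = TRUE))[[1]]
print(m)
```

Group 2 comes back as (2-x+3)^2+1, with the nested caret still in it, which is why one substitution is not enough and the loop has to run again.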

The replacement pattern, pow(\\2, \\3), is a literal pow( , ) call in which \\2 and \\3 are backreferences to the substrings captured by Groups 2 and 3.
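To see why the substitution is repeated, here is a single sub() call on the expression from the question. sub() replaces only the leftmost match, and for this input the leftmost match is the outermost (...)^0.5, so the inner carets survive the first pass. The variable name first_pass is only for illustration.

```r
rx <- "(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)"
v <- "((2-x+3)^2+(x-5+7)^10)^0.5"

# One pass: only the outermost (...)^0.5 is rewritten
first_pass <- sub(rx, "pow(\\2, \\3)", v, perl = TRUE)
print(first_pass)
## => [1] "pow((2-x+3)^2+(x-5+7)^10, 0.5)"
```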

R code:

v <- "((2-x+3)^2+(x-5+7)^10)^0.5"
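# Repeat until no ^ is left; each pass rewrites the leftmost (...)^number
while (grepl("^", v, fixed = TRUE)) {
    new_v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", "pow(\\2, \\3)", v, perl = TRUE)
    # Stop if nothing matched (e.g. x^2, whose base is not parenthesized); otherwise the loop never ends
    if (identical(new_v, v)) break
    v <- new_v
}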
v
## => [1] "pow(pow(2-x+3, 2)+pow(x-5+7, 10), 0.5)"
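To apply the same loop to several expressions, it can be wrapped in a small function and mapped over a character vector. caret_to_pow is a hypothetical name, and the sketch assumes only (...)^number terms should be rewritten: it stops as soon as sub() changes nothing, so a caret with a bare base such as b^3 or x^2 is left as is instead of keeping the while loop running forever.

```r
# Hypothetical helper: rewrite every (...)^number in one expression as pow(..., number)
caret_to_pow <- function(expr) {
  rx <- "(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)"
  while (grepl("^", expr, fixed = TRUE)) {
    out <- sub(rx, "pow(\\2, \\3)", expr, perl = TRUE)
    # A remaining ^ without a parenthesized base cannot be matched: give up
    if (identical(out, expr)) break
    expr <- out
  }
  expr
}

exprs <- c("((2-x+3)^2+(x-5+7)^10)^0.5", "(a)^2+b^3", "x^2")
res <- vapply(exprs, caret_to_pow, character(1), USE.NAMES = FALSE)
print(res)
```

For these inputs the first expression becomes the nested pow() call shown above, "(a)^2+b^3" becomes "pow(a, 2)+b^3", and "x^2" is returned unchanged.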