Kalinkin Alexey - 5 months ago

R Question

Good day to everyone!

I have a math expression, for example,

`((2-x+3)^2+(x-5+7)^10)^0.5`

in which every `^` operator should be replaced with a `pow` call. My regex attempt so far is

`(\([^()]*)*(\s*\([^()]*\)\s*)+([^()]*\))*`

The expected output will be

`pow(pow(2-x+3,2)+pow(x-5+7,10),0.5)`

Many thanks.

Answer

Since a PCRE regex *can* match nested parentheses, it is possible to achieve this in R with a single regex inside a `while` loop that checks for the presence of `^` in the modified string with `grepl("^", v, fixed=TRUE)`. Once there is no `^` left, there is nothing else to substitute.

The regex pattern is

```
(\(((?:[^()]++|(?1))*)\))\^(\d*\.?\d+)
```

See the regex demo

**Details**:

- `(\(((?:[^()]++|(?1))*)\))` - Group 1: a `(...)` substring with balanced parentheses, capturing what is inside the outer parentheses into Group 2 (with the `((?:[^()]++|(?1))*)` subpattern). An explanation can be found at "How can I match nested brackets using regex?". In short, `\(` matches an outer `(`, then `(?:[^()]++|(?1))*` matches zero or more sequences of 1+ chars other than `(` and `)` or the whole Group 1 subpattern (`(?1)` is a subroutine call), and then `\)` matches a `)`
- `\^` - a `^` caret
- `(\d*\.?\d+)` - Group 3: an int/float number (`.5`, `1.5`, `345`)
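The group layout described above can be checked directly in R. This is a minimal sketch that reuses the answer's pattern and the question's sample string; the variable names `pattern`, `s`, `base`, and `exponent` are illustrative assumptions, not part of the original answer.

```r
# Recursive PCRE pattern from the answer; it only works with perl = TRUE
pattern <- "(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)"
s <- "((2-x+3)^2+(x-5+7)^10)^0.5"

# The leftmost match starts at the first "(" and spans the whole string,
# so replacing it with a single backreference isolates that group.
base     <- sub(pattern, "\\2", s, perl = TRUE)  # Group 2: inside the outer parentheses
exponent <- sub(pattern, "\\3", s, perl = TRUE)  # Group 3: the number after the caret

base      # "(2-x+3)^2+(x-5+7)^10"
exponent  # "0.5"

# A base that is not wrapped in parentheses is not matched by this pattern
grepl(pattern, "x^2", perl = TRUE)  # FALSE
```

Note that Group 2 holds the content of the outermost parentheses only; the nested `(2-x+3)` and `(x-5+7)` parts are consumed by the `(?1)` subroutine call and get their own turn in later passes.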

The replacement pattern contains a literal `pow()`, and `\\2` and `\\3` are backreferences to the substrings *captured* with Groups 2 and 3.

```
v <- "((2-x+3)^2+(x-5+7)^10)^0.5"
pattern <- "(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)"
# Rewrite one "(base)^number" per pass until no caret is left
while (grepl("^", v, fixed = TRUE)) {
  v <- sub(pattern, "pow(\\2, \\3)", v, perl = TRUE)
}
v
## => [1] "pow(pow(2-x+3, 2)+pow(x-5+7, 10), 0.5)"
```
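One caveat: the loop only ends when every `^` has been rewritten, so an input such as `x^2`, where the base is not in parentheses, would keep it spinning forever. A guarded variant could look like the sketch below. The helper name `caret_to_pow` is hypothetical, and leaving unmatched carets untouched is an assumption about the desired behavior, not something the original answer specifies.

```r
# Hypothetical helper (not from the original answer): same substitution,
# but it stops once a pass no longer changes the string.
caret_to_pow <- function(v) {
  pattern <- "(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)"
  while (grepl("^", v, fixed = TRUE)) {
    replaced <- sub(pattern, "pow(\\2, \\3)", v, perl = TRUE)
    # A caret remains, but the pattern no longer matches: give up
    if (identical(replaced, v)) break
    v <- replaced
  }
  v
}

caret_to_pow("((2-x+3)^2+(x-5+7)^10)^0.5")
## => [1] "pow(pow(2-x+3, 2)+pow(x-5+7, 10), 0.5)"
caret_to_pow("x^2+1")
## => [1] "x^2+1"
```

Supporting bare variables or non-numeric exponents would mean extending the pattern itself, which is outside what the answer covers.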