Alexey Ferapontov - 5 months ago
R Question

R sub with perl - starts search backwards?

I have strings that look like a, shown below. I need to extract the part of the string that lies between the first // and the first subsequent /. I use sub with perl = F, but it's roughly 4 times slower than with perl = T. So I tried perl = T and found that the search starts from the END of the string??

a = ""



is what I need. I am very surprised to see this. Is it documented somewhere? How can I rewrite it with perl = T? I have 20M rows to work with, and speed is important. Thanks!

Edit: it is not given that every string will start with


You can try .*?//(.*?)/.* to make the first .* lazy too so that // will match the first // instance:

# [1] ""
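
To make the difference concrete, here is a minimal sketch using a made-up input. The string x below is hypothetical and is not the asker's data. It only illustrates that with perl = TRUE a greedy leading .* backtracks from the end of the string and so lands on the last //, while the lazy .*? stops at the first one.

```r
# Hypothetical input containing two "//" occurrences (not the original data).
x <- "http://first.example/path//second.example/more"

# Greedy leading .* : it consumes the whole string, then backtracks,
# so "//" is matched at its LAST occurrence.
greedy <- sub(".*//(.*?)/.*", "\\1", x, perl = TRUE)

# Lazy leading .*? : it expands only as far as needed,
# so "//" is matched at its FIRST occurrence.
lazy <- sub(".*?//(.*?)/.*", "\\1", x, perl = TRUE)

print(greedy)  # [1] "second.example"
print(lazy)    # [1] "first.example"
```

This is why the search appears to "start from the end": the regex engine still scans left to right, but the greedy quantifier prefers the longest possible prefix before //.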

And ?gsub says:

The standard regular-expression code has been reported to be very slow when applied to extremely long character strings (tens of thousands of characters or more): the code used when perl = TRUE seems much faster and more reliable for such usages.

The standard version of gsub does not substitute correctly repeated word-boundaries (e.g. pattern = "\b"). Use perl = TRUE for such matches.
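
Since speed matters here, it may be worth timing both engines on a sample of your own data before committing to one. The following is only a rough sketch: the vector x is made-up test data, the size 1e5 is arbitrary, and timings will differ by machine and by input. It also does not assume that the two engines return the same result for this pattern, so compare the outputs yourself.

```r
# Hypothetical sample data: one made-up string repeated many times.
x <- rep("http://first.example/path//second.example/more", 1e5)
pattern <- ".*?//(.*?)/.*"

# Time the PCRE engine (perl = TRUE) and the default engine (perl = FALSE).
t_pcre <- system.time(res_pcre <- sub(pattern, "\\1", x, perl = TRUE))
t_default <- system.time(res_default <- sub(pattern, "\\1", x, perl = FALSE))

# Elapsed wall-clock seconds for each engine.
print(c(pcre = t_pcre[["elapsed"]], default = t_default[["elapsed"]]))
```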