shiruxmachine - 2 months ago
Ruby Question

Regexp.escape is not escaping substrings separated by |

I have a string...

str = "bookworms|actuarial-consultants|uninterruptible-power-supply-(ups)-experts|c++programming-developers"


with special characters such as parentheses and the + symbol.

I can't match uninterruptible-power-supply-(ups)-experts and c++programming-developers unless I escape them manually, like this:

bookworms|actuarial-consultants|uninterruptible-power-supply-\(ups\)-experts|c\+\+programming-developers


https://gyazo.com/d545ab1a8d7d178a6079f4b9cb125cce

My string can contain any number of substrings separated by |, and it is generated by a query method, so I can't escape it manually.

I tried Regexp.escape, but it did not produce the correct output. I still can't match uninterruptible-power-supply-(ups)-experts or c++programming-developers, or even a plain substring like bookworms.


https://gyazo.com/ae0bb43a1dc84f40deb18e3ed76d490e

The escape method is adding double backslashes (\\) to my string:

bookworms\\|actuarial\\-consultants\\|uninterruptible\\-power\\-supply\\-\\(ups\\)\\-experts\\|c\\+\\+programming\\-developers

Answer

It's not possible to tell what you're doing from the little you told us, but it sounds like you're not using Regexp.escape correctly.

Meditate on this:

str = "bookworms|actuarial-consultants|uninterruptible-power-supply-(ups)-experts|c++programming-developers"
Regexp.escape(str) 
# => "bookworms\\|actuarial\\-consultants\\|uninterruptible\\-power\\-supply\\-\\(ups\\)\\-experts\\|c\\+\\+programming\\-developers"

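Notice that the "OR" separators (|) are being escaped too, which isn't desirable: the resulting pattern matches only the entire original string, not the individual substrings.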
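
A small sketch of why this fails, which also addresses the doubled backslashes in the question. The \\ pairs are just how inspect (and irb) displays a single backslash inside a double-quoted string, and a pattern built from the fully escaped string only matches the whole original string literally. The variable names escaped and literal are mine, for illustration:

```ruby
str = "bookworms|actuarial-consultants|uninterruptible-power-supply-(ups)-experts|c++programming-developers"
escaped = Regexp.escape(str)

# inspect doubles each backslash for display; the string itself
# never contains two backslashes in a row.
escaped.include?("\\\\")     # => false
escaped.start_with?("bookworms\\|")  # => true (one backslash, then the pipe)

# Because the | separators were escaped, the pattern is one long literal.
literal = Regexp.new(escaped)
literal.match?(str)          # => true  (the whole original string)
literal.match?("bookworms")  # => false (no alternation left)
```

Printing the value with puts escaped instead of inspecting it shows the single backslashes directly.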

If you pass an array of strings to Regexp.union, it escapes each string as needed and joins the escaped patterns with | into one large pattern:

Regexp.union(str.split('|')) 
# => /bookworms|actuarial\-consultants|uninterruptible\-power\-supply\-\(ups\)\-experts|c\+\+programming\-developers/
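
For comparison, here is a rough hand-rolled equivalent, assuming every element is a plain string: escape each piece separately, then join the pieces with an unescaped |. The names parts, manual and union are only illustrative:

```ruby
str = "bookworms|actuarial-consultants|uninterruptible-power-supply-(ups)-experts|c++programming-developers"
parts = str.split('|')

# Escape each substring on its own so the | separators stay as alternation.
manual = Regexp.new(parts.map { |part| Regexp.escape(part) }.join('|'))
union  = Regexp.union(parts)

manual.source == union.source                  # => true
manual.match?('c++programming-developers')     # => true
```

Regexp.union is the shorter way to say the same thing, which is why it is used here.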

Using that in code:

regex = Regexp.union(str.split('|')) # => /bookworms|actuarial\-consultants|uninterruptible\-power\-supply\-\(ups\)\-experts|c\+\+programming\-developers/

'uninterruptible-power-supply-(ups)-experts'[regex] # => "uninterruptible-power-supply-(ups)-experts"
'c++programming-developers'[regex] # => "c++programming-developers"

This shows that the patterns match.

There are things to watch out for, but those are the basics.
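
One caveat that may be worth illustrating (this is my own example, not something spelled out above): the union pattern is unanchored, so it also matches when one of the substrings appears inside a longer string. If only whole-string matches are wanted, which is an assumption about the use case, a sketch like this wraps the pattern in \A and \z. The name whole and the sample string 'my-bookworms-club' are hypothetical:

```ruby
str = "bookworms|actuarial-consultants|uninterruptible-power-supply-(ups)-experts|c++programming-developers"
regex = Regexp.union(str.split('|'))

# Unanchored: a hit anywhere inside the string counts as a match.
regex.match?('my-bookworms-club')            # => true

# Interpolating a Regexp embeds it as a (?-mix:...) group, so the
# anchors apply to the whole alternation, not just its first or last branch.
whole = /\A#{regex}\z/
whole.match?('bookworms')                    # => true
whole.match?('c++programming-developers')    # => true
whole.match?('my-bookworms-club')            # => false
```

If exact equality is all that is needed, str.split('|').include?(candidate) avoids regular expressions altogether.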