mfq mfq - 5 months ago 10
Ruby Question

Lookahead regex in Ruby returns `nil` on irb

I have input:

s = "<tag1 value = \"HelloWorld\" val = \"1234\">"


I want to fetch 'HelloWorld' and '1234'.

I am using this regular expression:

(?<=\")+[a-zA-Z0-9]*+(?=\\)


On Rubular it gives the expected result, but in irb it matches nothing:

s.scan(/(?<=\")+[a-zA-Z0-9]*+(?=\\)/) # => []


Can anybody explain why this is happening? What am I missing?

Answer
s = "<tag1 value = \"HelloWorld\" val = \"1234\">"

The actual value of this string is:

<tag1 value = "HelloWorld" val = "1234">

You can check this easily by running, e.g., puts s. The backslashes appear only because the string was declared with double quotes, in which case any double quotes inside it must be escaped with backslashes. Other ways to declare the same string in Ruby are:

s = '<tag1 value = "HelloWorld" val = "1234">'
s = %|<tag1 value = "HelloWorld" val = "1234">|
s = <<STR
<tag1 value = "HelloWorld" val = "1234">
STR

None of these requires escaping the double quotes. If you copied the string into Rubular as IRB displayed it, escaping backslashes included, you matched against a different string.
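A minimal sketch to confirm this, assuming a plain Ruby script or an irb session. The constant names are made up for illustration. Note that a heredoc keeps its trailing newline, so it is chomped here to make the comparison exact.

```ruby
# Four literals for the same string; only the first needs escaped quotes.
DOUBLE_QUOTED   = "<tag1 value = \"HelloWorld\" val = \"1234\">"
SINGLE_QUOTED   = '<tag1 value = "HelloWorld" val = "1234">'
PERCENT_LITERAL = %|<tag1 value = "HelloWorld" val = "1234">|
HEREDOC = <<STR.chomp
<tag1 value = "HelloWorld" val = "1234">
STR

# All four hold identical characters.
puts [SINGLE_QUOTED, PERCENT_LITERAL, HEREDOC].all? { |v| v == DOUBLE_QUOTED }

# puts shows the real contents; p shows the escaped form that IRB echoes.
puts DOUBLE_QUOTED
p DOUBLE_QUOTED

# The value itself contains no backslash character at all.
puts DOUBLE_QUOTED.include?("\\")
```

The difference between the puts and p lines is exactly what gets lost when the IRB echo is pasted into Rubular.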

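That said, since the original string contains no backslashes, the lookahead (?=\\) can never succeed, so nothing was matched in Ruby. The regexp has other glitches too: the + after the lookbehind has no effect, because a lookbehind is zero-width, and * permits empty matches where + is what you want.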
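The effect of the backslash lookahead can be sketched in isolation. This is a simplified pattern, not the one from the question: it keeps only the (?=\\) part, and words_before_backslash is a hypothetical helper name.

```ruby
# Hypothetical helper: collect word runs that are directly followed by a backslash.
def words_before_backslash(text)
  text.scan(/\w+(?=\\)/)
end

# The real value: the escapes are consumed by the double-quoted literal.
real = "<tag1 value = \"HelloWorld\" val = \"1234\">"

# What ends up in Rubular when the IRB echo is pasted: single quotes
# keep every backslash as a literal character.
pasted = '<tag1 value = \"HelloWorld\" val = \"1234\">'

p real.include?("\\")             # => false
p words_before_backslash(real)    # => []
p words_before_backslash(pasted)  # => ["HelloWorld", "1234"]
```

The same pattern succeeds or fails depending only on whether the backslashes are really in the string.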

Here is a more careful version of the regexp:

s.scan(/(?<=")\w+(?=")/)
#⇒ ["HelloWorld", "1234"]
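As an alternative sketch, not part of the answer above, the same values can be pulled out with a capture group instead of lookarounds. The quoted_attributes helper is hypothetical and assumes attributes of the form name = "value" with no escaped quotes inside the value.

```ruby
# Hypothetical helper: map attribute names to their double-quoted values.
def quoted_attributes(tag)
  # scan returns [name, value] pairs when the pattern has two groups.
  tag.scan(/(\w+)\s*=\s*"([^"]*)"/).to_h
end

s = '<tag1 value = "HelloWorld" val = "1234">'

# With one capture group, scan returns one-element arrays; flatten them.
p s.scan(/"(\w+)"/).flatten   # => ["HelloWorld", "1234"]

# Names and values together, as a Hash.
p quoted_attributes(s)
```

For anything beyond a one-off string like this, a real HTML/XML parser is usually the safer choice.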