Sanjay Salunkhe - 3 months ago
Ruby Question

Ruby Regular expression: Not able to find word from string

I am trying to find the word "the" that has a space before the character "t" and after the character "e" in the string "the the the the". I am using the regular expression below, but it gives me only one "the" instead of two.

s="the the the the"
s.scan(/\sthe\s/)
output - [" the "]


I was expecting the expression to return the two middle "the" words. Why is this happening?

Answer

The problem here is that the \s patterns consume the whitespace. The scan method only returns non-overlapping matches, and your expected matches overlap: the first match, " the ", already consumes the space that would have to start the second match.
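To make the consumption visible, here is a small sketch that records where the match starts and ends. The variable name consuming is only illustrative; the offsets come from Regexp.last_match inside the scan block.

```ruby
s = "the the the the"

# Record the [start, end) character offsets of every match
# found by the consuming pattern from the question.
consuming = []
s.scan(/\sthe\s/) { consuming << Regexp.last_match.offset(0) }

p consuming
# The single match covers the space at index 3 through the space at index 7,
# so scanning resumes at index 8, where no leading whitespace is left
# for the third "the".
```

Since the next search starts right after the consumed trailing space, the third "the" has no whitespace in front of it that the pattern could still match.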

You need to use lookarounds, which check for the whitespace without consuming it:

/(?<=\s)the(?=\s)/

See the regex demo and a Ruby demo, where puts s.scan(/(?<=\s)the(?=\s)/) prints the two middle "the" instances.

Pattern details:

  • (?<=\s) - a positive lookbehind that requires a whitespace to be present immediately before the the
  • the - a literal text the
  • (?=\s) - a positive lookahead that requires a whitespace right after the the.
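As a quick check of the pattern described above, the following sketch runs it against the string from the question. The names lookaround, matches and offsets are illustrative only.

```ruby
s = "the the the the"
lookaround = /(?<=\s)the(?=\s)/

# The lookarounds are zero-width, so each match contains only "the".
matches = s.scan(lookaround)

# Record the offsets to show that neighbouring matches share
# the whitespace between them instead of consuming it.
offsets = []
s.scan(lookaround) { offsets << Regexp.last_match.offset(0) }

p matches
p offsets
```

The first and last "the" are skipped because they lack whitespace on one side (start and end of the string).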

Note that if you use \bthe\b (i.e. word boundaries), you will get all four "the" instances from your string, because \b only asserts a position between a word character (letter, digit or underscore) and a non-word character or the start/end of the string.
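A short sketch of the word-boundary variant, for comparison. The second input string is a made-up example, added only to show that \b still rejects "the" inside longer words.

```ruby
s = "the the the the"

# \b matches at the start and end of the string too,
# so the first and last "the" are included.
all_words = s.scan(/\bthe\b/)

# Hypothetical extra input: "then" and "other" contain "the",
# but not as a whole word, so only the standalone "the" matches.
whole_only = "then the other".scan(/\bthe\b/)

p all_words
p whole_only
```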