Vas Vas - 2 months ago 7
Ruby Question

Regex to capture words before and after a target in ruby

Assuming we have a text:


In software, a stack overflow occurs if the call stack pointer exceeds the stack bound. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory.


What I am trying to do is find the 2 words before and after a specific word (the target). For example, if the target is the word start, it should match 'at' 'the' (left) and 'of' 'the' (right). I am using the following method in Ruby, but it returns no matches. Any tips on what to fix in my regex? I have also tried "#{target}" instead of Regexp.escape.

def checkWords(target, text, numLeft = 2, numRight = 2)

regex = ""
regex += " (\\S+) " * numLeft
regex += Regexp.escape(target)
regex += " (\\S+)" * numRight

pattern = Regexp.new(regex, Regexp::IGNORECASE)
matches = pattern.match(text)

return true if matches
end


Edit:

Regex printed:

(\S+) (\S+) "£52" (\S+) (\S+)


Edit based on Wiktor Stribiżew's answer:

def checkWords(target, text, numLeft = 2, numRight = 2)

pattern = Regexp.new(/#{"(\\S+) "*numLeft}#{Regexp.escape(target)}#{" (\\S+)"*numRight}/i)
matches = pattern.match(text)

end

Answer

You have a space on both sides of the first (\\S+), so the spaces get doubled when the string is repeated:

regex += " (\\S+) " * numLeft
          ^

When the string is repeated, this part becomes " (\\S+)  (\\S+) ": there are 2 spaces between the (\\S+) groups, so the pattern cannot match words separated by a single space.
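
The doubling can be checked directly by building both variants of the left-hand part and testing them against a short string. This is only a sketch: the variable names and the sample sentence are made up for illustration.

```ruby
# Illustrative names (broken, fixed, sample); not taken from the question.
num_left = 2

# A space on BOTH sides of the group: the copies meet with two spaces.
broken = " (\\S+) " * num_left
# A space on ONE side only: the copies meet with a single space.
fixed = "(\\S+) " * num_left

sample = "often determined at the start of the program"

broken_match = Regexp.new(broken + "start", Regexp::IGNORECASE).match(sample)
fixed_match  = Regexp.new(fixed + "start", Regexp::IGNORECASE).match(sample)

puts broken.include?("  ")        # true: two consecutive spaces
puts broken_match.inspect         # nil: the sample has single spaces only
puts fixed_match.captures.inspect # ["at", "the"]
```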

So, in your case, just use

def checkWords(target, text, numLeft = 2, numRight = 2)
    text[/#{"(\\S+) "*numLeft}#{Regexp.escape(target)}#{" (\\S+)"*numRight}/i]
end
puts checkWords('start', 'In software, a stack overflow occurs if the call stack pointer exceeds the stack bound. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory.')

See Ruby demo

It might be a good idea to add + after the spaces next to (\S+), so that runs of several spaces between words still match. And if you do not need the captures, remove the parentheses from around \S+.
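
As a rough sketch of the suggestion about +, the variant below tolerates repeated spaces and returns the neighbouring words rather than the whole match. The method name words_around and its return shape are assumptions for illustration, not part of the original answer.

```ruby
# Hypothetical helper: returns [left_words, right_words], or nil when
# the target is not surrounded by enough words.
def words_around(target, text, num_left = 2, num_right = 2)
  left  = "(\\S+) +" * num_left   # " +" accepts one or more spaces
  right = " +(\\S+)" * num_right
  match = text.match(/#{left}#{Regexp.escape(target)}#{right}/i)
  return nil unless match

  words = match.captures
  [words.first(num_left), words.last(num_right)]
end

left, right = words_around("start", "often determined at   the start of  the program")
p left   # => ["at", "the"]
p right  # => ["of", "the"]
```

Keeping the parentheses here is deliberate, because the captures are what the method returns; dropping them only makes sense when the matched substring alone is enough.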
