Vas Vas - 1 year ago 53
Ruby Question

Regex splits the last word at the end of the string

I want to capture up to x (for example 2) words before and after a target string in a multiline text. The problem is that if there are fewer than x words available, the regex cuts the last word into pieces until it has x matches.

For example:

text = "This is an example /year"

If example is the target, I get:

Matching Data: "is", "an", "/yea", "r"

If I add more words after /year, it matches correctly.

How can I fix this so that, if fewer than x words exist, the regex just stops there or returns empty strings for the remaining matches?

So the result should be:

Matching Data: "is", "an", "/year", ""

def checkWords(target, text, numLeft = 2, numRight = 2)
  # Escape regex metacharacters so the target is matched literally
  target = Regexp.escape(target)

  regex = ""
  regex += "\\s+{,2}(\\S+)\\s+{,2}" * numLeft
  regex += target
  regex += "\\s+{,2}(\\S+)" * numRight

  pattern = Regexp.new(regex)
  matches = pattern.match(text)

  puts matches.inspect
end
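For reference, this is the pattern that checkWords builds for numLeft = numRight = 2 and the target example. Matching it directly against the sample text reproduces the split; this is a minimal sketch that inlines the string building, and the variable names are illustrative only.

```ruby
text = "This is an example /year"

# Same string building as checkWords, with numLeft = numRight = 2
# and target "example" filled in.
pattern = Regexp.new(
  "\\s+{,2}(\\S+)\\s+{,2}" * 2 + "example" + "\\s+{,2}(\\S+)" * 2
)

m = pattern.match(text)
p m.captures
# => ["is", "an", "/yea", "r"]
```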


Answer

Since you want to capture the words before and after the target, you need to put a capturing group around each whole part of the regex that matches 0 to 2 occurrences of a word plus its adjacent whitespace. You also need to allow a lower bound of 0: use the {0,2} limiting quantifier (or the more succinct {,2}) so that you still get the context on the left even if it is missing on the right.
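A minimal Ruby sketch of this approach, assuming the goal is the padded result shown in the question. The method name context_words is hypothetical, and splitting the two captured groups on whitespace and padding them with empty strings is an extra step on top of the regex, not part of it:

```ruby
# Hypothetical helper: returns num_left words before target and
# num_right words after it, padded with "" when fewer words exist.
def context_words(target, text, num_left = 2, num_right = 2)
  t = Regexp.escape(target)
  # One capturing group around each whole {0,n} repetition, so every
  # matched word is kept (one string per side).
  pattern = /((?:\S+\s+){0,#{num_left}})#{t}((?:\s+\S+){0,#{num_right}})/
  m = pattern.match(text)
  return nil unless m

  left  = m[1].split
  right = m[2].split
  # Pad on the outer side so each position keeps its distance from the target.
  Array.new(num_left - left.size, "") + left +
    right + Array.new(num_right - right.size, "")
end

p context_words("example", "This is an example /year")
# => ["is", "an", "/year", ""]
p context_words("example", "This is an example /year and more")
# => ["is", "an", "/year", "and"]
```

Because the left group can match zero words, a target at the very start of the text still matches, and the missing left words come back as empty strings.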


See this Rubular demo

If you use /(?:(\S+)\s+){0,2}target(?:\s+(\S+)){0,2}/ instead, every captured value except the last one in each group is lost: a quantified capturing group only stores the value captured during its last iteration.
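A quick check of that behavior, using the sample text from the question with example standing in for target:

```ruby
text = "This is an example /year"

# Each capturing group sits inside a {0,2} repetition,
# so it only keeps the value from its last iteration.
m = text.match(/(?:(\S+)\s+){0,2}example(?:\s+(\S+)){0,2}/)
p m.captures
# => ["an", "/year"]  ("is" was overwritten by "an")
```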

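Also note that setting a {,2} quantifier on a +-quantified subpattern makes no sense: \\s+{,2} can match zero whitespace characters, so it behaves like \\s*, not \\s+. That is what lets two adjacent (\\S+) groups split /year into /yea and r.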
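The effect of the nested quantifier can be checked on a single word: because \s+{,2} may match nothing, two adjacent (\S+) groups are free to split /year, while a plain \s+ refuses to match.

```ruby
nested = /\A(\S+)\s+{,2}(\S+)\z/
plain  = /\A(\S+)\s+(\S+)\z/

p nested.match("/year").captures  # => ["/yea", "r"]
p plain.match("/year")            # => nil
```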