Vas Vas - 1 month ago
Ruby Question

Regex to match a specific sequence of strings

Assume I have two arrays of strings:
position1 = ['word1', 'word2', 'word3']
position2 = ['word4', 'word1']

I want to check, inside a text string, whether the substring #{target} is followed by one of the words in position1, preceded by one of the words in position2, or both at the same time. In other words, I am looking at the words to the left and right of #{target}.

For example, in the sentence "Writing reports and inputting data onto internal systems, with regards to enforcement and immigration papers", if the target word is data, I would like to check whether the word to its left (inputting) and the word to its right (onto) are included in the arrays, or whether any of the words in the arrays produces a regex match. Any suggestions? I am using Ruby and have tried some regexes, but I can't make them work yet. I also need to ignore any special characters in between.

One of them:



I figured out this regex to capture the word to the left and the word to the right:


However, what could I change if I wanted to capture more than one word to the left and right?
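A regex-free alternative, sketched here only as one possible approach: split the text into words with `String#scan(/\w+/)`, which drops punctuation such as the comma after "systems", and then look at the neighbours of each occurrence of the target. The method name `neighbour_match?` and its `left_words`/`right_words` parameters are hypothetical; which of position1/position2 you pass for each side depends on how you read the requirement. It assumes words are runs of `\w` characters (so apostrophes split words) and that matching is case-sensitive.

```ruby
# Hypothetical helper: true if some occurrence of `target` has its left
# neighbour in left_words OR its right neighbour in right_words.
def neighbour_match?(text, target, left_words, right_words)
  words = text.scan(/\w+/)                 # punctuation is dropped: "systems," -> "systems"
  words.each_with_index.any? do |word, i|
    next false unless word == target
    left  = i > 0 ? words[i - 1] : nil     # guard: words[-1] would wrap to the last word
    right = words[i + 1]                   # nil when the target is the last word
    left_words.include?(left) || right_words.include?(right)
  end
end

sentence = "Writing reports and inputting data onto internal systems, with regards to enforcement and immigration papers"
p neighbour_match?(sentence, "data", ["inputting"], ["word1"])  # prints true
p neighbour_match?(sentence, "data", ["word4"], ["word1"])      # prints false
```

To require both neighbours instead of either, replace `||` with `&&`.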


If you have two arrays of strings, you can do something like this:

matches = /^.+ (.+) #{Regexp.escape(target)} (.+?) .+$/.match(text)
if matches && (position1.include?(matches[1]) || position2.include?(matches[2]))
  # the word before target is in position1, or the word after it is in position2
end

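This regex matches the target word in your text and uses capture groups to extract the word immediately before it (matches[1]) and the word immediately after it (matches[2]). The code then checks whether the word before is in position1 or the word after is in position2, and does something if either is. A more general version, which checks a configurable number of words on each side, might look like this: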
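To make the capture groups concrete, here is the one-line regex run against the example sentence from the question, using the placeholder arrays position1 and position2. `Regexp.escape` is used so that a target containing regex metacharacters is matched literally. Because the greedy `^.+` takes as much as it can, the group before the target ends up holding just one word here; with other inputs `(.+)` could still span several words.

```ruby
position1 = ['word1', 'word2', 'word3']
position2 = ['word4', 'word1']
target = "data"
text = "Writing reports and inputting data onto internal systems, with regards to enforcement and immigration papers"

matches = /^.+ (.+) #{Regexp.escape(target)} (.+?) .+$/.match(text)
p matches[1]   # prints "inputting" (word before the target)
p matches[2]   # prints "onto" (word after the target)

# Neither neighbour is in the placeholder arrays, so the check is false.
hit = matches && (position1.include?(matches[1]) || position2.include?(matches[2]))
p hit          # prints false
```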

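def checkWords(target, text, leftArray, rightArray, numLeft = 1, numRight = 1)
  # Build the regex: numLeft words, the (escaped) target, then numRight words.
  # Each word is captured as a run of non-space characters.
  regex = "(?:^|\\s)"
  for i in 1..numLeft
    regex += "(\\S+) "
  end
  regex += Regexp.escape(target)

  for i in 1..numRight
    regex += " (\\S+)"
  end
  regex += "(?:\\s|$)"

  pattern = Regexp.new(regex)
  matches = pattern.match(text)

  return false if !matches

  # Groups 1..numLeft hold the words before the target
  for i in 1..numLeft
    return false unless leftArray.include?(matches[i])
  end

  # The next numRight groups hold the words after the target
  for i in 1..numRight
    return false unless rightArray.include?(matches[numLeft + i])
  end

  return true
end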

Which can then be invoked like this:

do_something() if checkWords("data", text, position1, position2, 2, 2)
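If you would rather not build one capture group per word, note that a quantified group such as `(?:(\w+)\W+){2}` only remembers its last repetition in Ruby. A workaround, sketched below, is to capture the whole run of words and split it afterwards with `scan`; using `\W+` between words also skips punctuation, which addresses the "special characters in between" part of the question. The count `{2}` is hard-coded here for illustration; in a real helper you could interpolate it, e.g. `{#{n}}`.

```ruby
sentence = "Writing reports and inputting data onto internal systems, with regards to enforcement and immigration papers"

# A quantified capture group keeps only its last repetition:
m = sentence.match(/(?:(\w+)\W+){2}data/)
p m[1]                    # prints "inputting", not both words

# Capture the whole run instead, then split it into words.
before = sentence.match(/((?:\w+\W+){2})data\b/)[1]
after  = sentence.match(/\bdata((?:\W+\w+){2})/)[1]
p before.scan(/\w+/)      # prints ["and", "inputting"]
p after.scan(/\w+/)       # prints ["onto", "internal"]
```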

I'm pretty sure it's not terribly idiomatic, but it shows the general approach for checking any number of words on each side.
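For comparison, a sketch of a more idiomatic variant: snake_case names, keyword arguments for the counts, and `all?` instead of explicit loops. The name `check_words` and its parameters are hypothetical. It captures the run of words on each side and splits it, so punctuation between words is ignored. It assumes the target is a plain word (so the `\b` boundaries make sense) and only inspects the first occurrence that has enough words on both sides.

```ruby
# Hypothetical helper: true when the num_left words before `target` are all in
# left_words and the num_right words after it are all in right_words.
# Words are runs of \w characters; anything else between them is skipped.
def check_words(target, text, left_words, right_words, num_left: 1, num_right: 1)
  t = Regexp.escape(target)
  pattern = /((?:\w+\W+){#{num_left}})\b#{t}\b((?:\W+\w+){#{num_right}})/
  m = pattern.match(text)
  return false unless m

  before = m[1].scan(/\w+/)   # the num_left words before the target
  after  = m[2].scan(/\w+/)   # the num_right words after the target
  before.all? { |w| left_words.include?(w) } &&
    after.all? { |w| right_words.include?(w) }
end
```

Usage mirrors the answer's example, e.g. `do_something if check_words("data", text, position1, position2, num_left: 2, num_right: 2)`.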