staten12 - 1 month ago
Python Question

Need help with a regex to grab everything after a mandatory value

I have a block of text from which I need to extract data and split it up. I need to find "Review frequency" within a large body of text and, once it is found, take everything after it, stopping at the ')'.

Example text is:

No. of components Variable
Review frequency Quarterly (Mar., Jun., Sep., Dec.)
Quick facts
To learn more about the


What I need is 'Quarterly' and 'Mar., Jun., Sep., Dec.'

My current regex is:

((?=.*?\bReview frequency\b)(\b(Q|q)uarterly|(A|a)nnually|(S|s)emi-(A|a)nnually))


But this is not working. Essentially the 'Review frequency' needs to be the qualifier before we start picking up the other information, as there may be other dates/periods within the file. Thank you!

Answer

You are not matching the rest of the data on the line.
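
A minimal sketch of why the original pattern finds nothing: a lookahead such as (?=.*?\bReview frequency\b) only inspects what follows the current position and consumes no text. By the time the engine reaches "Quarterly", the label "Review frequency" is already behind it, so the assertion fails. The variable names below are illustrative only.

```python
import re

text = (
    "No. of components Variable\n"
    "Review frequency Quarterly (Mar., Jun., Sep., Dec.)\n"
    "Quick facts\n"
    "To learn more about the"
)

# The pattern from the question, unchanged.
original = r"((?=.*?\bReview frequency\b)(\b(Q|q)uarterly|(A|a)nnually|(S|s)emi-(A|a)nnually))"

# The lookahead scans forward from the current position, but at the start of
# "Quarterly" the label "Review frequency" lies behind it, so nothing matches.
original_match = re.search(original, text)
print(original_match)  # None
```

Matching the label itself, rather than asserting it, moves the match position past it so the values that follow can be captured.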

I suggest using:

(?m)^Review frequency[ \t]+(\w+)[ \t]+(.+)
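
As a rough check of this pattern against the sample text from the question (the variable names are placeholders, and the "Semi-annually" line is a made-up input, not taken from the question):

```python
import re

generic = r"(?m)^Review frequency[ \t]+(\w+)[ \t]+(.+)"

sample = "No. of components Variable\nReview frequency Quarterly (Mar., Jun., Sep., Dec.)\nQuick facts"
generic_hits = re.findall(generic, sample)
print(generic_hits)  # [('Quarterly', '(Mar., Jun., Sep., Dec.)')]

# Hypothetical input: \w does not match a hyphen, so (\w+) cannot
# capture a value such as "Semi-annually" and the line is skipped.
hyphenated = "Review frequency Semi-annually (Jun., Dec.)"
hyphen_hits = re.findall(generic, hyphenated)
print(hyphen_hits)  # []
```

Note that the second group keeps the surrounding parentheses, and that hyphenated values need an explicit alternation instead of \w+.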

If the first capturing group can only be one of the three words listed in your pattern, use:

(?m)^Review frequency[ \t]+([Qq]uarterly|(?:[Ss]emi-)?[Aa]nnually)[ \t]+(.*)

Use these patterns with re.findall:

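import re

# Group 1: the frequency keyword; group 2: the rest of the line (parentheses included).
regex = r"(?m)^Review frequency[ \t]+([Qq]uarterly|(?:[Ss]emi-)?[Aa]nnually)[ \t]+(.*)"
test = "No. of components Variable\nReview frequency Quarterly (Mar., Jun., Sep., Dec.)\nQuick facts\nTo learn more about the"
print(re.findall(regex, test))  # [('Quarterly', '(Mar., Jun., Sep., Dec.)')]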
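
The question asks for 'Mar., Jun., Sep., Dec.' without the parentheses. One possible variation, offered as a sketch rather than as part of the original answer, is to match the parentheses literally and capture only what lies between them. This assumes the period list is always wrapped in a single pair of parentheses on the same line; the names below are illustrative.

```python
import re

# Assumed variant: the literal parentheses sit outside the second capturing group,
# and [^)\n]* stops at the first ')' without crossing a line break.
pattern = r"(?m)^Review frequency[ \t]+([Qq]uarterly|(?:[Ss]emi-)?[Aa]nnually)[ \t]+\(([^)\n]*)\)"
text = "No. of components Variable\nReview frequency Quarterly (Mar., Jun., Sep., Dec.)\nQuick facts\nTo learn more about the"

match = re.search(pattern, text)
if match:
    frequency, months = match.groups()
    # Split the captured list into individual entries.
    month_list = [m.strip() for m in months.split(",")]
    print(frequency)   # Quarterly
    print(months)      # Mar., Jun., Sep., Dec.
    print(month_list)  # ['Mar.', 'Jun.', 'Sep.', 'Dec.']
```

If a file can contain several "Review frequency" lines, re.findall or re.finditer with the same pattern would return one result per line.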