jas jas - 2 months ago 11
Python Question

Why is the lookahead returning matches for the time-stamp?

I am trying to write a Python script for some post-processing. I have a file that contains messages, each with a time-stamp, and I want to extract all the messages into a list.

Regex: match from "message" up to the next time-stamp.

findallItems = re.findall(r'(?s)((?<=message).*?(?=((\d{4})\-((0[1-9])|(1[0-2]))\-((0[1-9])|(1[0-2]))|\Z)))', fileread)


This works, but it also returns the time-stamps as matches. How can I return only the message, without the time-stamps?

If I use literal text in the lookahead instead, it works fine. For example:

findallItems = re.findall(r'(?s)((?<=message).*?(?=message|\Z))',fileread)

Answer

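When a pattern contains more than one capturing group, re.findall returns a tuple of all the group values for each match, so the groups inside your lookahead end up in the results. You need to remove the unnecessary capturing parentheses and convert the others to non-capturing: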

findallItems = re.findall(r'(?s)(?<=message).*?(?=(?:\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-2])|\Z))', fileread)
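
To see the difference concretely, here is a small sketch that runs both the original and the fixed pattern over a made-up input. The sample text in fileread is an assumption for illustration only (the real file layout is not shown in the question), and the names with_groups and without_groups are hypothetical.

```python
import re

# Hypothetical sample input; the real file layout is an assumption.
# Days are kept within 01-12 because the day alternative in the
# pattern reuses the month pattern (0[1-9]|1[0-2]).
fileread = (
    "2016-05-10 message first entry\n"
    "2016-05-11 message second entry\n"
)

# Original pattern: every (...) is a capturing group, so findall
# returns one tuple per match with a slot for each group.
with_groups = re.findall(
    r'(?s)((?<=message).*?(?=((\d{4})\-((0[1-9])|(1[0-2]))\-((0[1-9])|(1[0-2]))|\Z)))',
    fileread,
)

# Fixed pattern: no capturing groups are left, so findall returns
# the text of each whole match as a plain string.
without_groups = re.findall(
    r'(?s)(?<=message).*?(?=(?:\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-2])|\Z))',
    fileread,
)

print(len(with_groups[0]))  # number of capturing groups in the original
print(without_groups)       # → [' first entry\n', ' second entry\n']
```

In the original pattern the first slot of each tuple is the message and the remaining slots are the time-stamp pieces captured inside the lookahead, which is where the unwanted values come from.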


Alternatively, keep a single capturing group around the part you need; re.findall will then return only that group's value:

(?s)message(.*?)(?:\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-2])|\Z)
           ^   ^
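
A minimal sketch of this single-group variant, again on a made-up input (fileread, pattern and messages are illustrative names, and the sample text is an assumption). The strip() call is an optional extra, not part of the original answer.

```python
import re

# Hypothetical sample input; the real file layout is an assumption.
fileread = (
    "2016-05-10 message first entry\n"
    "2016-05-11 message second entry\n"
)

# Only (.*?) captures. "message" and the trailing date are matched
# and consumed, but they are not returned by findall.
pattern = re.compile(
    r'(?s)message(.*?)(?:\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-2])|\Z)'
)

raw = pattern.findall(fileread)

# Optional clean-up: drop surrounding whitespace and newlines.
messages = [m.strip() for m in raw]

print(messages)  # → ['first entry', 'second entry']
```

Unlike the lookaround version, this pattern consumes the date that ends each message. That appears to be harmless here because the next "message" keyword comes after the date, so the following match can still start.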

