D1W1TR15, 4 months ago

Regex - Python: Capture three (3) words after a specific word

Hello everyone, I have the following code:

import re

str1 = "Hello, I would like to meet you at the train station of Berlin after 6 o' clock"
match = re.compile(r' at \w+ \w+ \w+')
match.findall(str1)  # [' at the train station']


Is there a better way than repeating "\w+ \w+ \w+", for example a way to capture a specific number of words?

Answer

Yes. To repeat part of a pattern a specific number of times, use a quantifier in curly braces, {n}. For example:

match = re.compile(r'at ((\w+ ){3})')

This gives:

>>> print(match.findall(str1))
[('the train station ', 'station ')]
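If you keep this two-group pattern, a minimal sketch for getting just the three words is to iterate over match objects with finditer and read only the outer group (group 1), stripping the trailing space it includes. The names pattern and phrases are illustrative.

```python
import re

str1 = "Hello, I would like to meet you at the train station of Berlin after 6 o' clock"

# Outer group wraps all three words; the inner group (\w+ ) is repeated
# and therefore only remembers its last repetition.
pattern = re.compile(r'at ((\w+ ){3})')

# findall returns one tuple per match, with one item per capturing group.
print(pattern.findall(str1))  # [('the train station ', 'station ')]

# finditer yields match objects, so we can pick only the outer group.
phrases = [m.group(1).strip() for m in pattern.finditer(str1)]
print(phrases)  # ['the train station']
```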

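findall returns a tuple here because the pattern has two capturing groups, and the repeated inner group keeps only its last repetition. In general, to capture just the n words after a given word, make the inner group non-capturing: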

'word\s+((?:\w+(?:\s+|$)){n})'

Here, (?:...) is a non-capturing group, \s matches any whitespace character, | means "or", and $ means "end of string" (so the last word may also end the string). Therefore:

>>> print(re.compile(r'at\s+((?:\w+(?:\s+|$)){3})').findall(str1))
['the train station ']
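The general pattern can be wrapped in a small reusable function. This is a sketch under a few assumptions: words_after is a hypothetical helper name, the keyword is passed through re.escape so it is matched literally, and a \b word boundary is added (not part of the answer above) so that "at" does not match inside words like "that" or "cat".

```python
import re

def words_after(text, word, n):
    """Hypothetical helper: for each occurrence of `word`, return the n words after it."""
    # Word boundary + literal keyword + whitespace, then exactly n words,
    # each followed by whitespace or the end of the string.
    pattern = re.compile(
        r'\b' + re.escape(word) + r'\s+((?:\w+(?:\s+|$)){' + str(n) + '})'
    )
    # group(1) is the only capturing group; split() drops the trailing space.
    return [m.group(1).split() for m in pattern.finditer(text)]

str1 = "Hello, I would like to meet you at the train station of Berlin after 6 o' clock"
print(words_after(str1, "at", 3))  # [['the', 'train', 'station']]

# Thanks to |$, the phrase can end the string:
print(words_after("meet me at the train station", "at", 3))  # [['the', 'train', 'station']]

# Thanks to \b, "that", "cat" and "sat" are skipped:
print(words_after("that cat sat at home now please", "at", 2))  # [['home', 'now']]
```

If fewer than n words follow the keyword, that occurrence simply produces no match; a quantifier such as {1,n} would capture "up to n" words instead.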