user3191569 user3191569 - 1 month ago 5
Python Question

Python regex for finding everything inbetween two \n\n and \n\n

I have a large text string, with init there are several blocks that look very similar to this;

text = '\n\n(d)In the event of this happens a Fee
of \xc2\xa32,000 gross, on each such occasion.\n\n'


using the code below I can find all instances of money:

import re
re.finall('\xa3(.*)', text)


but this only returns up to the comma
In the event of this happens a Fee of \xc2\xa32,000 gross
instead of the whole block, I'm hoping to return the block where the Unicode for British pounds
\xa3
is mentions

Answer

I propose this regular expression:

text = ('\n\nthis is not wanted\n\n'
        '(d)In the event of this happens a Fee\n'
        'of \xc2\xa32,000 gross, on each such occasion.\n\n'
        'another wanted line with pound: \xc2\xa31,000\n\n'
        'this is also not wanted\n\n')

re.findall(r'(?:.+\n)*.*\xa3(?:.+\n)*', text)

This will find all multiline blocks of non-empty lines which contain at least one \xa3.

As @wiktor-stribi┼╝ew pointed out in a comment, this only finds those blocks which have another character after the pound symbol; this seems to be what you want, so it is no problem, but it should be mentioned.

Comments