alphanumeric alphanumeric - 1 month ago 7
Python Question

How to use REGEX with multiline

The following expression works well for extracting the portion of the data string that starts with the word Block, followed by an opening bracket '{', and ends with the closing bracket '}':

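import re

# Inner lines are indented by one space, matching the printed result below.
data = """
Somewhere over the rainbow
Way up high
Block {
 line 1
 line 2
 line 3
}
And the dreams that you dreamed of
Once in a lullaby
"""
# Raw string so the regex engine, not Python, interprets the backslashes.
regex = re.compile(r"""(Block\ {\n\ [^\{\}]*\n}\n)""", re.MULTILINE)
result = regex.findall(data)
print(result)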


which returns:

['Block {\n line 1\n line 2\n line 3\n}\n']


But if there is another curly bracket inside the Block portion of the string, the expression breaks and returns an empty list:

data = """
Somewhere over the rainbow
Way up high
Block {
 line 1
 line 2
 {{}
 line 3
}
And the dreams that you dreamed of
Once in a lullaby
Block {
 line 4
 line 5
 {{
 }
 line 6
}
Somewhere over the rainbow
Blue birds fly
And the dreams that you dreamed of
Dreams really do come true ooh oh
"""


How can I modify this regular expression so that it ignores the brackets inside the Blocks, yet still returns each Block as a separate entry in the result list (so that each Block can be accessed separately)?

Answer

Wouldn't this work?

regex = re.compile(r"""(Block\ {\n\ [^\}]*\n}\n)""", re.MULTILINE)

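In the version you posted, the character class [^\{\}] also excludes '{', so the match is abandoned as soon as it meets a second opening brace, even though you only want it to stop at the closing brace. Note that this version still stops at the first '}' it finds, so a closing brace inside the Block (as in the '{{}' line of your sample) will still break it. Handling nested opening and closing braces is another story.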
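For the sample data in the question, one possible approach is to anchor on line starts instead of excluding braces. This is only a sketch: it assumes the inner lines are indented by one space (as the printed result in the question suggests), so that only a Block's own closing brace sits at the start of a line. The names block_re and blocks are made up for this example.

```python
import re

# Assumption: inner lines are indented by one space, so the only '}' at
# column 0 is the one that closes a Block.
data = """
Somewhere over the rainbow
Way up high
Block {
 line 1
 line 2
 {{}
 line 3
}
And the dreams that you dreamed of
Once in a lullaby
Block {
 line 4
 line 5
 {{
 }
 line 6
}
Somewhere over the rainbow
"""

# MULTILINE makes '^' match at the start of every line.
# DOTALL lets '.' cross newlines; '.*?' is lazy, so each match ends at the
# first line that starts with '}'.
block_re = re.compile(r"^Block \{\n.*?^\}\n", re.MULTILINE | re.DOTALL)
blocks = block_re.findall(data)

for block in blocks:
    print(repr(block))
```

This does not count braces, so it would stop early if an unindented '}' appeared inside a Block. Truly nested blocks would need a small parser that tracks brace depth rather than a single regex.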