Andrey.Maralin - 1 month ago
Python Question

How to match nested groups with a regex

I need your help with the following regex.
I have this text:

"[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."


Using a regex, I want to get:

[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]


The following regex
(\[[^\[$\]\]]*\])


gives me

[Hello|Hi]
[inviting | calling]
[junior| mid junior]


So how should I fix it to get the correct output?

Answer

Let's define your string and import re:

>>> s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
>>> import re

Now, try:

>>> re.findall(r'\[ (?:[^][]* \[ [^][]* \])* [^][]*  \]', s, re.X)
['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']

In more detail

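Consider this script, which spells out the same pattern with comments. The re.X (verbose) flag makes the regex engine ignore the whitespace and # comments inside the pattern: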

$ cat script.py
import re
s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."

matches = re.findall(r'''\[       # Opening bracket
        (?:                       # Zero or more repetitions of:
            [^][]*                #   zero or more non-bracket characters,
            \[ [^][]* \]          #   then one inner [...] group with no brackets inside
        )*
        [^][]*                    # Zero or more non-bracket characters
        \]                        # Closing bracket
        ''',
        ''',
        s,
        re.X)
print('\n'.join(matches))

This produces the output:

$ python script.py
[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]
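
Note that the pattern above only allows one level of brackets inside the outer pair, which is all the sample text needs. Python's built-in re module has no recursive patterns, so if the text could nest deeper, one option is to skip the regex and count bracket depth by hand. The following is a minimal sketch of that idea, not part of the original answer: the function name top_level_groups is made up for illustration, and it assumes that a stray ] outside any group should be ignored and that a [ which is never closed yields no match.

```python
def top_level_groups(text):
    """Return every outermost [...] group in text, at any nesting depth."""
    groups = []
    depth = 0      # how many brackets are currently open
    start = None   # index of the [ that opened the current outermost group
    for i, ch in enumerate(text):
        if ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]" and depth > 0:  # assumption: ignore a stray ] at depth 0
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
print("\n".join(top_level_groups(s)))
```

For the sample string this prints the same three groups as the regex. Unlike the regex, it also returns [a[b[c]]] as a single group when given text nested three levels deep.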