JLTChiu - 3 months ago
Python Question

How can I remove text within multiple layers of parentheses in Python?

I have a Python string from which I need to remove parenthesized text. The standard way is to use

text = re.sub(r'\([^)]*\)', '', text)

so that the content within the parentheses is removed.

However, I just found a string that looks like "(Data with in (Boo) And good luck)". With the regex I use, the " And good luck)" part is still left, because [^)]* stops at the first closing parenthesis. I know I can scan through the entire string, keep a counter of the number of "(" and ")" seen, and, when the counts are balanced, record the positions of the outermost "(" and ")" and remove the content in between, but is there a better/cleaner way of doing that? It doesn't need to be a regex; whatever works is great, thanks.

Someone asked for the expected result, so here is what I am expecting:

Hi this is a test ( a b ( c d) e) sentence

After the replacement I want it to be "Hi this is a test sentence", instead of "Hi this is a test e) sentence".

Answer

With the re module (replace the innermost parentheses repeatedly until there are no more replacements to make):

import re

s = r'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

nb_rep = 1

# Each pass removes only the innermost groups; stop once a pass replaces nothing
while nb_rep:
    s, nb_rep = re.subn(r'\([^()]*\)', '', s)

print(s)
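
As a rough sketch, the same loop can be wrapped in a helper and applied to the sample sentence from the question. The name strip_parens is only illustrative and not part of the original answer. Removing a group leaves the spaces on both sides of it in place, so this sketch assumes an extra whitespace-collapsing step is acceptable to reach the expected output.

```python
import re

def strip_parens(text):
    """Repeatedly remove innermost (...) groups until none are left."""
    count = 1
    while count:
        # [^()]* cannot cross another parenthesis, so only innermost groups match
        text, count = re.subn(r'\([^()]*\)', '', text)
    return text

sample = 'Hi this is a test ( a b ( c d) e) sentence'
stripped = strip_parens(sample)

# Extra step (an assumption, not in the answer): collapse leftover runs of spaces
cleaned = ' '.join(stripped.split())
print(cleaned)  # Hi this is a test sentence
```

The number of passes equals the nesting depth plus one final pass that finds nothing, so deeply nested input costs more passes but the code stays short.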

With the third-party regex module, which supports recursion:

import regex

s = r'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

print(regex.sub(r'\([^()]*+(?:(?R)[^()]*)*+\)', '', s))

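Here (?R) refers to the whole pattern itself, which is how the nested pairs are matched, and the possessive quantifiers (*+) prevent needless backtracking.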
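
For comparison, the counter-based scan described in the question can be sketched without any regex. This is a minimal illustration, not part of the original answer: the function name remove_nested_parens is hypothetical, and the handling of unbalanced input (an unmatched ")" is kept, while an unmatched "(" swallows the rest of the string) is an assumption that differs from what the regex versions do.

```python
def remove_nested_parens(text):
    """Drop every character that sits inside (possibly nested) parentheses."""
    depth = 0   # number of currently open '('
    kept = []   # characters that are outside all parentheses
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            if depth:
                depth -= 1
            else:
                # Assumption: an unmatched ')' is ordinary text and is kept
                kept.append(ch)
        elif depth == 0:
            kept.append(ch)
    return ''.join(kept)

print(remove_nested_parens('Sainte Anne -(Data with in (Boo) And good luck) Charenton'))
# Sainte Anne - Charenton
```

This makes a single pass over the string regardless of nesting depth, at the cost of a few more lines than the re.subn loop.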