eyaler eyaler - 1 month ago 7
Python Question

Python regex to replace a double newline delimited paragraph containing a string

Define a paragraph as a multi-line string delimited on both side with double new lines ('\n\n'). if there exist a paragraph which contains a certain string ('BAD'), i want to replace that paragraph (i.e. any text containing BAD up to the closest preceding and following double newlines) with some other token ('GOOD'). this should be with a python 3 regex.

i have text such as:

dfsdf\n
sdfdf\n
\n
blablabla\n
blaBAD\n
bla\n
\n
dsfsdf\n
sdfdf


should be:

dfsdf\n
sdfdf\n
\n
GOOD\n
\n
dsfsdf\n
sdfdf

Answer

Here you are:

/\n\n(?:[^\n]|\n(?!\n))*BAD(?:[^\n]|\n(?!\n))*/g

OK, to break it down a little (because it's nasty looking):

  • \n\n matches two literal line breaks.
  • (?:[^\n]|\n(?!\n))* is a non-capturing group that matches either a single non-line break character, or a line break character that isn't followed by another. We repeat the entire group 0 or more times (in case BAD appears at the beginning of the paragraph).
  • BAD will match the literal text you want. Simple enough.
  • Then we use the same construction as above, to match the rest of the paragraph.

Then, you just replace it with \n\nGOOD, and you're off to the races.

Demo on Regex101

Comments