Juergen - 2 months ago
Markdown Question

Regex needed to encapsulate a paragraph inside a Markdown file

I am trying to encapsulate a limerick (a paragraph with escaped line endings, i.e. single line breaks marked by a trailing \) in a tag. The limerick sits between ordinary paragraphs of a Markdown file.

Example:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Sed maximus ut dui non malesuada. Duis ultrices erat quis velit rutrum, a elementum lectus dictum.

There was a young lady named Bright\
who traveled much faster than light.\
She set out one day\
in a relative way,\
and came back the previous night.

Nulla in dapibus erat. Integer sed cursus nunc.

Quisque quis neque orci. Aliquam in leo consectetur, molestie massa quis, pretium nulla.


Now, how can I achieve this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Sed maximus ut dui non malesuada. Duis ultrices erat quis velit rutrum, a elementum lectus dictum.

<tag>There was a young lady named Bright\
who traveled much faster than light.\
She set out one day\
in a relative way,\
and came back the previous night.</tag>

Nulla in dapibus erat. Integer sed cursus nunc.

Quisque quis neque orci. Aliquam in leo consectetur, molestie massa quis, pretium nulla.


I was able to match the end of the limerick paragraph, but the damn regular expression is too greedy when I use the m and s modifiers.

I tried:

[^\n]^$.+?\\

^$[^\n].+?\\

^$.^.+?\\.+?[^\\]$


It really drives me nuts.
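A simplified illustration of why these attempts over-match. This is a sketch with a hypothetical two-paragraph string and a reduced pattern ^.+?\\, not the exact expressions above. The lazy +? only keeps the end of a match short; the regex engine still reports the leftmost possible match, and with the s modifier . can cross line breaks, so that match begins at the very first line of the text.

```php
<?php
// Hypothetical input: a normal line, a blank line, then two limerick lines
// (the first one ends in a literal backslash).
$txt = "Lorem ipsum.\n\nThere was a young lady named Bright\\\nwho traveled much faster than light.";

// With "s", "." also matches line breaks, so the leftmost match starts at the
// very first line; the lazy "+?" only stops it at the first backslash, after
// it has already crossed the blank line.
preg_match('/^.+?\\\\/ms', $txt, $withS);

// Without "s", "." cannot cross a line break, so the first possible match
// starts at the limerick line itself.
preg_match('/^.+?\\\\/m', $txt, $withoutS);

var_dump($withS[0]);    // "Lorem ipsum.", the blank line, then "...named Bright\"
var_dump($withoutS[0]); // only "There was a young lady named Bright\"
```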

Answer

It seems you want to match a run of consecutive lines that all end with \, plus the line that follows them (the last line, which has no trailing \).

You may use

 $result = preg_replace('/^.+\\\\(?:\R.+\\\\)*\R.*/m', '<tag>$0</tag>', $txt);

See this regex demo.
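A minimal, self-contained PHP sketch that runs this replacement on the sample text from the question. The nowdoc and the variable names $txt and $result are illustrative choices, not part of the original answer. The output should be the desired text from the question, with only the limerick wrapped in <tag>...</tag>.

```php
<?php
// The question's sample Markdown. A nowdoc (<<<'MD') keeps every backslash
// literally, so each limerick line really ends in "\".
$txt = <<<'MD'
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Sed maximus ut dui non malesuada. Duis ultrices erat quis velit rutrum, a elementum lectus dictum.

There was a young lady named Bright\
who traveled much faster than light.\
She set out one day\
in a relative way,\
and came back the previous night.

Nulla in dapibus erat. Integer sed cursus nunc.

Quisque quis neque orci. Aliquam in leo consectetur, molestie massa quis, pretium nulla.
MD;

// The answer's replacement; $0 is the whole match (the full limerick).
$result = preg_replace('/^.+\\\\(?:\R.+\\\\)*\R.*/m', '<tag>$0</tag>', $txt);

echo $result, PHP_EOL;
```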

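Details (the pattern as the regex engine sees it; inside the single-quoted PHP string each \\ has to be written as \\\\):

  • ^ - start of a line (the m modifier makes ^ match at the start of every line)
  • .+ - 1 or more characters other than line break characters, as many as possible
  • \\ - a literal \
  • (?:\R.+\\)* - 0 or more sequences of:
    • \R - a line break
    • .+ - 1 or more characters other than line break characters
    • \\ - a literal \
  • \R.* - a line break (\R) followed by 0 or more characters other than line break characters (up to the end of the line).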
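For readability, the same pattern can be written with PHP's x (extended) modifier so each piece sits next to its explanation. This is a sketch rather than part of the original answer; the variable names ($verbose, $compact, $out) and the shortened sample input built from the question's text are hypothetical. It should produce the same result as the compact pattern.

```php
<?php
// The same pattern written with the x (extended) modifier: PCRE ignores the
// whitespace and the "#" comments, so it should behave like the compact form.
// Note: the comments must not contain the "/" delimiter, because PHP looks for
// the closing delimiter before PCRE ever sees the comments.
$verbose = <<<'REGEX'
/
^          # start of a line (m modifier)
.+ \\      # one or more non-linebreak chars, then a literal backslash
(?:        # zero or more further lines that also end in a backslash:
  \R       #   a line break sequence
  .+ \\    #   one or more non-linebreak chars, then a literal backslash
)*
\R .*      # the final line break and the last line of the paragraph
/mx
REGEX;

// The compact pattern from the answer, for comparison.
$compact = '/^.+\\\\(?:\R.+\\\\)*\R.*/m';

// A shortened, hypothetical input built from the question's text.
$txt = "Nulla in dapibus erat.\n\nShe set out one day\\\nin a relative way,\n\nInteger sed cursus nunc.";

$out = preg_replace($verbose, '<tag>$0</tag>', $txt);
echo $out, PHP_EOL;
```

Because a nowdoc performs no escape processing, the literal backslash can be written as \\ (as in the breakdown above) instead of the doubled \\\\ needed inside a single-quoted string.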