Jun Jun - 6 months ago
Java Question

Regex capturing within a group

I want to delete every newline that appears inside a p tag, but leave the newlines outside p tags alone. Example:

<p dir="ltr">Test<br>\nA\naa</p>\n<p dir="ltr">Bbb</p>


This is the regex I came up with

(<p[^>]*?>)(?:(.*)\n*)*(.*)(</p[^>]*?>)


and I replace with

$1$2$3$4


I was hoping this would work, but the
(?:(.*)\n*)*
part seems to be causing issues. Is there any way to do repeated matches like this with a capturing group?

Thanks in advance!

Answer
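
On the capturing-group question itself: in Java, a capturing group inside a repeated group keeps only the text from its last iteration, so $2 in the replacement cannot hold every line that was matched. The sketch below illustrates this with a simplified pattern; the class name RepeatedGroupDemo, the method lastIteration, and the \w+ pattern are made up for the illustration and are not the pattern from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Hypothetical helper: returns whatever group 1 holds after the
    // repeated group has matched as many "word + newlines" runs as it can.
    static String lastIteration(String input) {
        Matcher m = Pattern.compile("(?:(\\w+)\\n*)+").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Three iterations match here ("Test", "A", "aa"),
        // but group 1 only remembers the final one.
        System.out.println(lastIteration("Test\nA\naa")); // prints "aa"
    }
}
```

This is why the approach below matches one run of newlines at a time instead of trying to collect all the pieces in a single match.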

In an engine that supports \K (PCRE, for example), you can use this regex and replace each match with an empty string:

(?s)(?:<p|\G(?!\A))(?:(?!<\/p>).)*?\K([\n]+)

Regex Demo

Java's regex engine does not support \K, so as a workaround I capture everything before the newlines in a group and put it back with $1:

String line = "<p dir=\"ltr\">Test<br>\nA\naa</p>\nabcd\n<p dir=\"ltr\">Bbb</p>"; 
System.out.println(line.replaceAll("(?s)((?:<p|\\G(?!\\A))(?:(?!<\\/p>).)*?)[\\n]+", "$1"));

Ideone Demo
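
If the \G-based pattern feels hard to maintain, a two-step alternative may be easier to read: find each p element first, then remove the newlines from the matched text in plain Java. This is only a sketch under some assumptions: p elements are not nested, the closing tag is written exactly as </p>, and the input is small enough that regex-based HTML handling is acceptable. The class and method names (ParagraphNewlines, stripNewlinesInParagraphs) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphNewlines {
    // Matches one <p ...>...</p> element. DOTALL lets '.' cross line breaks,
    // and the lazy .*? stops at the first closing </p>.
    private static final Pattern P_ELEMENT =
            Pattern.compile("<p\\b[^>]*>.*?</p>", Pattern.DOTALL);

    // Hypothetical helper: removes '\n' inside p elements only.
    static String stripNewlinesInParagraphs(String html) {
        Matcher m = P_ELEMENT.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cleaned = m.group().replace("\n", "");
            // quoteReplacement keeps any '$' or '\' in the text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(out); // copy the text after the last p element unchanged
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "<p dir=\"ltr\">Test<br>\nA\naa</p>\nabcd\n<p dir=\"ltr\">Bbb</p>";
        System.out.println(stripNewlinesInParagraphs(line));
    }
}
```

For the sample string this leaves the newlines around abcd in place and joins the first paragraph into Test<br>Aaa. For anything beyond simple, well-formed fragments, an HTML parser is likely the safer choice.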
