Hüseyin Halis - 6 months ago
HTML Question

I can't get my regex to select what I want.

<p[^>]*>([a-zA-Z0-9_\W]*)\:<\/p>.*?(<blockquote[^>]*>).*?<\/blockquote>

<p> demo demo:</p> <p ><img src="http://demo.com/123.jpg" width="100%"/> <br/> <em>Credit: demo2 demo2 </em></p> <p >here1 here1:</p> <blockquote cite="here1"> <p><em>demo3. demo3 demo3 demo3:</em></p> </blockquote> <p >demo4 demo4:</p> <p ><img src="http://demo.com/1234.jpg" width="100%"/> <br/> <em>demo5 demo 5 demo5</em></p> <p >demo6 demo6:</p> <blockquote cite="demo6"> <p><em>demo7 demo7<br/>


The pattern above doesn't work as intended. Where is my mistake? With this pattern I can't select the part of the text I want. Any help is appreciated.

The result I want to get is:

<p >here1 here1:</p> <blockquote cite="here1"> <p><em>demo3. demo3 demo3 demo3:</em></p> </blockquote>


I've added a sample of the input I'm asking about.

Answer

If you really want to use regex here, this may work for you:

<p[^>]*>((?:(?!<\/p>).)+)<\/p>\s*<blockquote[^>]*>(.*?)<\/blockquote>
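As a quick check, here is a small Python sketch that runs both the question's pattern and the pattern above against the sample HTML from the question. Python and its re module are my assumption here (the question doesn't say which language you use), and the sketch assumes the HTML sits on a single line, as in the sample; if your real input contains line breaks, you would need re.DOTALL so that . can match them.

```python
import re

# Sample input from the question (it is cut off at the end in the original post).
html = (
    '<p> demo demo:</p> <p ><img src="http://demo.com/123.jpg" width="100%"/> <br/> '
    '<em>Credit: demo2 demo2 </em></p> <p >here1 here1:</p> <blockquote cite="here1"> '
    '<p><em>demo3. demo3 demo3 demo3:</em></p> </blockquote> <p >demo4 demo4:</p> '
    '<p ><img src="http://demo.com/1234.jpg" width="100%"/> <br/> <em>demo5 demo 5 demo5</em></p> '
    '<p >demo6 demo6:</p> <blockquote cite="demo6"> <p><em>demo7 demo7<br/>'
)

# Pattern from the question: the character class also matches "</p>",
# so the match starts at the very first <p> in the string.
original = re.compile(r'<p[^>]*>([a-zA-Z0-9_\W]*)\:<\/p>.*?(<blockquote[^>]*>).*?<\/blockquote>')

# Pattern from the answer: group 1 cannot cross a </p>, and only whitespace
# may separate that </p> from the <blockquote>.
fixed = re.compile(r'<p[^>]*>((?:(?!<\/p>).)+)<\/p>\s*<blockquote[^>]*>(.*?)<\/blockquote>')

bad = original.search(html)
good = fixed.search(html)

print(bad.start())    # 0: the original match begins at "<p> demo demo:</p>"
print(good.group(0))  # the "here1" paragraph plus its blockquote
print(good.group(1))  # here1 here1:
```

The search with the fixed pattern returns exactly the snippet asked for in the question, while the original pattern drags in everything from the first paragraph onward.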

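The relevant part is ((?:(?!<\/p>).)+). In English, it says: "look ahead to make sure there's no </p> at this position, then grab one character, and repeat this until the next </p>". This way the group can never span several sibling <p>'s (or nested ones), and that is exactly what goes wrong with your original pattern. Since [a-zA-Z0-9_] is the ASCII \w and \W is everything that is not \w, the class [a-zA-Z0-9_\W] matches practically any character, </p> included, and the greedy * runs on to the last ":</p>" it can find. So <p[^>]*>([a-zA-Z0-9_\W]*)\:<\/p> will wrongly match all of <p>one paragraph:</p><p>second paragraph:</p> as if it were a single paragraph. I also allowed only whitespace (\s*, not .*?) between the closing </p> and the <blockquote>, so you only match the <p> immediately preceding the blockquote.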
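To see the difference on that two-paragraph example, this Python sketch (again using Python's re module as an assumed stand-in for whatever regex engine you use) compares the any-character class with the lookahead-guarded token:

```python
import re

text = "<p>one paragraph:</p><p>second paragraph:</p>"

# The question's paragraph sub-pattern: [a-zA-Z0-9_\W] accepts "</p>" as ordinary
# text, so the greedy * runs to the LAST ":</p>" in the string.
any_char = re.compile(r"<p[^>]*>([a-zA-Z0-9_\W]*)\:<\/p>")

# The answer's token: before consuming each character, the negative lookahead
# (?!<\/p>) checks that a closing </p> does not start at that position.
tempered = re.compile(r"<p[^>]*>((?:(?!<\/p>).)+)<\/p>")

print(any_char.findall(text))  # ['one paragraph:</p><p>second paragraph']
print(tempered.findall(text))  # ['one paragraph:', 'second paragraph:']
```

The first pattern returns a single "paragraph" that swallows both, while the lookahead version returns each paragraph on its own.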
