maudulus - 3 months ago
JavaScript Question

Split HTML into words with a regex, keeping each ul element as one item

I am splitting a block of HTML into words using this regex:

\b(\w+(?![^<>]*>))\b


var html = splitParagraph.html();
var splitHtml = html.split(/\b(\w+(?![^<>]*>))\b/);
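A quick way to see what this produces: String.prototype.split() inserts the text of each capture group into the result array, so the words (group 1) end up interleaved with the tags and spaces between them, while the negative lookahead stops tag names such as h2 and br from being treated as words. A minimal sketch on a shortened, made-up input string; it runs as-is in Node or a browser console, with no jQuery needed:

```javascript
// The pattern from the question: a word (\w+) that is NOT followed by
// "...>" with no "<" in between, i.e. a word that is not inside a tag.
const wordRe = /\b(\w+(?![^<>]*>))\b/;

// Made-up, shortened input in the same style as the question's HTML.
const html = "<h2>Lorem</h2><br>Lorem ipsum<br>";

// split() puts capture group 1 (each word) into the result array,
// interleaved with the text between words (tags, spaces).
const parts = html.split(wordRe);
console.log(parts);
// ["<h2>", "Lorem", "</h2><br>", "Lorem", " ", "ipsum", "<br>"]
```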


The HTML I am doing this on looks something like the following:

<h2>Lorem</h2><br>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor<br>
<br>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor<br>
<br>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor<br>
<br>
[Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor]<br>
<br>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor:<br>
<ul><br>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor</li><br>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor</li><br>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor</li><br>
</ul><br>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor<br>
<br>


You can see it working here: http://www.regexpal.com/?fam=95537

However, I also want the regex to handle ul tags when it splits, so that the resulting array looks something like
["lorem", " ", "ipsum", "<ul><li>lorem</li><li>ipsum</li><li>blah</li></ul>"]
(note that the ul is its own item). In other words, it should not split anything inside the ul, but just move on to whatever comes after it.

I know that I can use
\s*<ul[^>]*>[\S\s]*?<\/ul>\s*
to match the ul (thanks for the ref), but I'm not sure how to combine these two patterns.
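The lazy [\S\s]*? in that ul pattern matters as soon as a block contains more than one list: it stops at the first </ul> instead of running to the last one. A small comparison on a made-up string; the greedy variant is included only for contrast and is not from the question:

```javascript
// The ul pattern from the question: optional whitespace, an opening <ul>
// (attributes allowed), then ANY characters ([\S\s] also matches newlines)
// up to the FIRST closing </ul>, because *? is lazy.
const lazyUl = /\s*<ul[^>]*>[\S\s]*?<\/ul>\s*/g;

// Greedy variant, for contrast only: [\S\s]* runs to the LAST </ul>.
const greedyUl = /\s*<ul[^>]*>[\S\s]*<\/ul>\s*/g;

// Made-up input with two separate lists.
const html = 'a<ul>\n<li>x</li>\n</ul>b <ul class="c"><li>y</li></ul> c';

const lists = html.match(lazyUl);    // two matches, one per list
const merged = html.match(greedyUl); // one match spanning both lists and the "b" between them
console.log(lists, merged);
```

Note that the \s* on both ends also pulls any surrounding whitespace into the match (the second lazy match is ' <ul class="c"><li>y</li></ul> ', with a space on each side).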

Answer

You could try /\<ul\>[\w\W]+\<\/ul\>|\b(\w+(?![^<>]*>))\b/g, but there is probably a shorter solution; this one simply combines your original pattern with an alternative that matches everything between the <ul> and </ul> tags.
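Used with match() and the g flag, the pattern above returns every full match, i.e. each word plus each whole list, but not the spaces, tags or punctuation between them. A sketch on a made-up input, with the backslashes before < and > dropped (they are harmless without the u flag, but a syntax error with it):

```javascript
// The answer's pattern, minus the unneeded backslashes before < and >:
//   alternative 1: a literal <ul>, then one or more of any character, then </ul>
//   alternative 2: the question's "word that is not inside a tag" pattern
const combined = /<ul>[\w\W]+<\/ul>|\b(\w+(?![^<>]*>))\b/g;

// Made-up input with a single list.
const html = "Lorem ipsum:<br><ul><li>dolor</li><li>sit</li></ul>amet<br>";

// With the g flag, match() returns the full matches only, so the spaces,
// the colon and the <br> tags between them are not in the array.
const tokens = html.match(combined);
console.log(tokens);
// ["Lorem", "ipsum", "<ul><li>dolor</li><li>sit</li></ul>", "amet"]
```

Because [\w\W]+ is greedy, an input with two separate lists would come back as a single token running from the first <ul> to the last </ul>; the lazy [\S\s]*? from the question's ul pattern avoids that. This alternative also only matches a bare <ul> with no attributes.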

I would advise against this kind of structure, though, since it is difficult to maintain or extend. What will you do with the resulting array? There may be better options for your use case.

Edit: as shown, you can simply join both regexes with the | (alternation) operator.
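To get the array shape asked for in the question (words, the spaces between them, and each list as a single item) from split() rather than match(), one option is to join the two patterns with | and wrap the whole alternation in one capture group, because split() drops any matched text that is not captured. A sketch under two assumptions: the word pattern is rewritten without its inner group (otherwise split() would also insert an undefined entry every time the list alternative matches), and the \s* around the ul pattern is dropped so spaces stay separate items. The names word, ulBlock and splitter are illustrative only:

```javascript
// The two patterns from the question, adjusted as described above.
const word = /\b\w+(?![^<>]*>)\b/;         // word outside a tag, no capture group
const ulBlock = /<ul[^>]*>[\S\s]*?<\/ul>/; // whole list, lazy up to the first </ul>

// Join with | and wrap the alternation in ONE capture group, so that split()
// keeps both the words and the complete <ul>...</ul> blocks in its output.
const splitter = new RegExp("(" + ulBlock.source + "|" + word.source + ")");

// Made-up input modeled on the question's expected array.
const html = "lorem ipsum <ul><li>lorem</li><li>ipsum</li></ul> dolor";

// split() yields "" at the edges when the input starts/ends with a match;
// filter those out.
const parts = html.split(splitter).filter(s => s !== "");
console.log(parts);
// ["lorem", " ", "ipsum", " ", "<ul><li>lorem</li><li>ipsum</li></ul>", " ", "dolor"]
```

The words inside the list are never split out because the scan reaches the "<" of <ul> before any of them, and the list alternative then consumes everything up to </ul>. This is still a regex over HTML: a nested list would be cut off at its first inner </ul>.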