irek irek - 11 months ago 86
HTML Question

Search text using regex that ignores HTML tags

I need to add a highlight class around the searched text, but the other HTML tags are in my way. Here is an example:

Starting with:

<div class="source">your <b><i>text</i></b> using <a href="#">regex ignoring html</a> tags</div>

And I search for:
text using regex

The expected result (in this example I will use <span> for the highlight):

<div class="source">your <b><i><span>text</span></i></b><span> using </span><a href="#"><span>regex</span> ignoring html</a> tags</div>

I have a solution for this, but it requires a specific regex that searches for the text while ignoring any HTML tags inside it. If there is a solution other than the one presented below, I'm open to suggestions, and it doesn't have to be written in vanilla JS. Below is a simplified version of my current solution, which lacks the regex in question.

The example below doesn't work because of the missing regex:

var source = document.querySelector('.source').innerHTML; // HTML from the example
var text = 'text using regex'; // what we are searching for
var htmlTag = new RegExp('(<\\/?([a-z]+)([^<]+)*(?:>))+', 'g'); // matches runs of HTML tags
var missingRegExp = new RegExp('', 'i'); // << missing regex

// Wrap the searched text with a span tag
var result = source.replace(missingRegExp, function (searchedText) {
    // Close the span before each run of HTML tags inside the match and reopen it after
    searchedText = searchedText.replace(htmlTag, function (match) {
        return '</span>' + match + '<span>';
    });

    return '<span>' + searchedText + '</span>';
});

console.log('Result: ' + result);

In this case removing html tags is not an option.

Answer Source

You have a search string like "text using regex". The spaces between the words need to be replaced with a regex that matches HTML tags, but first enclose each word in parentheses so that it becomes a capturing group:

> '(' + "text using regex".split(' ').join(') (') + ')'
< "(text) (using) (regex)"

The next step is to replace each space with the regex ((?:\s*(?:<\/?\w[^<>]*>)?\s*)*), which captures any run of whitespace and HTML tags, so the final pattern is:

< "(text)((?:\s*(?:<\/?\w[^<>]*>)?\s*)*)(using)((?:\s*(?:<\/?\w[^<>]*>)?\s*)*)(regex)"
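The two steps above can be folded into one small function. This is a minimal sketch, not the answerer's code: `buildPatternSource` and `escapeRegExp` are hypothetical helper names, and escaping the words is an added assumption so that characters such as `.` or `(` in the search text are matched literally and do not create extra capturing groups.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Gap pattern from the answer: captures any run of whitespace and HTML tags.
const GAP = '((?:\\s*(?:<\\/?\\w[^<>]*>)?\\s*)*)';

// Hypothetical helper: "text using regex" -> "(text)" + GAP + "(using)" + GAP + "(regex)".
function buildPatternSource(text) {
  return text
    .trim()
    .split(/\s+/)
    .map(function (word) { return '(' + escapeRegExp(word) + ')'; })
    .join(GAP);
}

const patternSource = buildPatternSource('text using regex');
console.log(patternSource);
```

Pass the returned string to `new RegExp(patternSource, 'i')` to get the compiled pattern.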

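With 3 words to search for, we end up with 5 capturing groups in total (n words -> n + (n - 1) capturing groups): the odd-numbered groups hold the words and the even-numbered groups hold the whitespace and tags between them. Build the replacement string accordingly, wrapping each word group in the highlight tag and emitting each gap group unchanged. Here, using <span> as in the question, that would be:

<span>$1</span>$2<span>$3</span>$4<span>$5</span>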
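For an arbitrary number of words, the replacement string can be generated instead of written by hand. A minimal sketch, assuming each word is wrapped in a tag passed in as a parameter and each gap group is emitted unchanged; `buildReplacement` is a hypothetical helper name, not something from the original answer:

```javascript
// Hypothetical helper: build the replacement string for a search of `wordCount` words.
// Odd-numbered groups hold the words, even-numbered groups hold the gaps between them.
function buildReplacement(wordCount, tag) {
  const parts = [];
  const groupCount = 2 * wordCount - 1; // n words + (n - 1) gaps
  for (let i = 1; i <= groupCount; i++) {
    const isWord = i % 2 === 1;
    parts.push(isWord ? '<' + tag + '>$' + i + '</' + tag + '>' : '$' + i);
  }
  return parts.join('');
}

const replacement = buildReplacement(3, 'span');
console.log(replacement); // <span>$1</span>$2<span>$3</span>$4<span>$5</span>
```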


Now that you have the compiled regex and the replacement string, a call to .replace() finishes the job.
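Putting it together on the example from the question, here is a sketch that works on the inner HTML as a plain string so it runs without a DOM. The replacement string that wraps each word group in `<span>` is an assumption based on the question's choice of tag.

```javascript
// Inner HTML of the .source element from the question.
const source = 'your <b><i>text</i></b> using <a href="#">regex ignoring html</a> tags';

// Gap pattern from the answer: captures any run of whitespace and HTML tags.
const gap = '((?:\\s*(?:<\\/?\\w[^<>]*>)?\\s*)*)';

// Words go in groups 1, 3 and 5; gaps go in groups 2 and 4.
const pattern = new RegExp('(text)' + gap + '(using)' + gap + '(regex)', 'i');

// Assumed replacement: wrap each word, emit the gaps unchanged.
const replacement = '<span>$1</span>$2<span>$3</span>$4<span>$5</span>';

const result = source.replace(pattern, replacement);
console.log(result);
// your <b><i><span>text</span></i></b> <span>using</span> <a href="#"><span>regex</span> ignoring html</a> tags
```

Note that the whitespace around "using" ends up outside the spans, because it is captured by the gap groups. That differs slightly from the expected output in the question, which has `<span> using </span>`.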
