Fréderic Cox - 2 months ago
HTML Question

RegExp to search text inside HTML tags

I'm having some difficulty using a RegExp to search for text between HTML tags. This is for a search function that searches the text of an HTML page without matching characters inside the tags or attributes of the HTML. When a match is found, I surround it with a div and assign it a highlight class to highlight the search words in the HTML page. If the RegExp also matches inside tags or attributes, the HTML code becomes corrupt.

Here is the HTML code:

<html>
<span>assigned</span>
<span>Assigned > to</span>

<span>assigned > to</span>

<div>ticket assigned to</div>

<div id="assigned" class="assignedClass">Ticket being assigned to</div>

</html>


and the current RegExp I've come up with is:

/(?<=(>))assigned(?!\<)(?!>)/gi

which matches if assigned or Assigned is at the start of the text in a tag, but not in the other cases. It does a good job of ignoring the attributes and tags, but it does not work when the text does not start with the search string.
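To make the behaviour concrete, here is a small sketch that counts what this pattern finds on the lines of the sample that contain the search word. The names `lines`, `attempt` and `countMatches` are only illustrative, and the snippet assumes a JavaScript engine that supports lookbehind.

```javascript
// Lines from the sample HTML that contain the search word.
const lines = [
  '<span>assigned</span>',
  '<span>Assigned > to</span>',
  '<div>ticket assigned to</div>',
  '<div id="assigned" class="assignedClass">Ticket being assigned to</div>',
];

// The pattern from the question: "assigned" must directly follow ">"
// and must not be directly followed by "<" or ">".
const attempt = /(?<=(>))assigned(?!\<)(?!>)/gi;

// Illustrative helper: number of matches of a global pattern in a string.
function countMatches(text, pattern) {
  return [...text.matchAll(pattern)].length;
}

for (const line of lines) {
  console.log(countMatches(line, attempt), line);
}
```

The lookbehind only accepts an occurrence that comes immediately after a `>`, so text such as "ticket assigned to" is never found, and `(?!\<)` additionally rejects an occurrence that is immediately followed by a closing tag.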

Can anyone help me out here? I've been working on this for an hour now but can't find a solution (RegExp noob here).

Answer

Update

Regex:

assigned(?![^<>]*(([\/"']|]]|\b)>|<\/script>))

Live demo

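Using a negative lookahead, you can check that the matched string is not inside an HTML tag: the match is rejected if it is followed, with no < or > in between, by the end of a tag, i.e. />, '>, "> or a word character followed by >. The same lookahead also rejects a match that is followed by ]]> or by </script>.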
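
As a rough sketch of how this pattern could drive the highlighter from the question, the snippet below appends the lookahead to an escaped search term and wraps each hit in a div with a highlight class. The names `TAG_GUARD`, `escapeRegExp`, `highlight` and `sampleHtml` are illustrative assumptions, not part of the original answer.

```javascript
// The negative lookahead from the answer, kept verbatim via String.raw.
const TAG_GUARD = String.raw`(?![^<>]*(([\/"']|]]|\b)>|<\/script>))`;

// Illustrative helper: escape regex metacharacters in the user's search term.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Illustrative helper: wrap every occurrence of `term` that the lookahead
// does not reject in <div class="highlight">...</div>.
function highlight(html, term) {
  const pattern = new RegExp(escapeRegExp(term) + TAG_GUARD, "gi");
  return html.replace(pattern, '<div class="highlight">$&</div>');
}

// The sample markup from the question.
const sampleHtml = `<html>
<span>assigned</span>
<span>Assigned > to</span>
<span>assigned > to</span>
<div>ticket assigned to</div>
<div id="assigned" class="assignedClass">Ticket being assigned to</div>
</html>`;

console.log(highlight(sampleHtml, "assigned"));
```

Note that the lookahead is a heuristic: visible text that happens to be followed by a word character or quote and then > before the next < (for example "assigned foo> bar") would be skipped as well.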