José M. Carnero - 6 months ago
HTML Question

Regex pattern to match hashtag, but not in HTML attributes

I'm trying to extract hashtags from HTML text with the regular expression

, but I'm running into trouble with hashtags inside HTML attributes.

For example in the HTML text:

hola que tal with #hash1.
hola que tal with #hash2

y <a href="hola.que.tal#hash3"> para #hash4. </a>

I want to recover "hash1", "hash2" and "hash4" but not "hash3".

I tried to solve it with lookarounds, using the following expression:


but without success.

How can I do it with a single regular expression?


This should work



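The negative lookahead makes sure that there is a < between the hashtag and the next >. If a > came first, the hashtag would be sitting inside a tag (for example in an href attribute), so the match is rejected.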
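
To illustrate the idea, here is a minimal Python sketch. The pattern `#(\w+)(?![^<>]*>)` and the helper name `extract_hashtags` are my own assumptions for demonstration, not necessarily the exact expression from the answer. The lookahead rejects a hashtag when only non-bracket characters separate it from a closing `>`.

```python
import re

# Assumed pattern: "#" plus word characters, NOT followed by
# "any run of characters other than < or >, then >".
# If a ">" is reached before any "<", the hashtag is inside a tag.
HASHTAG = re.compile(r"#(\w+)(?![^<>]*>)")

def extract_hashtags(html):
    """Return hashtag names found outside of HTML tags (hypothetical helper)."""
    return HASHTAG.findall(html)

sample = '''hola que tal with #hash1.
hola que tal with #hash2

y <a href="hola.que.tal#hash3"> para #hash4. </a>'''

print(extract_hashtags(sample))  # ['hash1', 'hash2', 'hash4']
```

This relies on the markup being reasonably well formed: a literal > in plain text (not escaped as &gt;) after a hashtag would cause that hashtag to be skipped. For arbitrary HTML, an HTML parser is likely the safer choice.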