Nic Hubbard - 1 month ago
Objective-C Question

Regex to match any HTML attribute value and quotes

I have seen this question on SO before, but those versions were specific to a particular tag or attribute.

I need to match any attribute value with a regex. I have the following, which matches both the attribute name and its value:

(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?


But I only want it to match the value and the quotes around it. It also needs to handle both single and double quotes.

I understand the usual advice to avoid regexes for HTML and to use a parser instead, but this is a narrow use case: I am only using the regex to color-code attribute values.

Any help?

Answer

I made a slight modification to your regex.

I replaced (\S+)= with (?<==), a positive lookbehind. The = must still sit immediately before the value, but it is no longer part of the match.

I think your regex implementation should be able to do a positive lookbehind.

This regex behaves inconsistently when one kind of quote is nested inside the other, like this: <a onclick='StackExchange.switchMobile("on")'>mobile</a>

You may want to look into changing your character classes to get around that.

Here's the full regex string:

(?<==)["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?
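
To see both the lookbehind variant and the nested-quote problem in action, here is a minimal sketch. It uses Python's re module purely for illustration, not the asker's Objective-C environment. The names LOOKBEHIND_PATTERN and find_values are hypothetical, and the whitespace class is written as a single \s, which is an assumption about the intended token.

```python
import re

# Lookbehind variant of the original pattern.
# Assumption: the whitespace class is meant to be a single \s.
LOOKBEHIND_PATTERN = re.compile(
    r"""(?<==)["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?"""
)


def find_values(html):
    """Return every full match: the value plus any quotes around it."""
    return [m.group(0) for m in LOOKBEHIND_PATTERN.finditer(html)]


simple = """<a href="index.html" class='nav'>Home</a>"""
nested = """<a onclick='StackExchange.switchMobile("on")'>mobile</a>"""

print(find_values(simple))  # ['"index.html"', "'nav'"]
print(find_values(nested))  # ['\'StackExchange.switchMobile("']
```

On the simple tag, each match is the value with its quotes and without the attribute name. On the nested example, the match stops at the inner double quote, which is the inconsistency mentioned above.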


Following our chat discussion, I came up with a new regex that is shorter and much cleaner:

(?<==)('|").*?\1(?=.*?>)

What this regex does is as follows:

  1. Assert that a = symbol sits immediately before the match, without consuming it - (?<==)
  2. Match a single or double quote and store it in capture group 1 - ('|")
  3. Match anything (non-greedy) until the next quote of the same type - .*?\1
  4. Assert that there is a closing angle bracket > somewhere ahead of the match - (?=.*?>)
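
The four steps above can be tried out with a short sketch. It is written in Python's re module for illustration only. The names VALUE_PATTERN, quoted_values and highlight are hypothetical, and the square-bracket markers stand in for whatever coloring the real application would apply.

```python
import re

# Final pattern from the answer: "=" behind, opening quote captured,
# lazy body up to the same quote, and a ">" somewhere ahead.
VALUE_PATTERN = re.compile(r"""(?<==)('|").*?\1(?=.*?>)""")


def quoted_values(html):
    """Return each attribute value together with its surrounding quotes."""
    return [m.group(0) for m in VALUE_PATTERN.finditer(html)]


def highlight(html, start="[", end="]"):
    """Wrap each matched value in markers (a stand-in for color codes)."""
    return VALUE_PATTERN.sub(lambda m: start + m.group(0) + end, html)


simple = """<a href="index.html" class='nav'>Home</a>"""
nested = """<a onclick='StackExchange.switchMobile("on")'>mobile</a>"""

print(quoted_values(simple))  # ['"index.html"', "'nav'"]
print(quoted_values(nested))  # ['\'StackExchange.switchMobile("on")\'']
print(highlight(simple))      # <a href=["index.html"] class=['nav']>Home</a>
```

Because the closing quote must be the same character as the opening one, the nested example is now matched in full. Two limitations are worth keeping in mind: unquoted values such as type=text are not matched at all, and the lookahead only checks that a > appears later on the same line, so it does not prove the match sits inside a tag.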