Bilal075_ - 3 months ago
Javascript Question

Regex - ignoring text between quotes / HTML(5) attribute filtering

I have this regular expression, which is meant to parse a given string into a list of HTML(5) attributes. It doesn't do what I need yet, but hopefully that's about to change!

What I'm trying to achieve is that, whenever an attribute name is found, the text up to the next occurrence OR the end of the string is captured as a second group. Here is the current regular expression:

/([a-zA-Z]+|[a-zA-Z]+-[a-zA-Z0-9]+)=["']/g


A string like this:
hey="hey world" hey-heyhhhhh3123="Hello world" data-goed="hey"


Is currently matched like this (capture group 1 shown):

MATCH 1. [0-3] `hey`
MATCH 2. [16-32] `hey-heyhhhhh3123`
MATCH 3. [47-56] `data-goed`
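
For reference, here is a minimal sketch of how the current pattern behaves when run with `exec` in a loop. The variable names (`nameRe`, `input`, `names`, `starts`) are only illustrative and not part of the question:

```javascript
// The pattern from the question: it captures the attribute name only,
// and stops right after the opening quote.
var nameRe = /([a-zA-Z]+|[a-zA-Z]+-[a-zA-Z0-9]+)=["']/g;
var input = 'hey="hey world" hey-heyhhhhh3123="Hello world" data-goed="hey"';

var names = [];
var starts = [];
var m;
while ((m = nameRe.exec(input)) !== null) {
  names.push(m[1]);      // group 1: the attribute name
  starts.push(m.index);  // offset where the match (and the name) begins
}

console.log(names);  // [ 'hey', 'hey-heyhhhhh3123', 'data-goed' ]
console.log(starts); // [ 0, 16, 47 ]
```

Nothing in this pattern consumes the text between the quotes, which is why no value ever shows up in the matches.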


These are the attribute names. Now we just need to fetch each attribute's value as well, so the string above should produce a result like this:

MATCH 1.
1 [0-3] `hey`
2 [5-14] `hey world`
MATCH 2.
1 [16-32] `hey-heyhhhhh3123`
2 [34-45] `Hello world`
MATCH 3.
1 [47-56] `data-goed`
2 [58-61] `hey`


Could anyone help me get this working? It would be appreciated a lot!

Answer

You can use

/([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g

Pattern details:

  • ([^\s=]+) - Group 1 capturing 1 or more characters other than whitespace and the = symbol
  • = - an equal sign
  • (?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+)) - a non-capturing group with 2 alternatives (a third alternative, '([^'\\]*(?:\\.[^'\\]*)*)', can be added to account for single-quoted string literals)
    • "([^"\\]*(?:\\.[^"\\]*)*)" - a double quoted string literal pattern:
      • " - a double quote
      • ([^"\\]*(?:\\.[^"\\]*)*) - Group 2 capturing 0+ characters other than \ and ", followed by 0+ sequences of an escaped character (a \ plus any character other than a line break) followed by 0+ characters other than \ and "
      • " - a closing double quote
    • | - or
    • (\S+) - Group 3 capturing one or more non-whitespace characters
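
To make the group numbering above concrete, here is a small sketch that applies the same pattern (without the `g` flag, so each call starts at index 0) to one quoted and one unquoted attribute. The sample strings and the variable names `pattern`, `quoted` and `bare` are made up for illustration:

```javascript
// Same pattern as above, minus the global flag.
var pattern = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/;

// The attribute text here is: title="a \"b\" c"
var quoted = pattern.exec('title="a \\"b\\" c"');
// The attribute text here is: more=here
var bare = pattern.exec('more=here');

// Quoted value: group 2 is filled (escapes are kept as-is), group 3 is undefined.
console.log(quoted[1], quoted[2], quoted[3]);
// Unquoted value: group 2 is undefined, group 3 is filled.
console.log(bare[1], bare[2], bare[3]);
```

Only one of groups 2 and 3 takes part in any given match, so the code consuming the matches has to check which one is set.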

JS demo (no single quoted support):

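var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g;
var str = 'hey="hey world" hey-heyhhhhh3123="Hello \\"world\\"" data-goed="hey" more=here';
var res = [];
var m; // declared explicitly so the loop does not create an implicit global
while ((m = re.exec(str)) !== null) {
    // Group 3 is set only for an unquoted value; otherwise group 2 holds the quoted one
    if (m[3]) {
      res.push([m[1], m[3]]);
    } else {
      res.push([m[1], m[2]]);
    }
}
console.log(res);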
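
Note that group 2 returns the value exactly as written, so escaped quotes still carry their backslashes. If the unescaped text is wanted, one possible post-processing step is sketched below. It assumes a backslash simply escapes the character that follows it; `unescapeValue` is a hypothetical helper name, not part of the answer's pattern:

```javascript
// Hypothetical helper: drop each escaping backslash and keep the escaped character.
function unescapeValue(raw) {
  return raw.replace(/\\(.)/g, '$1');
}

// A captured value with the text: Hello \"world\"
var rawValue = 'Hello \\"world\\"';
var cleanValue = unescapeValue(rawValue);

console.log(cleanValue); // Hello "world"
```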

JS demo (with single quoted literal support):

var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
var str = 'pseudoprefix-before=\'hey1"\' data-hey="hey\'hey" more=data and="more \\"here\\""';
var res = [];
var m; // declared explicitly so the loop does not create an implicit global
while ((m = re.exec(str)) !== null) {
  // Compare against undefined so that empty quoted values (a="") are kept
  if (m[2] !== undefined) {
    res.push([m[1], m[2]]);
  } else if (m[3] !== undefined) {
    res.push([m[1], m[3]]);
  } else {
    res.push([m[1], m[4]]);
  }
}
console.log(res);
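
If a name-to-value map is more convenient than an array of pairs, the same pattern can be wrapped in a small function. This is only a sketch: `parseAttributes` is a made-up name, the sample input is invented, and `String.prototype.matchAll` requires an ES2020-capable runtime (an `exec` loop as above works in older ones):

```javascript
// Hypothetical wrapper around the single-quote-aware pattern from the demo above.
function parseAttributes(input) {
  var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
  var attrs = {};
  for (const m of input.matchAll(re)) {
    // Only one of groups 2-4 participates in a match; the others are undefined.
    var value = m[2] !== undefined ? m[2] : m[3] !== undefined ? m[3] : m[4];
    attrs[m[1]] = value; // a repeated attribute name overwrites the earlier value
  }
  return attrs;
}

var parsed = parseAttributes('id=main class="a b" title=\'\' data-x="1"');
console.log(parsed);
```

The empty `title=''` attribute is kept here because the groups are compared against `undefined` rather than tested for truthiness.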
