XCanG - 11 months ago
HTML Question

How to properly exclude group in regex?

I need to match a pattern in some text, but the match must not contain another pattern.
The text comes from an HTML page, and the page does not put a new line between groups. Instead of a new line, the HTML contains <br>,
so I run into trouble here.

I tried this regex:


and the example text is:

test1 | test2 | test3<br>| test4<br>| test5 |<br>test6

It should match only
| test2 |
and group
, but right now it also matches
| test4<br>|
rather than the correct
| test5 |
. I need to exclude the test4 match, but I don't know how to do that with
because it is ignored.

P.S. Of course,
| test2 |
may also be
| text1 <span ...>text2</span> text3 |
, so placing
is not the solution I need.

Answer Source

The regex you need should be based on a tempered greedy token:

\|((?:(?!<br\s*\/?>)[^\r\n|])*)\|

See the regex demo

The token is (?:(?!<br\s*\/?>)[^\r\n|])*. It matches any character other than CR, LF, or | (the negated character class [^\r\n|] accounts for that), provided that character does not start a <br> tag sequence (<br>, <br >, <br/>, <br />, etc.). The contents matched by the token are captured into Group 1 because the token is wrapped in capturing parentheses (...).
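
To see what the lookahead inside the token changes, here is a minimal sketch that runs the tempered pattern next to a plain lazy pattern on the question's sample string. The helper names pipeGroups and naivePipeGroups are hypothetical, and the lazy pattern \|(.*?)\| is only an assumed baseline for comparison, since the regex originally tried in the question is not shown.

```javascript
// Hypothetical helper (not from the answer): returns Group 1 of every match.
function pipeGroups(text) {
  // Tempered greedy token: each consumed character must not be CR, LF, or "|",
  // and must not be the "<" that starts a <br>, <br >, <br/>, or <br /> tag.
  const tempered = /\|((?:(?!<br\s*\/?>)[^\r\n|])*)\|/gi;
  return Array.from(text.matchAll(tempered), (m) => m[1]);
}

// Assumed baseline for comparison only: a plain lazy match between two pipes.
function naivePipeGroups(text) {
  const naive = /\|(.*?)\|/g;
  return Array.from(text.matchAll(naive), (m) => m[1]);
}

const sample = 'test1 | test2 | test3<br>| test4<br>| test5 |<br>test6';

console.log(naivePipeGroups(sample)); // → [" test2 ", " test4<br>"]
console.log(pipeGroups(sample));      // → [" test2 ", " test5 "]

// Other tags, such as <span>, are not rejected by the lookahead.
const withSpan = 'a | text1 <span class="x">text2</span> text3 | b';
console.log(pipeGroups(withSpan));
// → [' text1 <span class="x">text2</span> text3 ']
```

Because the lookahead only rejects a position where a <br> tag begins, the pipe before test4 cannot open a match, which leaves the pipe before test5 free to open the next one.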

JS demo:

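// Group 1 captures the text between two pipes that contains no <br> tag and no line break
var re = /\|((?:(?!<br\s*\/?>)[^\r\n|])*)\|/ig;
var str = 'test1 | test2 | test3<br>| test4<br>| test5 |<br>test6|';
var res = [];
var m; // declare the match variable so it does not become an implicit global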
while ((m = re.exec(str)) !== null) {
  res.push(m[1]); // Grab Group 1 value only