iyuyguyg - 4 months ago
HTML Question

RegEx to place tags around matched word

I want to put bold tags around the words in a string that match a list of keywords. However, I also need to be able to find the matched words in a URL. If possible, I would like to have one RegEx for everything.

Here's what I have tried so far:

new RegExp("(^|\\s)(" + match.join('|') + ")(\\s|$)","ig")

and

new RegExp('(\\b)(' + match2.join('|') + ')(\\b)','ig')


//keyword
var keyword = "Donec sed odio bacon dui.";
var match = ["donec", "bacon", "dui"]; //why does it ignore dui???

var reg1 = new RegExp("(^|\\s)(" + match.join('|') + ")(\\s|$)","ig");
//var reg1 = new RegExp('(\\b)(' + match.join('|') + ')(\\b)','ig');
var reg2 = "$1<b>$2</b>$3";

var keyword = keyword.replace(reg1, reg2);

console.log(keyword);


PLEASE HELP

Answer

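There are two problems. First, the matches overlap: the trailing (\\s|$) consumes the space after bacon, so the leading (^|\\s) has no whitespace left to match in front of dui. Second, the word dui has a full stop after it, which is neither whitespace nor the end of the string. Use a non-consuming word-boundary check at the end of the first regex: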

var reg1 = new RegExp("(^|\\W)(" + match.join('|') + ")(?!\\w|(?:[^<]*</[^>]+)?>)","ig");
var reg2 = "$1<b>$2</b>";

Note that instead of \\b you can use the negative lookahead (?!\\w), and instead of (^|\\s) you can use (^|\\W), so that the pattern does not depend on whitespace around the keywords. The (?!\\w|(?:[^<]*</[^>]+)?>) lookahead also fails the match if the keyword is inside a tag or inside text that is already tagged.
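To make the overlap concrete, here is a minimal sketch that runs the original consuming pattern and a lookahead-based pattern on the question's sentence. The variable names (text, words, consuming, lookahead) are only illustrative, and the tag-aware part of the lookahead is left out to keep the comparison small.

```javascript
var text = "Donec sed odio bacon dui.";
var words = ["donec", "bacon", "dui"];

// Original pattern: the trailing (\s|$) consumes the space after "bacon",
// so " dui" has no leading whitespace left to match; the "." blocks it too.
var consuming = new RegExp("(^|\\s)(" + words.join("|") + ")(\\s|$)", "ig");
var consumingResult = text.replace(consuming, "$1<b>$2</b>$3");

// Lookahead variant: (?!\w) inspects the next character without consuming it.
var lookahead = new RegExp("(^|\\W)(" + words.join("|") + ")(?!\\w)", "ig");
var lookaheadResult = text.replace(lookahead, "$1<b>$2</b>");

console.log(consumingResult); // <b>Donec</b> sed odio <b>bacon</b> dui.
console.log(lookaheadResult); // <b>Donec</b> sed odio <b>bacon</b> <b>dui</b>.
```

Because the lookahead matches zero characters, the delimiter after one keyword is still available as the leading delimiter of the next.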

In the URL, the keywords sit between hyphens rather than whitespace, so the second regex needs word boundaries:

var reg3 = new RegExp("\\b(" + match2.join('|') + ")\\b(?!(?:[^<]*</[^>]+)?>)","ig");
var reg4 = "<b>$1</b>";

Or use the more versatile variant, which does not rely on \\b:

var reg3 = new RegExp("(^|\\W)(" + match2.join('|') + ")(?!\\w|(?:[^<]*</[^>]+)?>)","ig");
var reg4 = "$1<b>$2</b>";
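As a rough illustration of why whitespace delimiters cannot work on a URL, the sketch below compares them with \b on a hyphenated path. The URL is a simplified version of the one in the question (no existing tags), and the names url, urlWords, whitespaceDelimited and wordBounded are assumptions made for this example.

```javascript
var url = "http://www.website.co.uk/hey-hello-hey-hi";
var urlWords = ["hello", "hey"];

// Whitespace-delimited pattern: the URL contains no whitespace, so nothing matches.
var whitespaceDelimited = new RegExp("(^|\\s)(" + urlWords.join("|") + ")(\\s|$)", "ig");
var unchanged = url.replace(whitespaceDelimited, "$1<b>$2</b>$3");

// Word boundaries: "-" and "/" are non-word characters, so \b matches next to them.
var wordBounded = new RegExp("\\b(" + urlWords.join("|") + ")\\b", "ig");
var tagged = url.replace(wordBounded, "<b>$1</b>");

console.log(unchanged); // http://www.website.co.uk/hey-hello-hey-hi
console.log(tagged);    // http://www.website.co.uk/<b>hey</b>-<b>hello</b>-<b>hey</b>-hi
```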

Also, you need to escape special regex metacharacters in your keywords so that they are treated as literal characters. See match.map(x => x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).
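A small sketch of what the escaping buys you, using the same character class as above. The helper name escapeRegExp and the sample keywords are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

var rawWords = ["test.", "a+b"];
var escaped = rawWords.map(escapeRegExp); // ["test\\.", "a\\+b"] as string literals

// No "g" flag here, so .test() carries no lastIndex state between calls.
var unescapedRe = new RegExp("(" + rawWords.join("|") + ")", "i");
var escapedRe = new RegExp("(" + escaped.join("|") + ")", "i");

console.log(unescapedRe.test("testX"));   // true: the bare "." matches any character
console.log(escapedRe.test("testX"));     // false: "\." only matches a literal full stop
console.log(escapedRe.test("see test.")); // true
```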

See demo (the replacement pattern is the same for both regexps, declared once):

// Plain text: the existing <b>sed</b> markup must be left intact
var keyword = "Donec <b>sed</b> odio bacon dui.";
var match = ["test.", "donec", "bacon", "dui"];
var reg = new RegExp("(^|\\W)(" + match.map(x => x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join('|') + ")(?!\\w|(?:[^<]*</[^>]+)?>)","ig");
var repl = "$1<b>$2</b>";
keyword = keyword.replace(reg, repl);

console.log(keyword); // <b>Donec</b> <b>sed</b> odio <b>bacon</b> <b>dui</b>.

// URL: "hello" and the "b" of the tags sit inside existing markup and are skipped
var keyword2 = "http://www.website.co.uk/hey-<b>more hello o</b>-hey-hi";
var match2 = ["hello", "hey", "b"];
var reg2 = new RegExp("(^|\\W)(" + match2.map(x => x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join('|') + ")(?!\\w|(?:[^<]*</[^>]+)?>)","ig");
keyword2 = keyword2.replace(reg2, repl);

console.log(keyword2); // http://www.website.co.uk/<b>hey</b>-<b>more hello o</b>-<b>hey</b>-hi
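Since both cases end up using the same pattern, one option is to wrap it in a single function. This is only a sketch: the names highlight and escapeRegExp are hypothetical, and the early return for an empty keyword list is an added assumption (an empty alternation would otherwise match the empty string).

```javascript
// Hypothetical helper: escape regex metacharacters in a keyword.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Hypothetical wrapper around the pattern from the answer above.
function highlight(text, words) {
  if (words.length === 0) return text; // assumption: nothing to highlight
  var alternation = words.map(escapeRegExp).join("|");
  var re = new RegExp(
    "(^|\\W)(" + alternation + ")(?!\\w|(?:[^<]*</[^>]+)?>)",
    "ig"
  );
  return text.replace(re, "$1<b>$2</b>");
}

var sentence = highlight("Donec <b>sed</b> odio bacon dui.", ["test.", "donec", "bacon", "dui"]);
var link = highlight("http://www.website.co.uk/hey-<b>more hello o</b>-hey-hi", ["hello", "hey", "b"]);

console.log(sentence); // <b>Donec</b> <b>sed</b> odio <b>bacon</b> <b>dui</b>.
console.log(link);     // http://www.website.co.uk/<b>hey</b>-<b>more hello o</b>-<b>hey</b>-hi
```

Building the RegExp inside the function keeps each call independent, so there is no shared lastIndex state from the g flag to worry about.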