MakoBuk - 8 months ago
Javascript Question

JavaScript - regex where attribute order doesn't matter but both attributes are required

I want to get the content of the canonical link from a page. The code runs in Node.js on the server (without a DOM). I have the complete response body (the downloaded page) and the following code:

// Matches only when href comes before rel:
var metaRegex = new RegExp(/<link.*?href=['"](.*?)['"].*?rel=['"]canonical['"].*?>/i);
// Matches either attribute order, but captures the last href instead of the canonical one:
// var metaRegex = new RegExp(/<link(?=.*rel=['"]canonical['"])(?=.*href=['"](.*?)['"]).*?>/i);
var metaTag = metaRegex.exec(body);


The problem with the first expression is the order of the rel and href attributes. It matches only:

<link href="" rel="canonical">

and NOT

<link rel="canonical" href="">

The second expression accepts both orderings, but it matches the last occurrence of href.

It looks like I should require the existence of both attributes and maybe group them?

What is the correct way?


Just use two sequential RegExps, like this:

var body = '<link rel="stylesheet" href="my.css"/> <link href="" rel="canonical"/> <a href=""/>'
var linkRegexp = /(<link[^>]*rel=['"]canonical['"][^>]*>)/;
var hrefRegexp = /href=['"](.*?)['"]/;

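var linkBody = linkRegexp.exec(body)[1];    // the whole canonical <link> tag
var href = hrefRegexp.exec(linkBody)[1];    // the href value inside that tag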
  • linkRegexp - gets the link tag with rel='canonical'; [^>]* cannot cross a tag boundary, so the match stays inside one tag
  • hrefRegexp - extracts the href from that tag only
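Note that exec() returns null when nothing matches, so indexing its result directly throws on pages without a canonical link. A minimal sketch of the same two-step idea wrapped in a function that returns null in that case. The name getCanonicalHref is hypothetical, and the added i flag and \b word boundaries are my additions rather than part of the snippet above. It assumes attribute values are quoted and contain no > character:

```javascript
// Hypothetical helper: returns the canonical URL from an HTML string, or null.
function getCanonicalHref(body) {
  // Step 1: find the first <link> tag whose rel attribute is "canonical".
  // [^>]* keeps the match inside a single tag, so attribute order is irrelevant.
  var linkMatch = /<link\b[^>]*\brel=['"]canonical['"][^>]*>/i.exec(body);
  if (!linkMatch) {
    return null; // no canonical link on the page
  }

  // Step 2: extract href from that one tag only, never from neighbouring tags.
  var hrefMatch = /\bhref=['"]([^'"]*)['"]/i.exec(linkMatch[0]);
  return hrefMatch ? hrefMatch[1] : null;
}
```

Usage: getCanonicalHref(body) gives the same result for <link rel="canonical" href="..."> and <link href="..." rel="canonical">, and null if the page has no canonical link. For messy real-world HTML a proper parser is more reliable than regular expressions.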