user2284570 - 27 days ago
HTML Question

Regex to return all attribute values on a web page that start with a specific value

The question is simple: I need to get the values of all attributes whose value starts with

http://example.com/api/v3?

For example, if a page contains

<iframe src="http://example.com/api/v3?download=example%2Forg">
<meta twitter="http://example.com/api/v3?return_to=%2F">


Then I should get an array/list with 2 members:
http://example.com/api/v3?return_to=%2F
and
http://example.com/api/v3?download=example%2Forg
(the order doesn’t matter).

I don’t want the elements, just the attribute values.

Basically I need the regex that returns strings starting with
http://example.com/api/v3?
and ending with a space.

Answer

A regular expression would likely look like this:

/http:\/\/example\.com\/api\/v3\?\S+/g

Make sure to escape each /, . and ? with a backslash. \S+ matches all subsequent non-whitespace characters, so on the example above it also captures the closing quote and the >. Use [^\s"]+ instead of \S+ to stop at the closing quote mark.
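As a rough sketch of the regex approach, the snippet below builds the pattern from the prefix and runs it over the sample markup from the question. The helper names escapeRegExp and findPrefixedValues are made up for this illustration, and it assumes attribute values are wrapped in double quotes.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal string
// so the prefix can be embedded in a pattern safely.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every run of characters that starts with
// `prefix` and continues up to the first whitespace or double quote.
function findPrefixedValues(html, prefix) {
  const pattern = new RegExp(escapeRegExp(prefix) + '[^\\s"]+', "g");
  return html.match(pattern) || [];
}

// Sample markup taken from the question.
const sampleHtml =
  '<iframe src="http://example.com/api/v3?download=example%2Forg">\n' +
  '<meta twitter="http://example.com/api/v3?return_to=%2F">';

const found = findPrefixedValues(sampleHtml, "http://example.com/api/v3?");
console.log(found);
// → [ 'http://example.com/api/v3?download=example%2Forg',
//     'http://example.com/api/v3?return_to=%2F' ]
```

Note that this works on the raw text, so it would also pick up matching URLs outside of attributes, and it does not decode HTML entities such as &amp; inside the values.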

In my experience, though, running a regex over raw HTML is usually slower and more error-prone than working on the already parsed DOM directly, so I’d recommend you try these Array and DOM functions instead:

Get all elements, map each one to the list of its attributes whose value starts with http://example.com/api/v3?, flatten those lists into one Array, and map the remaining attributes to their values.

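Array.from(document.querySelectorAll("*"))
  // For each element, keep only the attributes whose value has the prefix.
  .map(elem => Array.from(elem.attributes)
    .filter(attr => attr.value.startsWith("http://example.com/api/v3?")))
  // Flatten the per-element lists into a single array.
  .reduce((list, attrList) => list.concat(attrList), [])
  // Keep just the attribute values.
  .map(attr => attr.value);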

For older browsers, you can find polyfills for the ES6 functions used here (such as Array.from and String.prototype.startsWith) and use Babel or a similar tool to convert the syntax to ES5 (or replace the arrow functions by hand).
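If you go the by-hand route, a minimal ES5-only sketch could look like the following. The function name collectAttributeValues and its root parameter are assumptions made for this example; root is anything that offers querySelectorAll, which in a browser would normally be document.

```javascript
// Hypothetical ES5-only helper: no arrow functions, Array.from or startsWith.
// `root` is anything with querySelectorAll, e.g. `document` in a browser.
function collectAttributeValues(root, prefix) {
  // Copy the array-like NodeList into a real array.
  var elements = Array.prototype.slice.call(root.querySelectorAll("*"));
  var values = [];
  elements.forEach(function (elem) {
    // Copy the element's array-like attribute collection as well.
    Array.prototype.slice.call(elem.attributes).forEach(function (attr) {
      // indexOf(prefix) === 0 is the ES5 way to test "starts with".
      if (attr.value.indexOf(prefix) === 0) {
        values.push(attr.value);
      }
    });
  });
  return values;
}
```

In a browser you would call it as collectAttributeValues(document, "http://example.com/api/v3?"). Pushing into a single array avoids the separate reduce step used above.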