Pavel Valeriu - 1 year ago
JavaScript Question

Get all links from html page using regex

I'm using Google Apps Script to fetch the content of emails from Gmail, and after that I need to extract all of the links from the HTML tags. I found some code here on Stack Overflow and implemented it with a regular expression, but the issue is that it always returns only the first URL.

Is there a way to make a loop that searches for the next content matching the regular expression, so that all of the elements are displayed one by one?

Here you can see an example with the content of an email that I need to get those links from:

This is my code:

function getURL() {
  var threads = GmailApp.getInboxThreads();
  var message = threads[0].getMessages()[0];
  var content = message.getRawContent();

  var source = (content || '').toString();
  var urlArray = [];
  var url;
  var matchArray;

  // Regular expression to find FTP, HTTP(S) URLs.
  var regexToken = /(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/;

  // Iterate through any URLs in the text.
  while( (matchArray = regexToken.exec( source )) !== null ) {
    var token = matchArray[0];
    urlArray.push( token );
  }
}

Changed the regex to
improved the things but now I also get the following type of response when I search for urls:
... I think that the regex should also have a condition to return the
but only up to the

Also, is there a way to remove the additional characters like
from the found url?

Answer Source

You need to use a global modifier /g to get multiple matches with RegExp#exec.
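As a minimal sketch of that point: with the g flag, each call to exec() resumes from the regex's lastIndex, so the loop walks through every match instead of finding the first one again. The function name collectUrls and the sample string are made up for illustration; the pattern is the one from the question with only the flag added.

```javascript
// Sample input, invented for this illustration.
var sample = 'See http://example.com/a and https://example.org/b for details.';

// Hypothetical helper: returns every URL the question's pattern finds.
function collectUrls(source) {
  // The trailing "g" makes exec() continue from regexToken.lastIndex
  // on each call instead of restarting at index 0.
  var regexToken = /(http|https|ftp|ftps):\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(\/\S*)?/g;
  var urlArray = [];
  var matchArray;

  // exec() returns null once no further match exists, which ends the loop.
  while ((matchArray = regexToken.exec(source)) !== null) {
    urlArray.push(matchArray[0]);
  }
  return urlArray;
}
```

In Apps Script, the string passed in would be the message content rather than the sample string used here.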

Besides, since your input is HTML code, you need to make sure you do not grab < with \S:


See the regex demo.
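One way to keep < out of the match is to replace \S with a negated character class. The pattern below is an assumption written for illustration, not the pattern from the linked demo, and the names extractLinks and html are hypothetical.

```javascript
// Made-up HTML fragment for this illustration.
var html = '<a href="http://example.com/page?id=1">link</a><br>Visit https://example.org/x</p>';

// Hypothetical helper: extracts URLs without swallowing surrounding markup.
function extractLinks(source) {
  // [^\s<>"']* stops at whitespace, angle brackets, and quotes, so a URL
  // ends before a closing tag or attribute quote (unlike \S*, which would
  // keep going). The "g" flag allows repeated exec() calls; "i" ignores case.
  var regex = /\b(?:https?|ftps?):\/\/[a-z0-9\-.]+\.[a-z]{2,}(?:\/[^\s<>"']*)?/gi;
  var links = [];
  var match;

  while ((match = regex.exec(source)) !== null) {
    links.push(match[0]);
  }
  return links;
}
```

Since the negated class only excludes whitespace, angle brackets, and quotes, characters such as = and ? in a query string remain part of the match.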

If for some reason this pattern does not match equal signs, add it as an alternative:


See another demo (however, the first one should do).
