JavaScript Question

Using regex to extract an image url from string

I have a string that looks like this:

var complicatedString = "<![CDATA[<img src=\"http://l.yimg.com/a/i/us/we/52/32.gif\"/>\n<BR />\n<b>Current Conditions:</b>\n<BR />Sunny\n<BR />\n<BR />\n<b>Forecast:</b>\n<BR /> Fri - Sunny. High: 23Low: 13\n<BR /> Sat - Thunderstorms. High: 25Low: 15\n<BR /> Sun - Thunderstorms. High: 28Low: 21\n<BR /> Mon - Partly Cloudy. High: 24Low: 17\n<BR /> Tue - Partly Cloudy. High: 26Low: 18\n<BR />\n<BR />\n<a href=\"http://us.rd.yahoo.com/dailynews/rss/weather/Country__Country/*https://weather.yahoo.com/country/state/city-23511893/\">Full Forecast at Yahoo! Weather</a>\n<BR />\n<BR />\n(provided by <a href=\"http://www.weather.com\" >The Weather Channel</a>)\n<BR />\n]]>"


I need to extract http://l.yimg.com/a/i/us/we/52/32.gif. The regex I came up with is:

var re = /(alt|title|src)=(\\"[^"]*\")/i;


See Fiddle: https://jsfiddle.net/47rveu62/2/

I'm not sure why but this isn't working.

var re = /(alt|title|src)=(\\"[^"]*\")/i;
var m;
do {
m = re.exec(complicatedString);
} while(m !== null);


Update: Regex 101 claims it works https://regex101.com/r/oV2hO2/1

Answer

The problem is with the regex.

The backslashes in the string literal only escape the double quotes inside the double-quoted string. They are escape characters in the source code, not part of the string's value, so the regex must not try to match them.

The string logged to the console contains no backslashes, so the marked parts of the regex have to go:

var re = /(alt|title|src)=(\\"[^"]*\")/i;
                           ^^      ^     // Remove those
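
To see why the backslashes never match, it helps to inspect the string value directly. This is a minimal sketch that uses a shortened, hypothetical stand-in (`sample`) rather than the full feed text:

```javascript
// Shortened stand-in for the feed string (hypothetical sample, not the full text).
var sample = "<img src=\"http://l.yimg.com/a/i/us/we/52/32.gif\"/>";

// The \" escapes exist only in the source code; the string value has no backslashes.
var hasBackslash = sample.indexOf("\\") !== -1;
console.log(sample);        // <img src="http://l.yimg.com/a/i/us/we/52/32.gif"/>
console.log(hasBackslash);  // false

// A pattern that demands a literal backslash before the quote cannot match.
var withBackslash = /(alt|title|src)=(\\"[^"]*\")/i;
var withoutBackslash = /(alt|title|src)=("[^"]*")/i;
console.log(withBackslash.test(sample));     // false
console.log(withoutBackslash.test(sample));  // true
```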

Use

/(alt|title|src)=("[^"]*")/gi;

The g flag is required here. Without it, RegExp#exec does not update the regex's lastIndex property, so every call starts searching from index 0 again, returns the same first match, and the while loop never terminates. With g, each call resumes from lastIndex and eventually returns null (see MDN).
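
The difference is easy to observe on a small input. The sketch below uses a made-up attribute string (`text`) and calls exec by hand instead of looping for the non-global case, so it cannot hang:

```javascript
// Hypothetical input with three matching attributes.
var text = 'src="a.gif" alt="first" title="second"';

// Without g: exec always searches from index 0 and lastIndex stays 0.
var withoutGlobal = /(alt|title|src)=("[^"]*")/i;
var first = withoutGlobal.exec(text);
var again = withoutGlobal.exec(text);
console.log(first[0], again[0], withoutGlobal.lastIndex); // src="a.gif" src="a.gif" 0

// With g: each exec resumes at lastIndex and returns null once matches run out.
var withGlobal = /(alt|title|src)=("[^"]*")/gi;
var found = [];
var m;
while ((m = withGlobal.exec(text)) !== null) {
    found.push(m[1]);
}
console.log(found.join(", ")); // src, alt, title
```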

var complicatedString = "<![CDATA[<img src=\"http://l.yimg.com/a/i/us/we/52/32.gif\"/>\n<BR />\n<b>Current Conditions:</b>\n<BR />Sunny\n<BR />\n<BR />\n<b>Forecast:</b>\n<BR /> Fri - Sunny. High: 23Low: 13\n<BR /> Sat - Thunderstorms. High: 25Low: 15\n<BR /> Sun - Thunderstorms. High: 28Low: 21\n<BR /> Mon - Partly Cloudy. High: 24Low: 17\n<BR /> Tue - Partly Cloudy. High: 26Low: 18\n<BR />\n<BR />\n<a href=\"http://us.rd.yahoo.com/dailynews/rss/weather/Country__Country/*https://weather.yahoo.com/country/state/city-23511893/\">Full Forecast at Yahoo! Weather</a>\n<BR />\n<BR />\n(provided by <a href=\"http://www.weather.com\" >The Weather Channel</a>)\n<BR />\n]]>";

var re = /(alt|title|src)=("[^"]*")/gi;
var m;
while(m = re.exec(complicatedString)) {
    console.log(m[2]);
}
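
Note that m[2] in the loop above still includes the surrounding double quotes, because the capture group wraps them. If only the bare value is wanted, one option is to move the group inside the quotes. A minimal sketch, assuming a shortened, hypothetical sample string (`snippet`) with an added alt attribute for illustration:

```javascript
// Shortened stand-in for the feed string (hypothetical sample).
var snippet = "<img src=\"http://l.yimg.com/a/i/us/we/52/32.gif\" alt=\"Sunny\"/>";

// Same attributes as before, but group 2 now sits inside the quotes.
var attrRe = /(alt|title|src)="([^"]*)"/gi;
var values = {};
var hit;
while ((hit = attrRe.exec(snippet)) !== null) {
    // Key by lower-cased attribute name, store the unquoted value.
    values[hit[1].toLowerCase()] = hit[2];
}
console.log(values.src); // http://l.yimg.com/a/i/us/we/52/32.gif
console.log(values.alt); // Sunny
```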


I'd suggest using the following regex instead, which targets the src attribute of the img tag and captures its value without the quotes:

/img.*?src=("|')(.*?)\1/i;

var complicatedString = "<![CDATA[<img src=\"http://l.yimg.com/a/i/us/we/52/32.gif\"/>\n<BR />\n<b>Current Conditions:</b>\n<BR />Sunny\n<BR />\n<BR />\n<b>Forecast:</b>\n<BR /> Fri - Sunny. High: 23Low: 13\n<BR /> Sat - Thunderstorms. High: 25Low: 15\n<BR /> Sun - Thunderstorms. High: 28Low: 21\n<BR /> Mon - Partly Cloudy. High: 24Low: 17\n<BR /> Tue - Partly Cloudy. High: 26Low: 18\n<BR />\n<BR />\n<a href=\"http://us.rd.yahoo.com/dailynews/rss/weather/Country__Country/*https://weather.yahoo.com/country/state/city-23511893/\">Full Forecast at Yahoo! Weather</a>\n<BR />\n<BR />\n(provided by <a href=\"http://www.weather.com\" >The Weather Channel</a>)\n<BR />\n]]>";

var regex = /img.*?src=("|')(.*?)\1/i;
var match = complicatedString.match(regex);
// Group 2 holds the URL; match is null when the string has no img tag.
console.log(match ? match[2] : null);
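
The pattern accepts either quote style because \1 refers back to whichever quote character group 1 captured, so the value ends only at the matching closing quote. A small sketch of that behavior, wrapped in a hypothetical helper (`extractImgSrc` is not part of the original answer) that returns null when nothing matches:

```javascript
// Hypothetical helper: returns the src of the first img tag, or null if none is found.
function extractImgSrc(html) {
    // Group 1 captures the opening quote; \1 requires the same character to close it.
    var found = html.match(/img.*?src=("|')(.*?)\1/i);
    return found ? found[2] : null;
}

console.log(extractImgSrc("<img src=\"a.gif\"/>"));       // a.gif
console.log(extractImgSrc("<img alt='x' src='b.gif'>"));  // b.gif
console.log(extractImgSrc("<img src=\"it's.gif\">"));     // it's.gif
console.log(extractImgSrc("<b>no picture here</b>"));     // null
```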