ErickBest - 7 months ago
PHP Question

PHP get img src from xml

I have a page with xml that looks like:

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<link>http://facebook.com/profile.php?id=1636293749919827/</link>
<description>FB-RSS feed for Salman Khan Fc</description>
<managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
<item>
<title>Photo - Who is the Best Khan ?</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
<description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146978901170</guid>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
</item>
<item>
<title>Photo</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
<description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146755567859</guid>
<pubDate>31 Mar 16 19:58 +0000</pubDate>
</item>
</channel>
</rss>


I want to get the src attributes of the img elements in the XML above.


The images are stored in the <description> elements; however, they are not in the form

<img...

Instead, they look like:

&lt;img src=&#34;https://scontent.xx.fbc...

The < is replaced with &lt; ... I guess that's why $imgs = $dom->getElementsByTagName('img'); returns nothing.

Is there any workaround?

This is how I call it:

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML( $xml_file);
$imgs = ...(get the imgs to extract the src...('img') ??;

//Then run a possible foreach
//something like:

foreach($imgs as $img){

$src= ///the src of the $img

//try it out
echo '<img src="'.$src.'" /> <br />';
}


Any ideas?

Answer

You have HTML embedded as escaped text inside the XML elements, so you have to retrieve the XML nodes, load each node's content as HTML, and then read the desired attribute from the tag you are after.
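To see why getElementsByTagName('img') finds nothing on the XML document, here is a minimal sketch. It assumes a shortened, made-up feed: the $xml string and the example.com URL are placeholders, not the real feed. The escaped markup is a single text node inside <description>, so the XML document contains no img element, while nodeValue returns the already decoded HTML string:

```php
<?php
// Hypothetical, shortened feed; the URL is a placeholder.
$xml = '<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel>'
     . '<item><description>&lt;img src=&#34;https://example.com/a.jpg&#34;&gt;</description></item>'
     . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML( $xml );

// The escaped markup is text, not an element, so no <img> is found.
$imgCount = $dom->getElementsByTagName( 'img' )->length;

// nodeValue returns the text with the XML entities already decoded.
$text = $dom->getElementsByTagName( 'description' )->item(0)->nodeValue;

echo $imgCount, "\n";   // 0
echo $text, "\n";       // <img src="https://example.com/a.jpg">
```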

Your XML contains <description> nodes at different levels (one directly under <channel> and one under each <item>), so ->getElementsByTagName would return more than the nodes you want. Use DOMXPath to retrieve only the <description> nodes at the right position in the tree:

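// Collect parser warnings internally instead of printing them
libxml_use_internal_errors( true );
$dom = new DOMDocument();
$dom->loadXML( $xml );

$xpath = new DOMXPath( $dom );
// Absolute path: only the <description> of each <item>, not the channel's own
$nodes = $xpath->query( '/rss/channel/item/description' );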
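The difference between the two lookups can be checked with a small sketch. The feed below is made up for illustration (one channel-level description and two item-level ones), and the query is written as an absolute path from the root, which targets the same item-level nodes:

```php
<?php
// Hypothetical, shortened feed: one channel-level and two item-level descriptions.
$xml = '<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel>'
     . '<description>Feed description</description>'
     . '<item><description>first item</description></item>'
     . '<item><description>second item</description></item>'
     . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML( $xml );

// Matches every <description> anywhere in the document.
$all = $dom->getElementsByTagName( 'description' )->length;

// Matches only <description> nodes that are direct children of an <item>.
$xpath = new DOMXPath( $dom );
$itemOnly = $xpath->query( '/rss/channel/item/description' )->length;

echo $all, ' vs ', $itemOnly, "\n";   // 3 vs 2
```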

Then iterate over the nodes, load each node's value into a new DOMDocument (there is no need to decode the HTML entities, because DOM has already decoded them for you), and extract the src attribute from the <img> node:

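$srcs = array();
foreach( $nodes as $node )
{
    $html = new DOMDocument();
    // nodeValue is already entity-decoded, so it is plain HTML at this point
    $html->loadHTML( $node->nodeValue );
    $img = $html->getElementsByTagName( 'img' )->item(0);
    if( $img !== null )   // skip descriptions without an image
    {
        $srcs[] = $img->getAttribute( 'src' );
    }
}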
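Putting both steps together, a self-contained sketch might look like the following. The function name extractImageSrcs, the sample feed, and the example.com URLs are all hypothetical and only serve the illustration. Unlike the loop above, it collects every <img> in a description rather than only the first, and it skips empty descriptions:

```php
<?php
// Hypothetical helper name; not part of any library.
function extractImageSrcs( $xml )
{
    // Collect parser warnings internally; remember the previous setting.
    $previous = libxml_use_internal_errors( true );

    $srcs = array();
    $dom = new DOMDocument();
    if( $dom->loadXML( $xml ) )
    {
        $xpath = new DOMXPath( $dom );
        foreach( $xpath->query( '/rss/channel/item/description' ) as $node )
        {
            if( trim( $node->nodeValue ) === '' )
            {
                continue;   // loadHTML() does not accept an empty string
            }
            $html = new DOMDocument();
            $html->loadHTML( $node->nodeValue );
            // Collect every <img> in this description.
            foreach( $html->getElementsByTagName( 'img' ) as $img )
            {
                $srcs[] = $img->getAttribute( 'src' );
            }
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors( $previous );
    return $srcs;
}

// Made-up sample feed; the URLs are placeholders.
$xml = '<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel>'
     . '<description>Feed description</description>'
     . '<item><description>&lt;img src=&#34;https://example.com/a.jpg&#34;&gt;</description></item>'
     . '<item><description>Just text, no image</description></item>'
     . '<item><description>&lt;a href=&#34;#&#34;&gt;&lt;img src=&#34;https://example.com/b.jpg&#34;&gt;&lt;/a&gt;</description></item>'
     . '</channel></rss>';

$srcs = extractImageSrcs( $xml );
print_r( $srcs );
```

Each src can then be echoed inside the foreach from the question, ideally passed through htmlspecialchars() before it is written into an attribute.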
