Alex - 2 months ago
PHP Question

Reading Google Sitemap XML via PHP

I have an image website hosted by a company. They generate (and submit to Google) a sitemap for my site. I'm trying to read the XML so I can work with the data in my sitemap: namely, hunt down missing captions and missing titles, and randomly post one of the entries on my site as "image of the day". The format of the sitemap is as follows:

<url>
<loc>http://www/link</loc>
<image:image>
<image:loc>http://www/img.jpg</image:loc>
<image:caption>caption for the image here</image:caption>
<image:title>title of image here</image:title>
</image:image>
</url>


My issue is that I've been struggling to parse this data into something usable in PHP. I've tried simplexml_load_file, but it only seems to capture the <loc> and ignores the whole <image:image> element. I tried ->xpath(), with the same result. How do I get this into a usable format?

Footnote: The XML file is gzipped, so I use the following URL to read it:

$url = "compress.zlib://http://www/sitemap/0.xml.gz";


I don't know if this has any effect on the input.

Answer

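A crude workaround (not a clean solution) is to strip the image: prefix from the tags before parsing, so that SimpleXML sees plain <image>, <loc>, <caption> and <title> elements:

$url = "compress.zlib://http://www/sitemap/0.xml.gz";
$xml = file_get_contents($url);

// Drop the "image:" prefix from opening and closing tags.
// Note: this also rewrites any "image:...>" that appears inside text content.
$xml = preg_replace('/image:(.*?)>/i', '$1>', $xml);

print_r(simplexml_load_string($xml));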
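A namespace-aware alternative avoids rewriting the XML at all. SimpleXML's plain property access only sees unprefixed elements, which is why <loc> shows up and <image:image> does not; children() with a namespace URI switches to the prefixed ones. The sketch below is a minimal example under some assumptions: the <urlset> root and its xmlns declarations are not shown in the question, so the standard sitemap and Google image-sitemap namespace URIs are assumed, and parseImageSitemap is a hypothetical helper name. It uses an inline sample document so it runs without network access.

```php
<?php
// Sample data mirroring the structure in the question. The <urlset> root and
// both xmlns URIs are assumptions (the question only shows a <url> entry).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www/link</loc>
    <image:image>
      <image:loc>http://www/img.jpg</image:loc>
      <image:caption>caption for the image here</image:caption>
      <image:title>title of image here</image:title>
    </image:image>
  </url>
</urlset>
XML;

// Hypothetical helper: flatten every image in the sitemap into a plain array.
function parseImageSitemap(string $xml): array
{
    // Assumed namespace URI for the "image:" prefix.
    $imageNs = 'http://www.google.com/schemas/sitemap-image/1.1';

    $sitemap = simplexml_load_string($xml);
    if ($sitemap === false) {
        return []; // not well-formed XML
    }

    $entries = [];
    foreach ($sitemap->url as $url) {
        // children($imageNs) exposes the elements in the image namespace.
        foreach ($url->children($imageNs)->image as $image) {
            $entries[] = [
                'page'    => (string) $url->loc,
                'image'   => (string) $image->loc,
                'caption' => trim((string) $image->caption),
                'title'   => trim((string) $image->title),
            ];
        }
    }
    return $entries;
}

$entries = parseImageSitemap($xml);

// Entries with an empty caption or title, e.g. for a clean-up report.
$missing = array_filter(
    $entries,
    fn($e) => $e['caption'] === '' || $e['title'] === ''
);

// One entry picked at random, e.g. for an "image of the day".
$pick = $entries ? $entries[array_rand($entries)] : null;
```

To run this against the real sitemap, the inline sample could be replaced with file_get_contents("compress.zlib://http://www/sitemap/0.xml.gz"). The compress.zlib wrapper only decompresses the stream, so it should not affect how the XML is parsed. The arrow function requires PHP 7.4 or later.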