Laim McKenzie - 3 months ago
PHP Question

Return each div with a certain class name in PHP

OK, so I have a page with images on it that I'm looking to scrape, returning the following information for each image:


  • Base Image URL ("website.com/imagepage")

  • Image URL ("website.com/image.png")

  • Image QUOTE if it has one ("Wow, nice image")



I have it working for ONE image, but I need it to return all of them (there are about 5).

This is what I have at the moment:

function getMostRecentScreenshot($url) {
    $content = file_get_contents($url);

    $first_step = explode('<div class="imageWall5Floaters">', $content);
    $second_step = explode('<div style="clear: left;"></div>', $first_step[1]);

    return $second_step[0];
}


This is what it returns:

<div class="floatHelp">
<a href="websiteurl.com/imagepage" onclick="return OnScreenshotClicked(9384938);" class="profile_media_item modalContentLink " data-desired-aspect="1.77777777778">
<div style="background-image: url('website.com/image');" class="imgWallItem " id="imgWallItem_757249198">
<div style="position: relative;">
<input type="checkbox" style="position: absolute; display: none;" name="screenshots[9384938]" class="screenshot_checkbox" id="screenshot_checkbox_9384938" />
</div>
<div class="imgWallHover" id="imgWallHover9384938">
<div class="imgWallHoverBottom">
<div class="imgWallHoverDescription ">
<q class="ellipsis">Quote about the image</q>
</div>
</div>
</div>


</div>
</a>




The five images have different IDs (the 9384938 part).
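Because the IDs change from image to image, they can't be hardcoded into the strings passed to explode. If the IDs themselves are needed, here is a minimal sketch that collects all of them in one pass with preg_match_all. It assumes each ID always appears as the argument of OnScreenshotClicked(...), as in the snippet above; the helper name getScreenshotIds and the second ID in the sample are made up for illustration.

```php
<?php
// Hypothetical helper: collect every screenshot ID from the scraped HTML.
// Assumption: each ID appears as OnScreenshotClicked(<digits>) in an onclick attribute.
function getScreenshotIds(string $html): array {
    preg_match_all('~OnScreenshotClicked\((\d+)\)~', $html, $matches);
    // $matches[1] holds the captured digit groups (as strings), in document order
    return $matches[1];
}

// Sample input: the first ID is from the question, the second one is made up
$html = '<a onclick="return OnScreenshotClicked(9384938);"></a>'
      . '<a onclick="return OnScreenshotClicked(1234567);"></a>';

print_r(getScreenshotIds($html));
```

The IDs come back as strings; cast them with (int) if you need numbers.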

How would I get the information needed from what it returns?

I have another function that (sort of) returns the data for one of the images, but it's basically the same thing with more explode calls chained in between, which is very messy.

Answer

You could use PHP's DOMDocument class with this function:

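function getDataFromHTML($html) {
    $result = [];                       // stays empty if nothing matches
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // scraped HTML is rarely valid; suppress parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        // Only the links that wrap a screenshot
        if (strpos($a->getAttribute('class'), 'profile_media_item') === false) {
            continue;
        }
        $row = [
            'baseURL'  => $a->getAttribute('href'),
            'imageURL' => null,
            'quote'    => null,         // stays null when the image has no quote
        ];
        // The first <div> inside the link carries the background-image URL
        $div = $a->getElementsByTagName('div')->item(0);
        if ($div && preg_match("~(?<=url\(['\"]).*?(?=['\"])~",
                               $div->getAttribute('style'), $attr)) {
            $row['imageURL'] = $attr[0];
        }
        // The optional <q> element holds the quote
        $q = $a->getElementsByTagName('q')->item(0);
        if ($q) {
            $row['quote'] = $q->textContent;
        }
        $result[] = $row;
    }
    return $result;
}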

Call it with the page's HTML, for example:

$html = file_get_contents($url);
$result = getDataFromHTML($html);

Output for the sample data is:

array (
  array (
    'baseURL' => 'websiteurl.com/imagepage',
    'imageURL' => 'website.com/image',
    'quote' => 'Quote about the image'
  )
)

The outer array gets one entry per matching link, so running the function on HTML that contains several of these structures (such as the full page) returns all of the images.
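One caveat: the strpos check also matches any class that merely contains profile_media_item as a substring (for example, a made-up class such as profile_media_item_large). If that matters, a stricter variant is to select the links with DOMXPath, using the common XPath 1.0 idiom that pads the class attribute with spaces and looks for the whole token. This is a sketch that assumes the markup looks like the sample in the question; getImagesByXPath is a hypothetical name, and the second and third links in the sample HTML are invented to show the multi-image case.

```php
<?php
// Hypothetical XPath variant: matches profile_media_item as a whole class name,
// not as a substring, and returns one row per matching link.
function getImagesByXPath(string $html): array {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate invalid real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // "class list contains this exact token" idiom
    $links = $xpath->query(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' profile_media_item ')]"
    );

    $result = [];
    foreach ($links as $a) {
        // style of the first descendant <div> that has one (the background image)
        $style = $xpath->evaluate('string((.//div[@style])[1]/@style)', $a);
        // text of the first <q>, or '' when there is none
        $quote = $xpath->evaluate('string((.//q)[1])', $a);
        preg_match("~url\(['\"]?([^'\")]+)~", $style, $m);
        $result[] = [
            'baseURL'  => $a->getAttribute('href'),
            'imageURL' => $m[1] ?? null,
            'quote'    => $quote !== '' ? $quote : null,
        ];
    }
    return $result;
}

// Sample markup modeled on the question; the last two links are made up
$html = <<<'HTML'
<div class="imageWall5Floaters">
<a href="websiteurl.com/imagepage" class="profile_media_item modalContentLink ">
<div style="background-image: url('website.com/image');" class="imgWallItem "><q class="ellipsis">Quote about the image</q></div>
</a>
<a href="website.com/other" class="profile_media_item_large">
<div style="background-image: url('website.com/other.png');"></div>
</a>
<a href="website.com/second" class="profile_media_item">
<div style="background-image: url('website.com/second.png');"></div>
</a>
</div>
HTML;

print_r(getImagesByXPath($html));
```

Here the profile_media_item_large link is skipped, and the third link comes back with a null quote because it has no q element.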