MaryCoding - 1 year ago
PHP Question

Manipulate DOM with PHP to scrape data

I am currently trying to manipulate the DOM through PHP to extract the view count from an FB video page. The code below was working until recently. However, it no longer finds the element that contains the view count. This information is inside a div with id fbPhotoPageMediaInfo. What would be the best way to manipulate the DOM through PHP to get the views of an FB video page?

private function _callCurl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Linux; Android 5.0.1; SAMSUNG-SGH-I337 Build/LRX22C; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/42.0.2311.138 Mobile Safari/537.36');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    curl_setopt($ch, CURLOPT_URL, $url);
    $response = curl_exec($ch);
    $http = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Index 0 is the HTTP status code (checked in test());
    // returning the body at index 1 is an assumption.
    return array($http, $response);
}

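function test()
{
    $url = "";
    $request = $this->_callCurl($url);
    if ($request[0] == 200) {
        $dom = new DOMDocument();
        // Assumed step: load the response body (index 1) before querying it.
        // The @ suppresses warnings from malformed real-world HTML.
        @$dom->loadHTML($request[1]);
        $elm = $dom->getElementById('fbPhotoPageMediaInfo');
        if (isset($elm->nodeValue)) {
            // Keep only the digits of the view count
            $views = preg_replace('/[^0-9]/', '', $elm->nodeValue);
        } else {
            $views = null;
        }
    } else {
        echo "Error!";
    }

    return isset($views) ? $views : null;
}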

Answer

Here is what I've determined...

  1. If you call var_dump() on $request, you can see that it returns a 302 status code (redirect) rather than 200 (OK).
  2. Changing CURLOPT_FOLLOWLOCATION to true or commenting it out entirely makes the error go away, but now we're getting a different page from the one expected.
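
One way to check both points without a var_dump() is to ask cURL directly for the status code and the redirect target. The following is only a rough sketch: inspectRedirect() and describeResponse() are hypothetical helper names that are not part of the original code, and it assumes PHP 7 or later with the cURL extension enabled.

```php
<?php
// Hypothetical helper: request a URL WITHOUT following redirects and
// report the status code plus the redirect target, if there is one.
function inspectRedirect(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stay on the first response
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    curl_exec($ch);

    return array(
        'code'     => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        // URL the server wants to send us to; false or empty if none
        'redirect' => curl_getinfo($ch, CURLINFO_REDIRECT_URL),
    );
}

// Hypothetical helper: turn the result into a readable one-line summary.
function describeResponse(int $code, $redirect): string
{
    if ($code >= 300 && $code < 400) {
        return "Redirect ($code) to: " . ($redirect ?: 'unknown');
    }
    return "Status: $code";
}
```

Calling inspectRedirect() with the video URL and the user agent from the question, then passing 'code' and 'redirect' to describeResponse(), should show where the 302 points before you decide whether to follow it.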

I ran the following to see where I was being redirected to:

$htm = file_get_contents("");

This gave me a page saying I was using an outdated browser, and needed to update it. So apparently Facebook doesn't like the User Agent.

I updated it as follows:

curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/44.0.2');

That appears to solve the problem.
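
Once the page comes back with a 200, the extraction step from the question could be sketched roughly as below. This is an illustration under assumptions, not a confirmed fix: extractViews() is a hypothetical name, the id fbPhotoPageMediaInfo is taken from the question and may no longer exist in Facebook's markup, and the code assumes PHP 7.1 or later with the DOM extension. It uses DOMXPath instead of getElementById() only to make the id lookup explicit.

```php
<?php
// Hypothetical helper: return the digits found in the element with the
// given id, or null if the element is missing or holds no digits.
// Assumes $id contains no double quotes (it is placed into an XPath string).
function extractViews(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect parse errors quietly; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    // Look the element up by its id attribute.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Same cleanup as in the question: keep only the digits.
    $digits = preg_replace('/[^0-9]/', '', $nodes->item(0)->nodeValue);
    return $digits === '' ? null : $digits;
}
```

For example, extractViews($request[1], 'fbPhotoPageMediaInfo') would return the view count as a string of digits, assuming the response body sits at index 1 of the array returned by the cURL helper.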