Nawaf Gantare - 12 days ago
PHP Question

PHP preg_match error with spaces when extracting data

I want to extract data from a web page, but I am getting an error from preg_match:

<?php

$html=file_get_contents("https://www.instagram.com/p/BJz4_yijmdJ/?taken-by=the.witty");
preg_match("("instapp:owner_user_id" content="(.*)")", $html, $match);
$title = $match[1];

echo $title;
?>


This is the error I get:


Parse error: syntax error, unexpected 'instapp' (T_STRING) in
/home/ubuntu/workspace/test.php on line 4


How can I fix this? I also want to extract more data from the page with regex. Is it possible to extract everything at once with a single piece of code, or do I have to call preg_match several times?

Answer

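The main problem is that the pattern is not a valid string literal: the unescaped double quotes inside the double-quoted string end it early, which is why PHP reports an unexpected 'instapp'. PHP supports both single- and double-quoted string literals, and you can use that to your advantage: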

preg_match('~"instapp:owner_user_id" content="([^"]*)"~', $html, $match);

While it is OK to use paired (...) symbols as regex delimiters, I'd suggest using more conventional ones such as /, ~ or @.
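To make the quoting and delimiter points concrete, here is a minimal sketch. The sample tag and its content value (12345) are made up for illustration, since the real page markup may differ. It runs the same regex written as a single-quoted string, as a double-quoted string with escaped quotes, and with / instead of ~ as the delimiter:

```php
<?php
// Hypothetical sample input; the markup of the real page may differ.
$html = '<meta property="instapp:owner_user_id" content="12345" />';

$patterns = [
    // Single-quoted PHP string: inner double quotes need no escaping.
    '~"instapp:owner_user_id" content="([^"]*)"~',
    // Double-quoted PHP string: every inner double quote must be escaped.
    "~\"instapp:owner_user_id\" content=\"([^\"]*)\"~",
    // Same regex with / as the delimiter instead of ~.
    '/"instapp:owner_user_id" content="([^"]*)"/',
];

$results = [];
foreach ($patterns as $pattern) {
    if (preg_match($pattern, $html, $match)) {
        $results[] = $match[1]; // first capture group: the content value
    }
}

print_r($results);
```

All three forms should capture the same value; the single-quoted one is simply the easiest to read when the pattern itself contains double quotes.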

Also, (.*) is too generic a pattern and may match more than you need, since . also matches " and * is a greedy quantifier. A negated character class is better: ([^"]*) matches 0 or more characters other than ".
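The difference is easy to see on a small sample. This is only a sketch: the tag below is hypothetical, and the extra data-x attribute is added purely to trigger the over-match.

```php
<?php
// Hypothetical meta tag; data-x is added only to show the over-match.
$html = '<meta property="instapp:owner_user_id" content="12345" data-x="y" />';

// Greedy: .* runs to the end of the line and backtracks to the LAST double quote.
preg_match('~content="(.*)"~', $html, $greedy);

// Negated class: [^"]* stops at the first double quote after content=".
preg_match('~content="([^"]*)"~', $html, $negated);

echo $greedy[1], PHP_EOL;   // 12345" data-x="y
echo $negated[1], PHP_EOL;  // 12345
```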

HOWEVER, to parse HTML in PHP, you may use a DOM parser, like DOMDocument.

Here is a sample that gets all meta tags that have a content attribute, extracts the value of that attribute, and saves the values in an array:

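// Sample input; in practice $html would come from file_get_contents().
$html = "<html><head><meta property=\"al:ios:url\" content=\"instagram://media?id=1329656989202933577\" /></head><body><span/></body></html>";
$dom = new DOMDocument('1.0', 'UTF-8');
// Parse the markup without adding implied <html>/<body> elements or a default doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Select every <meta> element that has a content attribute.
$xpath = new DOMXPath($dom);
$metas = $xpath->query('//meta[@content]');
$res = array();
foreach ($metas as $m) {
    // Collect the value of each content attribute.
    array_push($res, $m->getAttribute('content'));
}

// Show the collected values.
print_r($res);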
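As for extracting several values at once: one possible approach (a sketch, not the only way) is to key the array by each tag's property attribute, so that any value can be looked up by name after a single pass. The sample markup and the 12345 value are assumptions for illustration; only the attribute names come from the question and the sample above.

```php
<?php
// Hypothetical sample markup; the owner id value is made up for illustration.
$html = '<html><head>'
      . '<meta property="instapp:owner_user_id" content="12345" />'
      . '<meta property="al:ios:url" content="instagram://media?id=1329656989202933577" />'
      . '</head><body></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

// Build a property => content map from every <meta> tag that has both attributes.
$meta = [];
foreach ($dom->getElementsByTagName('meta') as $node) {
    if ($node->hasAttribute('property') && $node->hasAttribute('content')) {
        $meta[$node->getAttribute('property')] = $node->getAttribute('content');
    }
}

echo $meta['instapp:owner_user_id'], PHP_EOL;
echo $meta['al:ios:url'], PHP_EOL;
```

This way the page is parsed once, and there is no need to call preg_match separately for each value.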