Nawaf Gantare - 10 months ago
PHP Question

PHP PregMatch Error with spaces on extract

I want to extract data from a web source, but I am getting an error in preg_match:


preg_match("("instapp:owner_user_id" content="(.*)")", $html, $match);
$title = $match[1];

echo $title;

This is the error I get:

Parse error: syntax error, unexpected 'instapp' (T_STRING) in
/home/ubuntu/workspace/test.php on line 4

Please help me: how can I do this? I also want to extract more data from the page with regex. Is it possible to extract everything at once with a single call, or do I have to use preg_match several times?

Answer Source

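The main problem is that you did not form a valid string literal: the unescaped double quotes inside the pattern end the double-quoted PHP string early, which is why the parser complains about 'instapp'. Note that PHP supports both single- and double-quoted string literals, and you may use that to your advantage: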

preg_match('~"instapp:owner_user_id" content="([^"]*)"~', $html, $match);

While it is OK to use paired (...) symbols as regex delimiters, I'd suggest using a more conventional delimiter such as /, ~ or @.
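
As a rough sketch of the delimiter point, the same pattern can be written with any of these delimiters and should give the same capture. The $html sample string below is made up for illustration and is not the real page source:

```php
<?php
// Hypothetical sample markup, standing in for the real page source.
$html = '<meta property="instapp:owner_user_id" content="12345" />';

// The same pattern with three different delimiters. Single quotes around
// the PHP string let the double quotes inside the pattern stay unescaped.
$patterns = array(
    '~"instapp:owner_user_id" content="([^"]*)"~',
    '@"instapp:owner_user_id" content="([^"]*)"@',
    '/"instapp:owner_user_id" content="([^"]*)"/',
);

$results = array();
foreach ($patterns as $pattern) {
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $match) === 1) {
        $results[] = $match[1];
    }
}

print_r($results);
```

The / delimiter works here only because the pattern itself contains no slash. When matching URLs or closing tags, ~ or @ avoids having to escape every /.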

Also, (.*) is too generic a pattern and may match more than you need: . also matches " and * is a greedy quantifier. A negated character class is better: ([^"]*) matches 0 or more characters other than ".
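
To illustrate the difference, here is a minimal sketch that runs both patterns against the same input. The sample tag, including its data-extra attribute, is a hypothetical example chosen so that a second quoted attribute follows the one being extracted:

```php
<?php
// Hypothetical sample: a second quoted attribute follows content="...".
$html = '<meta property="instapp:owner_user_id" content="12345" data-extra="x" />';

// Greedy (.*) runs on to the LAST double quote on the line.
preg_match('~content="(.*)"~', $html, $greedy);

// The negated class stops at the FIRST double quote after the value.
preg_match('~content="([^"]*)"~', $html, $exact);

echo $greedy[1], "\n"; // 12345" data-extra="x
echo $exact[1], "\n";  // 12345
```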

HOWEVER, to parse HTML in PHP, you may use a DOM parser, like DOMDocument.

Here is a sample that finds all meta tags with a content attribute, extracts the value of that attribute, and saves the values in an array:

$html = "<html><head><meta property=\"al:ios:url\" content=\"instagram://media?id=1329656989202933577\" /></head><body><span/></body></html>";
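$dom = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true); // keep warnings about imperfect real-world HTML quiet
$dom->loadHTML($html);            // load the markup; without this the XPath query finds nothing
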
$xpath = new DOMXPath($dom);
$metas = $xpath->query('//meta[@content]');
$res = array();
foreach($metas as $m) { 
   array_push($res, $m->getAttribute('content'));