S. T. Abdulrasaq - 3 months ago
JSON Question

Regex: Extract Tweet Username and ID From URL

I'm trying to find a tweet URL in a message, if one is present, using this regex:

#^https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)$#is


But the regex doesn't seem to match the tweet URL. Here is my full code:

function gettweet($string)
{
    $regex = '#^https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)$#is';
    $string = preg_replace_callback($regex, function($matches) {
        $user = $matches[2];
        $statusid = $matches[3];
        $url = "https://twitter.com/$user/status/$statusid";
        $urlen = urlencode($url);
        $getcon = file_get_contents("https://publish.twitter.com/oembed?url=$urlen");
        $con = json_decode($getcon, true);
        $tweet_html = $con["html"];
        return $tweet_html;
    }, $string);
    return $string;
}

$message = "This is absolutely trending can you also see it here https://twitter.com/itslifeme/status/765268556133064704 i like it";
$mes = gettweet($message);
echo $mes;

Answer

This doesn't work as you expect because your regular expression includes the anchors ^ and $, which require the pattern to match the entire string from beginning to end. Your message has text before and after the URL, so the anchored pattern can never match it.
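To make the effect of the anchors concrete, here is a minimal sketch that runs the anchored pattern from the question against the full message and against a string containing only the URL. The variable names are illustrative and not from the original code.

```php
<?php
// The anchored pattern from the question, unchanged.
$anchored = '#^https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)$#is';

$bareUrl = "https://twitter.com/itslifeme/status/765268556133064704";
$message = "This is absolutely trending can you also see it here $bareUrl i like it";

// ^ and $ force the whole subject to be the URL, so the message fails.
$onMessage = preg_match($anchored, $message);
// The same pattern succeeds when the subject is nothing but the URL.
$onBareUrl = preg_match($anchored, $bareUrl);

var_dump($onMessage); // int(0)
var_dump($onBareUrl); // int(1)
```

So the anchored form is only suitable for validating a string that should be a tweet URL and nothing else, not for finding one inside longer text.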

With the anchors removed, it matches:

$regex  = '#https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)#is';
$string = "This is absolutely trending can you also see it here https://twitter.com/itslifeme/status/765268556133064704 i like it";

if (preg_match($regex, $string, $match)) {
    var_dump($match);
}

The above code outputs:

array(4) {
  [0]=>
  string(55) "https://twitter.com/itslifeme/status/765268556133064704"
  [1]=>
  string(9) "itslifeme"
  [2]=>
  string(0) ""
  [3]=>
  string(18) "765268556133064704"
}
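Note that in this output the username is in index 1 and index 2 holds the optional (es) group, whereas the callback in the question reads the username from $matches[2]. The following is a hedged sketch of how the replacement function might look with that accounted for. It assumes the optional group is made non-capturing so the username is group 1 and the status ID is group 2. The names replace_tweet_urls, fetch_oembed_html and the $fetchEmbed parameter are hypothetical and not part of the original code; the oEmbed endpoint is the one used in the question.

```php
<?php
// Sketch only. $fetchEmbed is a hypothetical injectable callable so the
// network request can be swapped out or stubbed.
function replace_tweet_urls(string $text, callable $fetchEmbed): string
{
    // No ^/$ anchors, so the URL may appear anywhere in the text.
    // (?:es)? is non-capturing: group 1 = username, group 2 = status ID.
    $regex = '#https?://twitter\.com/(?:\#!/)?(\w+)/status(?:es)?/(\d+)#i';

    return preg_replace_callback($regex, function (array $m) use ($fetchEmbed) {
        $url  = "https://twitter.com/{$m[1]}/status/{$m[2]}";
        $html = $fetchEmbed($url);
        // Keep the original URL if no embed HTML could be obtained.
        return $html !== null ? $html : $m[0];
    }, $text);
}

// Hypothetical fetcher mirroring the oEmbed call from the question.
function fetch_oembed_html(string $url): ?string
{
    $json = @file_get_contents('https://publish.twitter.com/oembed?url=' . urlencode($url));
    if ($json === false) {
        return null; // request failed
    }
    $data = json_decode($json, true);
    return $data['html'] ?? null;
}
```

It could then be called as replace_tweet_urls($message, 'fetch_oembed_html'). Passing the fetcher in as a parameter is a design choice made for this sketch so the regex logic can be checked without a network call.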

Also, there's really no reason to include the dotall pattern modifier (s) in your expression: the pattern contains no unescaped dot, so the modifier has no effect. From the PHP manual:

s (PCRE_DOTALL)

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
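As a small illustration of the modifier's behavior, the sketch below shows that s only changes what an unescaped dot matches, and that the tweet pattern gives the same result with or without it. The variable names are illustrative.

```php
<?php
$text = "a\nb";

// Without s, the dot does not match the newline between "a" and "b".
$withoutS = preg_match('#a.b#', $text);
// With s, the dot matches any character, including the newline.
$withS = preg_match('#a.b#s', $text);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)

// The tweet pattern only contains an escaped dot (\.), so s changes nothing.
$message = "see https://twitter.com/itslifeme/status/765268556133064704 i like it";
preg_match('#https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)#i', $message, $a);
preg_match('#https?://twitter\.com/(?:\#!/)?(\w+)/status(es)?/(\d+)#is', $message, $b);

var_dump($a === $b); // bool(true)
```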