ane ane - 1 year ago
Javascript Question

PHP: how to get all the URLs from a page when they don't show up in the source?

I'm trying to get the URLs of all the news links on some

from this website

To get all the links, I viewed the source, but there is nothing there.

Yet the page itself does display the data.

Could anyone who understands this help me, please?

This is my code to get the content:

$html = file_get_contents("");
if ($html === FALSE) {
    die ("could not download the page");
}
echo $html;

Answer

To find all links in HTML you could use preg_match_all().

$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
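To see what that pattern actually captures, here is a minimal sketch run against a hypothetical inline HTML string rather than the real page. preg_match_all() returns the number of matches, and the captured href values end up in $matches[1]. Note that this pattern only catches double-quoted href attributes; single-quoted or unquoted ones are skipped.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$content = '<a href="/news/1">One</a> <A HREF="/news/2">Two</A> <a href=\'/news/3\'>Three</a>';

// Same pattern as above: capture everything between href=" and the next ".
// The /i flag makes it match HREF as well; the single-quoted link is not matched.
$count = preg_match_all("/href=\"([^\"]+)\"/i", $content, $matches);

echo "matches: $count\n";
foreach ($matches[1] as $i => $link) {
    echo "$i: $link\n";
}
```

This prints matches: 2, followed by /news/1 and /news/2.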

That URL uses gzip compression, so you have to detect that and decompress the content with gzdecode(). (gzdecode() belongs to the zlib extension and is available as of PHP 5.4, so zlib must be enabled in your PHP build.)
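A minimal offline sketch of what gzdecode() does, assuming the zlib extension is loaded: gzencode() produces the same gzip format a server sends with Content-Encoding: gzip, and gzdecode() turns it back into the original HTML. It also shows that gzip data always begins with the two bytes 0x1f 0x8b, which is why the downloaded "HTML" looks like garbage when it is echoed without decoding. The sample string is made up.

```php
<?php
// Requires the zlib extension (gzencode/gzdecode).
$html = '<a href="/news/1">One</a>';

// Simulate what a server sends with "Content-Encoding: gzip".
$compressed = gzencode($html);

// gzip data always starts with the two magic bytes 0x1f 0x8b.
$looksGzipped = substr($compressed, 0, 2) === "\x1f\x8b";

// gzdecode() returns false on invalid input, so check before using the result.
$decoded = gzdecode($compressed);
if ($decoded === false) {
    die("not valid gzip data\n");
}

var_dump($looksGzipped);   // bool(true)
echo $decoded, "\n";       // <a href="/news/1">One</a>
```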

Gzip compression is indicated by the Content-Encoding: gzip HTTP header, so you have to check that header, which means you need access to the response headers. The example below uses curl for that. (file_get_contents() returns only the body, still gzip-compressed; the headers are not part of its return value, and the HTTP wrapper only exposes them through the $http_response_header variable. Without reading the headers you cannot tell that the content is compressed.)
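The header check itself is small. Below is a sketch of a hypothetical helper, isGzipEncoded(), that takes raw header lines in the same shape as PHP's $http_response_header (one "Name: value" string per element) and reports whether the body is gzip-encoded. The sample header lines are made up for illustration.

```php
<?php
// Hypothetical helper: true if any "Content-Encoding" header line lists gzip.
// $lines has the same shape as PHP's $http_response_header: one raw header per element.
function isGzipEncoded(array $lines): bool
{
    foreach ($lines as $line) {
        // Split "Name: value" on the first colon only.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2
            && strcasecmp(trim($parts[0]), 'Content-Encoding') === 0
            && stripos($parts[1], 'gzip') !== false) {
            return true;
        }
    }
    return false;
}

// Sample header lines (made up for illustration).
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
    'content-encoding: gzip',
];
var_dump(isGzipEncoded($sample));               // bool(true)
var_dump(isGzipEncoded(['HTTP/1.1 200 OK']));   // bool(false)
```

With the real page you could call isGzipEncoded($http_response_header) right after $html = file_get_contents($url) and pass $html through gzdecode() when it returns true.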

Here is a complete example:


<?php

$url = "";

# download resource
$c = curl_init ($url);
curl_setopt ($c, CURLOPT_HEADER, true);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec ($c);
if ($content === false)
    die ("download failed: " . curl_error ($c));
$hsize = curl_getinfo ($c, CURLINFO_HEADER_SIZE);
curl_close ($c);

# separate headers from content
$headers = substr ($content, 0, $hsize);
$content = substr ($content, $hsize);

# check if content is compressed with gzip
$gzip = 0;
$headers = preg_split ('/\r?\n/', $headers);
foreach ($headers as $h) {
    # a header line looks like "Name: value"; split on the first colon only
    $pieces = preg_split ("/:/", $h, 2);
    $pieces2 = (count ($pieces) > 1);
    $enc = $pieces2 && (preg_match ("/content-encoding/i", $pieces[0]) );
    $gz = $pieces2 && (preg_match ("/gzip/i", $pieces[1]) );
    if ($enc && $gz)
        $gzip = 1;
}

# unzip content if gzipped
if ($gzip)
    $content = gzdecode ($content);

# find links
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);

# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
foreach ($matches[1] as $n => $link)
    echo "$n: " . htmlspecialchars ($link) . "<br>";
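Because the example depends on the live URL, here is a minimal offline sketch that runs the same steps on a hand-built response: hypothetical headers plus a gzencode()d body stand in for the curl_exec() result, and $hsize plays the role of CURLINFO_HEADER_SIZE (the length of the header block, including the blank line that ends it). The markup and header values are made up.

```php
<?php
// Hand-built stand-in for curl_exec() output: header block, blank line, gzip body.
$body        = '<a href="/news/1">One</a> <a href="/news/2">Two</a>';
$headerBlock = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n";
$raw         = $headerBlock . gzencode($body);
$hsize       = strlen($headerBlock); // what CURLINFO_HEADER_SIZE would report

// Separate headers from content.
$headers = substr($raw, 0, $hsize);
$content = substr($raw, $hsize);

// Detect gzip from the Content-Encoding header.
$gzip = false;
foreach (preg_split('/\r?\n/', $headers) as $h) {
    $pieces = explode(':', $h, 2);
    if (count($pieces) === 2
        && preg_match('/content-encoding/i', $pieces[0])
        && preg_match('/gzip/i', $pieces[1])) {
        $gzip = true;
    }
}

// Decompress only when the header says so.
if ($gzip) {
    $content = gzdecode($content);
}

// Extract double-quoted href values and print them with their index.
preg_match_all('/href="([^"]+)"/i', $content, $matches);
foreach ($matches[1] as $n => $link) {
    echo "$n: $link\n";
}
```

As a design alternative, curl can do the decompression itself: setting curl_setopt($c, CURLOPT_ENCODING, "") makes curl advertise the encodings it supports and hand back the already-decoded body, so the manual header check and gzdecode() call are no longer needed.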