ane ane - 4 months ago 13
Javascript Question

php : how to get all hyperlinks from a specific div of a given page?

I'm trying to get all link URL of news on some

div
from this web

To get all link, after I view source but there is nothing.

But there are any data display

Could any that understand
PHP
,
Array()
and
JS
help me, please?

This is my code to get the content:

$html = file_get_contents("https://qc.yahoo.com/");
if ($result === FALSE) {
die("?");
}
echo $html;

Answer

To find all links in HTML you could use preg_match_all().

$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);

That url https://qc.yahoo.com/ uses gzip compression , so you have to detect that and decompress it using the function gzdecode(). (It must be installed in your PHP version)

The gzip compression is indicated by the Content-Encoding: gzip HTTP header. You have to check that header, so you must use curl or a similar method to retrieve the headers. (file_get_contents() will not give you the HTTP headers... it only downloads the gzip compressed content. You need to detect that it is compressed but for that you need to read the headers.)

Here is a complete example:

<?php

$url = "https://qc.yahoo.com/";

# download resource
$c = curl_init ($url);
curl_setopt ($c, CURLOPT_HEADER, true);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec ($c);
$hsize = curl_getinfo ($c, CURLINFO_HEADER_SIZE);
curl_close ($c);

# separate headers from content
$headers = substr ($content, 0, $hsize);
$content = substr ($content, $hsize);

# check if content is compressed with gzip
$gzip = 0;
$headers = preg_split ('/\r?\n/', $headers);
foreach ($headers as $h)
{
    $pieces = preg_split ("/:/", $h, 2);
    $pieces2 = (count ($pieces) > 1);
    $enc = $pieces2 && (preg_match ("/content-encoding/i", $pieces[0]) );
    $gz = $pieces2 && (preg_match ("/gzip/i", $pieces[1]) );
    if ($enc && $gz)
    {
        $gzip = 1;
        break;
    }
}

# unzip content if gzipped
if ($gzip)
{
    $content = gzdecode ($content);
}


# find links
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);

# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
    $n++;
    echo "$n: " . htmlspecialchars ($link) . "<br>";
}