Lior - 1 year ago
PHP Question

Optimize remote page retrieving and parsing

I'm retrieving a remote page with PHP, extracting a few links from it, and then fetching and parsing each linked page.

The whole process takes about 12 seconds, which is far too long, so I need to optimize the code somehow.

My code looks something like this:

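$result = get_web_page('THE_WEB_PAGE');

// Capture the href of every <a data-a="..." href="..."> tag
preg_match_all('/<a data-a="[^"]*" href="([^"]*)">/', $result['content'], $matches);

$re = [];
// The pattern has a single capture group, so the hrefs are in $matches[1]
foreach ($matches[1] as $index => $lnk) {
    // Fetch each linked page, one request after another
    $result = get_web_page($lnk);

    // Non-greedy (.*?) so each match stops at the first </span>
    preg_match('/<span id="tests">(.*?)<\/span>/', $result['content'], $match);
    $re[$index]['test'] = $match[1] ?? null;

    preg_match('/<span id="tests2">(.*?)<\/span>/', $result['content'], $match);
    $re[$index]['test2'] = $match[1] ?? null;

    preg_match('/<span id="tests3">(.*?)<\/span>/', $result['content'], $match);
    $re[$index]['test3'] = $match[1] ?? null;

    // ... more calls like these ...
}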

I have some more calls like these inside the loop.

How can I optimize my code?

Jan
Answer

As others have mentioned, use a proper HTML parser instead (i.e. DOMDocument) and combine it with XPath queries. Consider the following example:


# set up some dummy data
$data = <<<DATA
    <a class='link'>Some link</a>
    <a class='link' id='otherid'>Some link 2</a>
DATA;

# parse the HTML into a DOM tree
$dom = new DOMDocument();
$dom->loadHTML($data);

$xpath = new DOMXPath($dom);

# all links
$links = $xpath->query("//a[@class = 'link']");

# special id link
$special = $xpath->query("//a[@id = 'otherid']");

# and so on (the XPath function is starts-with, with a hyphen)
$textlinks = $xpath->query("//a[starts-with(text(), 'Some')]");
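Applied to the markup from the question, the same DOMDocument/DOMXPath approach might replace the preg_match_all and preg_match calls roughly as sketched below. This is a sketch under assumptions: the sample HTML and the helper names make_xpath, extract_links and extract_spans are hypothetical, and in the real script the HTML strings would come from $result['content'].

```php
<?php
// Hypothetical sample pages mirroring the patterns the question's regexes target.
$listHtml = <<<HTML
<html><body>
  <a data-a="1" href="http://example.com/page1">First</a>
  <a data-a="2" href="http://example.com/page2">Second</a>
  <a href="http://example.com/other">Not wanted</a>
</body></html>
HTML;

$detailHtml = <<<HTML
<html><body>
  <span id="tests">alpha</span>
  <span id="tests2">beta</span>
  <span id="tests3">gamma</span>
</body></html>
HTML;

/**
 * Hypothetical helper: parse an HTML string and return an XPath object,
 * silencing the warnings libxml emits for malformed real-world markup.
 */
function make_xpath(string $html): DOMXPath
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($dom);
}

/** Hypothetical helper: collect the href of every <a> that has a data-a attribute. */
function extract_links(string $html): array
{
    $links = [];
    foreach (make_xpath($html)->query('//a[@data-a]/@href') as $attr) {
        $links[] = $attr->value;
    }
    return $links;
}

/**
 * Hypothetical helper: return the text of each <span> looked up by id,
 * or null when it is missing. Assumes the ids contain no single quotes.
 */
function extract_spans(string $html, array $ids): array
{
    $xpath = make_xpath($html);
    $out = [];
    foreach ($ids as $key => $id) {
        $nodes = $xpath->query("//span[@id = '$id']");
        $out[$key] = $nodes->length > 0 ? $nodes->item(0)->textContent : null;
    }
    return $out;
}

print_r(extract_links($listHtml));
print_r(extract_spans($detailHtml, ['test' => 'tests', 'test2' => 'tests2', 'test3' => 'tests3']));
```

Because the lookup goes through the parsed DOM, attribute order, quoting style and extra attributes on the tags no longer break the extraction the way they can with the regexes.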