stAMy - 2 months ago
PHP Question

Empty array returned when using preg_match on a website to access information

I'm trying to read some content from a weather page in my PHP file. The page is http://www.weather-forecast.com/locations/Bergen/forecasts/latest, and from its HTML (as shown in view source) I want to extract the text that follows "3 Day Weather Forecast Summary:".

My code so far:

<?php

$contents = file_get_contents("http://www.weather-forecast.com/locations/Bergen/forecasts/latest");

preg_match('/3 Day Weather Forecast Summary:<\/b><span class="read-more-small"><span class="read-more-content"> <span class="phrase"> (.*?) </s', $contents, $matches);
print_r($matches);

?>


For some reason it won't give me the text between the spans in the source code; print_r shows an empty array. What I want to extract is:

3 Day Weather Forecast Summary: Moderate rain (total 17mm), heaviest on Mon morning. Very mild (max 18°C on Wed afternoon, min 11°C on Tue night). Winds decreasing (fresh winds from the WSW on Mon morning, calm by Mon night).

I'd like it as clean text, exactly like this. Any suggestions?

Regards, Bojar

Answer

As far as I can tell, your regex should not include the spaces around the capture group (.*?), because the page source has no space before or after the 3 day summary text. With those literal spaces in the pattern, nothing matches and $matches stays empty. Try:

'... <span class="phrase">(.*?)</s'

Full call:

preg_match(
    '/3 Day Weather Forecast Summary:<\/b><span class="read-more-small"><span class="read-more-content"> <span class="phrase">(.*?)</s',
    $contents,
    $matches
);

Edit: Just confirmed that the pattern without spaces produces the expected result.
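
To make the difference visible without hitting the site, here is a minimal sketch that runs both patterns against a hypothetical fragment. The $html string is an assumption reconstructed from the regex itself, not copied from the live page, and the variable names are only for illustration:

```php
<?php
// Hypothetical fragment (assumed markup): no spaces around the summary text.
$html = '<b>3 Day Weather Forecast Summary:</b><span class="read-more-small">'
      . '<span class="read-more-content"> <span class="phrase">Moderate rain (total 17mm).</span>'
      . '</span></span>';

// Shared start of both patterns, taken from the question.
$prefix = '3 Day Weather Forecast Summary:<\/b><span class="read-more-small">'
        . '<span class="read-more-content"> <span class="phrase">';

// Note: in both patterns the final "/s" is the closing delimiter plus the
// s (dot-all) modifier, so the pattern itself ends at the "<".

// Original pattern: demands a literal space on each side of the capture group.
$withSpaces = preg_match('/' . $prefix . ' (.*?) </s', $html, $m1);

// Corrected pattern: no spaces around the capture group.
$withoutSpaces = preg_match('/' . $prefix . '(.*?)</s', $html, $m2);

var_dump($withSpaces);    // int(0), and $m1 is an empty array
var_dump($withoutSpaces); // int(1)
echo $m2[1], "\n";        // Moderate rain (total 17mm).
```

The summary text is in $matches[1]; $matches[0] holds the whole matched string including the tags.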

Additionally, please be careful about using this sort of parsing for anything long-term or anything beyond personal hobby projects. It is extremely prone to breaking after even minor changes to the HTML structure, CSS classes, and so on (as seen here, it even depends on whitespace). For something more reliable, consider using an HTML parser with CSS selectors, so that you can look for e.g. span.phrase in the document. That is still not perfect, but it is more stable than a preg_match.
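
As a rough sketch of the parser approach, the following uses PHP's bundled DOM extension with an XPath query in place of a CSS selector, since core PHP has no CSS selector engine. The function name extractPhrase is hypothetical, and the sketch assumes the summary is the first span with class "phrase" on the page, which I have not verified beyond the pattern above:

```php
<?php
/**
 * Return the text of the first <span class="phrase"> in $html, or null if
 * there is none. (Hypothetical helper, for illustration only.)
 */
function extractPhrase(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common way to make loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath counterpart of the CSS selector span.phrase: matches "phrase"
    // as a whole class token, even when other classes are present.
    $nodes = $xpath->query(
        '//span[contains(concat(" ", normalize-space(@class), " "), " phrase ")]'
    );

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}
```

With the code from the question, this would be called as extractPhrase($contents), ideally after checking that file_get_contents did not return false. It no longer cares about whitespace between tags or about the exact nesting of the surrounding spans, but it will still break if the site renames the class.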