Philip McQuitty - 5 days ago
HTML Question

Parse a web page and subpages via PHP

From the link below, I want to go into each subpage and parse its HTML table into a single .html file. For example, the Accountancy subpage has multiple pages of class listings (page 1, 2, 3, etc.); I want to parse all of those pages as well.

Here is the parent page: http://my.gwu.edu/mod/pws/subjects.cfm?campId=1&termId=201401

Do I need to use a web crawler? What would be the best way to compile all the subpages into ONE .html file? How could I write my code to efficiently scrape all the HTML table data from all of the subpages listed? Cheers!

Answer

You could use the Ultimate Web Scraper Toolkit to fetch the page, then go through all the links you find, as below. Check the toolkit's docs for the complete retrieval code.

// $result["body"] holds the fetched HTML (see the docs for how to retrieve it).
$html->load($result["body"]);

// Grab every anchor that has an href attribute.
$rows = $html->find("a[href]");
foreach ($rows as $row)
{
  // Fetch the page at $row->href, and so on recursively.
}
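For reference, obtaining $result in the first place might look roughly like this. This is a minimal sketch: the support/ include paths and the "error" key follow the toolkit's documented examples, so adjust them to match your install.

<?php
// Minimal fetch sketch; paths assume the toolkit's default support/ layout.
require_once "support/web_browser.php";
require_once "support/simple_html_dom.php";

$url = "http://my.gwu.edu/mod/pws/subjects.cfm?campId=1&termId=201401";

$web = new WebBrowser();
$result = $web->Process($url);
if (!$result["success"])  exit("Request failed: " . $result["error"]);

// Parse the returned HTML body with the bundled simple_html_dom parser.
$html = new simple_html_dom();
$html->load($result["body"]);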

Though if you do it like this, make sure to keep track of the links you have already visited, otherwise you might end up in an infinite loop.
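A rough sketch of that bookkeeping is below, assuming the same toolkit classes as above. The output file name, the URL filter, and the queue-based loop are illustrative choices rather than the only way to do it; it also appends every table it finds to one combined .html file, which covers the single-file requirement.

<?php
// Illustrative crawl sketch: visits each link once and appends every
// <table> it finds to one combined file. Names and the URL filter are assumptions.
require_once "support/web_browser.php";
require_once "support/simple_html_dom.php";

$queue   = array("http://my.gwu.edu/mod/pws/subjects.cfm?campId=1&termId=201401");
$visited = array();
$web     = new WebBrowser();

file_put_contents("all_tables.html", "<html><body>\n");

while (count($queue) > 0)
{
  $url = array_shift($queue);
  if (isset($visited[$url]))  continue;  // already fetched, skip it
  $visited[$url] = true;

  $result = $web->Process($url);
  if (!$result["success"])  continue;

  $html = new simple_html_dom();
  $html->load($result["body"]);

  // Append every table on this page to the combined file.
  foreach ($html->find("table") as $table)
  {
    file_put_contents("all_tables.html", $table->outertext . "\n", FILE_APPEND);
  }

  // Queue absolute links under the same section of the site.
  // (Relative hrefs would need to be resolved against $url first.)
  foreach ($html->find("a[href]") as $a)
  {
    $href = $a->href;
    if (strpos($href, "http://my.gwu.edu/mod/pws/") === 0 && !isset($visited[$href]))
    {
      $queue[] = $href;
    }
  }

  $html->clear();
}

file_put_contents("all_tables.html", "</body></html>\n", FILE_APPEND);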

Just a side note: this might not be that good a solution if there are a couple of hundred pages, since it will be slow.
