Abdul Basit Abdul Basit - 1 month ago 12
PHP Question

Delay Between Web Scraping Requests

I am scraping data from a certain website using the Simple HTML DOM Parser class for PHP.
I am facing a few problems.


  1. Two of the websites return an HTTP 403 Forbidden error.

  2. With the code below I scrape 9 products from 9 URLs, and after 8 URLs I get an error. I shuffled the URLs and checked each one individually, so the problem is not a particular URL. It seems to be the execution time or the number of web requests allowed, since I get an Apache error on Windows. I tried to add a delay using
    sleep(10);
    but it did not work. Any help would be highly appreciated.

    $url = $this->urls['abc'].'Product/1/1_oz_Gold_American_Eagle___Random_Year.aspx';
    $regex = 'span[id=ctl10_ctl00_tc1_TabPnlProdDesc_lblbuyprice]';
    $data=$this->getCoinVal($url,$regex);

    $this->update_scrap(GAE_1,APMEX,strip_tags($this->r_dollar($data)),$url);


Answer

The error was caused by memory leaks in the Apache server.

Adding these two lines after each page has been parsed makes it work:

    $dom->clear();
    unset($dom);

Here $dom is the parser class object.
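
For context, here is a minimal sketch of how the cleanup might sit inside a scraping loop. It is not the asker's actual code: the function name `scrapeValues`, its parameters, and the idea of passing the page loader in as a callable are all assumptions made for illustration. With the Simple HTML DOM library included, the loader would typically be its `file_get_html` function.

```php
<?php
/**
 * Hypothetical helper: fetch one value per URL and free each parsed
 * document before moving on to the next request.
 *
 * $loadHtml is assumed to return a parser object exposing find() and
 * clear() (as Simple HTML DOM's file_get_html does), or false on failure.
 */
function scrapeValues(array $urls, $selector, callable $loadHtml, $delaySeconds = 2)
{
    $values = [];
    foreach ($urls as $key => $url) {
        $dom = $loadHtml($url);
        if ($dom === false) {
            // The page could not be loaded; record the miss and continue.
            $values[$key] = null;
            continue;
        }

        // Take the first element matching the selector, if there is one.
        $node = $dom->find($selector, 0);
        $values[$key] = $node !== null ? trim($node->plaintext) : null;

        // Release the parser's internal references, then drop the object,
        // so memory does not accumulate across iterations.
        $dom->clear();
        unset($dom);

        // Optional pause between requests.
        if ($delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return $values;
}
```

Assuming `simple_html_dom.php` has been included, a call might look like `scrapeValues($urls, 'span[id=ctl10_ctl00_tc1_TabPnlProdDesc_lblbuyprice]', 'file_get_html')`, where `$urls` is an array of product page URLs. The important part is that `clear()` and `unset()` run once per page rather than once at the end of the script.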