Mohib Mohib - 2 months ago 7
Python Question

How to save the crawled page link into an item using Scrapy?

This is my spider:

rules = (
    Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
)

def parse_item(self, response):
    item = MovieNotifyItem()
    item['title'] = response.xpath('//h5[@class="col s12 light center teal darken-3 white-text"]/text()').extract_first()
    item['size'] = response.xpath('//*[@class="torrent-info"]//tr[1]/td[2]/text()').extract_first()
    item['catagory'] = response.xpath('//*[@class="torrent-info"]//tr[2]/td[2]/text()').extract_first()
    yield item


Now I want to save the link of each page crawled by this rule into an item field, say item['page_link']:

rules = (
    Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
)


How can I do that?
Thanks in advance.

Answer

If I understand correctly, you are looking for the response.url:

def parse_item(self, response):
    item = MovieNotifyItem()
    item['url'] = response.url  # the "url" field must be declared on the "MovieNotifyItem" Item class
    # ...
    yield item