I've been working on scraping this site using Selenium and Scrapy. I want my code to click each company link, follow it, extract the data, and then repeat this for the next company. But I can't figure out how to get from one company link to the next.
Any help would be appreciated.
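import scrapy
from scrapy.http import TextResponse
from selenium import webdriver
from selenium.webdriver.common.by import By


class CompSpider(scrapy.Spider):  # class name assumed; it was missing from the snippet
    name = 'comp'
    allowed_domains = ['site']
    start_urls = ["site"]

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        index = 0
        # Load the page in the browser so JavaScript-rendered content is present
        self.driver.get(response.url)
        # Selenium 4 replaced find_elements_by_xpath with find_elements(By.XPATH, ...)
        companies = self.driver.find_elements(By.XPATH, '//*[@id="company-list"]/ul/li')
        # Wrap the rendered HTML in a Scrapy response so its XPath selectors can be used
        resp = TextResponse(url=self.driver.current_url, body=self.driver.page_source, encoding='utf-8')
        for com in resp.xpath('body'):
            # DO Something
            index += 1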
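If you stay with Selenium, one common pattern for moving from one company to the next is to read every company URL into a plain list of strings first and only then navigate to each one, because WebElements go stale once the browser leaves the page. The sketch below assumes that each `<li>` in the company list contains an `<a>` whose `href` is the company page (the `//a` suffix on the XPath is that assumption); `collect_company_links` and `visit_companies` are hypothetical helper names. It calls `find_elements("xpath", ...)`, which is equivalent to `find_elements(By.XPATH, ...)` because `By.XPATH` is the string `"xpath"`.

```python
from typing import Any, Callable, List

# Assumption: every <li> in the company list wraps an <a> pointing at the
# company page. Adjust this XPath to the real markup.
COMPANY_LINKS_XPATH = '//*[@id="company-list"]/ul/li//a'


def collect_company_links(driver: Any, xpath: str = COMPANY_LINKS_XPATH) -> List[str]:
    """Read every company href up front as plain strings.

    WebElements become stale after the browser navigates away, so storing
    URLs (not elements) is what lets the loop move from company to company.
    "xpath" is the string value of selenium's By.XPATH.
    """
    hrefs = (el.get_attribute("href") for el in driver.find_elements("xpath", xpath))
    return [href for href in hrefs if href]  # skip anchors without an href


def visit_companies(driver: Any, extract: Callable[[Any], Any],
                    xpath: str = COMPANY_LINKS_XPATH) -> list:
    """Open each collected company URL in turn and run `extract` on the loaded page."""
    results = []
    for url in collect_company_links(driver, xpath):
        driver.get(url)  # navigate to the company page
        results.append(extract(driver))
    return results
```

With a real browser you would pass `webdriver.Firefox()` as `driver` and, inside `extract`, build a `TextResponse` from `driver.page_source` the same way the spider above does.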
As already suggested, try using their API instead; you won't have to bother with page rendering, clicking elements, etc. Looking at the XHR requests in the browser's developer tools, you can see that:
- pagination is handled by the `page` parameter in the URL.
- each company's detail URL is in `records[X].uri`; for example, for the first company, CombaGroup, it's https://www.investiere.ch/api2/v1/companies/10211.
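Putting those two observations together, a minimal standard-library sketch could walk the list pages and then follow each `records[X].uri`. Several details here are assumptions, not something the XHR inspection above confirms: the list endpoint in `LIST_URL` is hypothetical (only the `page` parameter is known), each list page is assumed to be a JSON object with a `records` array, and an empty or missing `records` array is assumed to mark the last page. `urljoin` is used so the code works whether `uri` is absolute or relative. The `fetch` argument is injectable so the paging logic can be exercised without network access.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urljoin
from urllib.request import urlopen

API_ROOT = "https://www.investiere.ch/"
# Hypothetical list endpoint: only the `page` URL parameter is known.
LIST_URL = "https://www.investiere.ch/api2/v1/companies?page={page}"


def fetch_json(url: str) -> dict:
    """Download a URL and decode its body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_company_uris(fetch: Callable[[str], dict] = fetch_json,
                      start_page: int = 1) -> Iterator[str]:
    """Yield the absolute URL from records[X].uri for every company, page by page.

    Assumes an empty (or missing) `records` array means there are no more pages.
    """
    page = start_page
    while True:
        records = fetch(LIST_URL.format(page=page)).get("records") or []
        if not records:
            return
        for record in records:
            # urljoin leaves absolute URIs unchanged and resolves relative ones
            yield urljoin(API_ROOT, record["uri"])
        page += 1


def scrape_companies(fetch: Callable[[str], dict] = fetch_json) -> list:
    """Follow each company URI and collect its JSON detail document."""
    return [fetch(uri) for uri in iter_company_uris(fetch)]
```

Calling `scrape_companies()` with no arguments would hit the live API; whether the assumed endpoint and stop condition match the real responses should be checked against what the developer tools show.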