haben haben - 5 months ago 25
Python Question

Scrapy & Selenium: How To Loop XPATH and perform a click

I've been working on scraping this site using Selenium and Scrapy. I want my code to click on each company link, follow it, extract the data, and then repeat the process for the next company, but I can't figure out how to go from one company link to another.

Any help would be appreciated.

from scrapy.http import TextResponse
from selenium import webdriver
import scrapy
import time


class ExampleSpider(scrapy.Spider):
    name = 'comp'
    allowed_domains = ['site']
    start_urls = ["site"]

    def __init__(self, **kwargs):
        super(ExampleSpider, self).__init__(**kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Load the page in the Selenium-driven browser.
        self.driver.get(response.url)
        index = 0
        while True:
            # Re-query the list on every pass and click the next company.
            companies = self.driver.find_elements_by_xpath('//*[@id="company-list"]/ul/li')
            try:
                companies[index].click()
            except IndexError:
                # No company at this index: leave the loop.
                break
            # Wrap the rendered page so Scrapy selectors can be used on it.
            resp = TextResponse(url=self.driver.current_url, body=self.driver.page_source, encoding='utf-8')
            for com in resp.xpath('body'):
                yield {}  # DO Something

            index += 1

It extracts data from the first link only and then stops. Please help me.

Answer Source

As already suggested, try to use their API: you won't have to bother with page rendering, clicking elements, etc. Looking at the XHR requests in the developer tools, you can see that:

  1. To get the list of companies, call https://www.investiere.ch/proxy/api2/v1/companies?extra%5Bimagecache%5D=company_logo_70&fields=companyType,lifecycle&page=0&parameters%5Binclude_skipped%5D=yes. Clicking "Load more..." just changes the page parameter in the URL.
  2. From the result above, you can get the company details by following the link in the attribute records[X].uri; for example, for the first company, CombaGroup, it's https://www.investiere.ch/api2/v1/companies/10211.
  3. To get the list of people (e.g. managers), follow the link https://www.investiere.ch/proxy/api2/v1/companies/10211/people.
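
The three steps above can be sketched with the standard library alone. This is only a rough outline under a few assumptions that I have not verified against the live API: that records is a list of objects with a uri key, that the company id is the last path segment of that uri, and that a page past the end comes back with no records. The helper names (companies_url, company_uris, people_url, fetch_json, iter_company_uris) are made up for this example.

```python
import json
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

API_BASE = "https://www.investiere.ch/proxy/api2/v1/companies"


def companies_url(page):
    """Build the list URL from step 1 for a given zero-based page number."""
    params = {
        "extra[imagecache]": "company_logo_70",
        "fields": "companyType,lifecycle",
        "page": page,
        "parameters[include_skipped]": "yes",
    }
    # safe="," keeps the comma in `fields` unescaped, matching the URL above.
    return API_BASE + "?" + urlencode(params, safe=",")


def company_uris(payload):
    """Return records[X].uri for every record (assumes `records` is a list)."""
    return [record["uri"] for record in payload.get("records", [])]


def people_url(company_uri):
    """Build the step 3 URL, assuming the id is the last path segment."""
    company_id = urlsplit(company_uri).path.rstrip("/").split("/")[-1]
    return "{}/{}/people".format(API_BASE, company_id)


def fetch_json(url):
    """Download a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_company_uris(fetch=fetch_json, max_pages=50):
    """Yield company detail URIs page by page.

    Assumption: an empty page marks the end of the list. max_pages is a
    safety cap in case the API behaves differently.
    """
    for page in range(max_pages):
        uris = company_uris(fetch(companies_url(page)))
        if not uris:
            break
        for uri in uris:
            yield uri
```

In a Scrapy spider the same idea should carry over: yield a scrapy.Request for companies_url(page), decode the body with json.loads(response.text) in the callback, and yield further requests for each uri and for people_url(uri). No Selenium driver is needed in that case, since nothing has to be rendered or clicked.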