Lucas Mähn - 29 days ago
Python Question

Import strings into Scrapy to use as crawl URLs

How do I tell Scrapy to crawl a set of URLs that differ only by one string? For example: https://www.youtube.com/watch?v=STRING
I have the strings saved in a text file, one per line.

with open("plz_nummer.txt") as f:
    cityZIP = f.read().splitlines()

for plz in cityZIP:
    # Build one URL per zip code read from the file
    next_url = 'http://www.firmenfinden.de/?txtPLZ=' + plz + '&txtBranche=&txtKunden='

sal
Answer

I would load the file of zip codes inside the start_requests method and write that method as a generator. Something along the lines of:

import scrapy

class ZipSpider(scrapy.Spider):
    name = "zipCodes"
    # Class-level default; replaced per instance in start_requests
    city_zip_list = []

    def start_requests(self):
        # Read one zip code per line, skipping blank lines
        with open("plz_nummer.txt") as f:
            self.city_zip_list = [line.strip() for line in f if line.strip()]
        for city_zip in self.city_zip_list:
            url = 'http://www.firmenfinden.de/?txtPLZ={}&txtBranche=&txtKunden='.format(city_zip)
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Anything else you need
        # to do in here
        pass

This should give you a good starting point. The Scrapy tutorial is also worth reading: https://doc.scrapy.org/en/1.1/intro/tutorial.html
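
As an optional refinement, the file reading and URL building can be pulled out into two small helpers that do not depend on Scrapy, which makes them easy to check on their own. This is only a sketch: read_zip_codes, build_url and BASE_URL are names made up for illustration, and it assumes the file holds one zip code per line. urlencode from the standard library escapes the value, so an unexpected character in the file should not break the query string.

```python
from urllib.parse import urlencode

# Base address taken from the question; the helper names below are hypothetical.
BASE_URL = "http://www.firmenfinden.de/"


def read_zip_codes(path):
    """Return the non-empty lines of the file at `path`, stripped of whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def build_url(zip_code):
    """Build the search URL for one zip code, escaping it for the query string."""
    query = urlencode({"txtPLZ": zip_code, "txtBranche": "", "txtKunden": ""})
    return "{}?{}".format(BASE_URL, query)
```

Inside start_requests you could then loop over read_zip_codes("plz_nummer.txt") and yield scrapy.Request(url=build_url(city_zip), callback=self.parse) for each entry.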