Winklevoss333 - 4 months ago

Python - Global Name is not defined

I promise I have read through the other versions of this question, but I was unable to find one relevant to my situation. If there is one, I apologize; I've been staring at this for a few hours now.

I've been toying with this a lot and actually got results on one version, so I know it's close.

The 'start_urls' variable is defined as a list before the function, but for some reason it doesn't register at the global/module level.

Here is the exact error:

    for listing_url_list in start_urls:
    NameError: global name 'start_urls' is not defined

import time
import scrapy
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider, Rule
from scraper1.items import scraper1Item

from scraper1 import csvmodule

absolute_pos = './/*[@id="xpath"]/td/@class'

class spider1(CrawlSpider):
    name = 'ugh'
    allowed_domains = ["ugh.com"]
    start_urls = [
        "http://www.website.link.1",
        "http://www.website.link.2",
        "http://www.website.link.3"
    ]

    def parse(self, response):
        Select = Selector(response)
        listing_url_list = Select.xpath('.//*[@id="xpath"]/li/div/a/@href').extract()
        for listing_url_list in start_urls:
            yield scrapy.Request(listing_url, callback=self.parselisting, dont_filter=True)

    def parselisting(self, response):
        ResultsDict = scraper1Item()
        Select = Selector(response)
        ResultsDict['absolute_pos'] = Select.xpath(absolute_pos).extract()
        ResultsDict['listing_url'] = response.url
        return ResultsDict

Answer

You need to fix your parse() method:

  • you meant to iterate over listing_url_list instead of start_urls
  • you meant to use listing_url instead of listing_url_list as the loop variable
  • there is no need to instantiate Selector; use the response.xpath() shortcut directly

Fixed version:

def parse(self, response):
    # extract the listing links from the page, then follow each one
    listing_url_list = response.xpath('.//*[@id="xpath"]/li/div/a/@href').extract()
    for listing_url in listing_url_list:
        yield scrapy.Request(listing_url, callback=self.parselisting, dont_filter=True)
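
One caveat: if the extracted @href values are relative, scrapy.Request will reject them with a "Missing scheme in request url" ValueError. A minimal tweak, assuming the same XPath as above, is to resolve them with response.urljoin():

for listing_url in listing_url_list:
    # urljoin() resolves relative hrefs against the current page URL
    yield scrapy.Request(response.urljoin(listing_url), callback=self.parselisting, dont_filter=True)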

As a side note, I don't think you need CrawlSpider here; a regular scrapy.Spider would do, since you are not actually using rules with link extractors.
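
For reference, here is a minimal sketch of the whole spider on top of scrapy.Spider, keeping the placeholder URLs, XPaths, and scraper1Item from the question:

import scrapy

from scraper1.items import scraper1Item

# placeholder XPath from the question
absolute_pos = './/*[@id="xpath"]/td/@class'


class spider1(scrapy.Spider):
    name = 'ugh'
    allowed_domains = ["ugh.com"]
    start_urls = [
        "http://www.website.link.1",
        "http://www.website.link.2",
        "http://www.website.link.3",
    ]

    def parse(self, response):
        # follow every listing link found on the start pages
        for listing_url in response.xpath('.//*[@id="xpath"]/li/div/a/@href').extract():
            yield scrapy.Request(response.urljoin(listing_url),
                                 callback=self.parselisting, dont_filter=True)

    def parselisting(self, response):
        # populate the item straight from the response; no Selector needed
        item = scraper1Item()
        item['absolute_pos'] = response.xpath(absolute_pos).extract()
        item['listing_url'] = response.url
        return item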