gdogg371 - 19 days ago
Python Question

Scrapy Spider not scraping correctly

I am using the Python.org 2.7 64-bit shell on Windows Vista. Scrapy is installed and seems to be stable and working. However, I have copied the following simple piece of code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/sfc/npo/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//p")
        for titles in titles:
            title = titles.select("a/text()").xpath()
            link = titles.select("a/@href").xpath()
            print title, link


It comes from this YouTube video:

http://www.youtube.com/watch?v=1EFnX1UkXVU

When I run this code I get the following warnings:

hxs = HtmlXPathSelector(response)
C:\Python27\mrscrap\mrscrap\spiders\test.py:11: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
titles = hxs.select("//p")
c:\Python27\lib\site-packages\scrapy\selector\unified.py:106: ScrapyDeprecationWarning: scrapy.selector.HtmlXPathSelector is deprecated, ins
.Selector instead.
for x in result]
C:\Python27\mrscrap\mrscrap\spiders\test.py:13: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
title = titles.select("a/text()").extract()
C:\Python27\mrscrap\mrscrap\spiders\test.py:14: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
link = titles.select("a/@href").extract()


Has Scrapy's syntax changed recently so that .extract() is no longer valid? I've tried using .xpath() in its place, but this raises an error saying that .xpath() requires two arguments, and I'm not sure what to pass there.

Any ideas?

Thanks

Answer

In reference to the other answer, it should be:

title = titles.xpath("a/text()").extract()