Hmd88 - 22 days ago

Duplicate results with XPath but not with CSS selectors in Scrapy

I'm working through the Scrapy tutorial and trying to scrape the text, author and tags of each quote on its companion website.
When I use CSS selectors as the tutorial shows:

for quote in response.css('div.quote'):
    print(quote.css('span.text::text').extract())
    print(quote.css('span small::text').extract())
    print(quote.css('div.tags a.tag::text').extract())

I get the desired result: each quote's text, author and tags are printed once.
But when I use XPath selectors like this:

for quote in response.xpath("//*[@class='quote']"):
    print(quote.xpath("//*[@class='text']/text()").extract())
    print(quote.xpath("//*[@class='author']/text()").extract())
    print(quote.xpath("//*[@class='tag']/text()").extract())

I get duplicate results!

I still can't figure out why the two behave so differently.


Try .// instead of // for your relative searches, e.g.:

print(quote.xpath(".//*[@class='text']/text()").extract())

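A path starting with // is absolute: even though you call it on quote, it is evaluated from the root of the document, so on every iteration it matches the text, author and tags of every quote on the page. .// instead means "search from ." (the current node), so the search is limited to the elements nested under quote. CSS selectors called on quote, on the other hand, are always relative to it, which is why your CSS version works.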

As a side note, if you want exactly the same results, consider replacing * with the tags you used in the CSS search (span, small, div, a). In this case it doesn't make any difference, but it's worth keeping in mind for future reference.