eaglefreeman - 10 months ago
HTML Question

Extracting text with XPath in Scrapy

Hi all, I would like to extract all the text from an HTML block using XPath in Scrapy.

Let's say we have a block like this:


I want to extract the text as ["Blahblah","Bluhbluh","Blihblih"], so the XPath should look for text recursively inside the div node.
I have tried:
but it does not extract nested elements.


Answer

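You can use XPath's string() function on each p element. It returns the concatenated text of the element and all of its descendants, so nested tags are included: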

>>> import scrapy
>>> selector = scrapy.Selector(text="""<div>
...    <p>Blahblah</p>
...    <p><a>Bluhbluh</a></p>
...    <p><a><span>Bliblih</span></a></p> 
... </div>""")
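>>> # string() yields one result per p, so each extract() call returns a one-item list
>>> [p.xpath("string()").extract() for p in selector.xpath('//div/p')]
[['Blahblah'], ['Bluhbluh'], ['Bliblih']]
>>> import operator
>>> # take the first (only) item of each list; list() is needed on Python 3, where map() is lazy
>>> list(map(operator.itemgetter(0), [p.xpath("string()").extract() for p in selector.xpath('//div/p')]))
['Blahblah', 'Bluhbluh', 'Bliblih']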