j.white - 7 months ago
HTML Question

I keep getting HTML in the XPath output! How do I get just the text?

The XPath I am running returns the HTML markup along with the text I want, and I can't work out how to stop it. I just want the text.

The XPath

hxs.xpath('//h1[@class="body2"]').extract()


The HTML

<div class="product-title cf">


<h1 itemprop="name" class="body2">
Cornish Ale Dozen - Case of 12
</h1>


</div>


Any suggestions would be appreciated, thanks.

Answer

A pure XPath expression that selects the text nodes instead of the parent element would be as follows:

//h1[@class="body2"]/text()

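If the library executing the XPath is Scrapy, this expression should return just the text rather than the serialized <h1> element. Note that the text node in this HTML includes the line breaks around the title, so the result may still need its surrounding whitespace stripped.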