wj127 - 11 days ago
Python Question

How to use non-ASCII characters in an XPath expression in Scrapy

I have the following XPath:

bathroom = response.xpath(“.//div[1][contains(., 'Baños’)]/text()").extract_first()


And I get this error:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters


I've tried the solutions given in these other similar questions:

Filtering out certain bytes in python

Scrapy xpath utf-8 literals

but none has resolved my problem!

Note: with the solution from the first link, I replaced 'input_string' with, say, word = "baños", and got an error like "the function has one argument, 2 given..."

Can anyone help?

Answer

Besides the literal Baños, your code snippet contains invalid string delimiters: the opening double quote and the closing single quote are typographic (curly) quotes, which will cause a different error:

bathroom = response.xpath(“.//div[1][contains(., 'Baños’)]/text()").extract_first()
                          ^                            ^
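
The two marked characters are typographic quotes (U+201C and U+2019), which Python does not accept as string delimiters. Below is a minimal sketch for replacing them in pasted text. The helper name `straighten_quotes` is hypothetical and not part of Scrapy or lxml:

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: maps typographic quotes to their ASCII equivalents.
CURLY_TO_STRAIGHT = {
    u"\u201c": u'"',  # left double quotation mark
    u"\u201d": u'"',  # right double quotation mark
    u"\u2018": u"'",  # left single quotation mark
    u"\u2019": u"'",  # right single quotation mark
}


def straighten_quotes(text):
    """Return text with curly quotes replaced by straight ASCII quotes."""
    for curly, straight in CURLY_TO_STRAIGHT.items():
        text = text.replace(curly, straight)
    return text


# The XPath argument as pasted in the question, curly quotes included.
pasted = u"\u201c.//div[1][contains(., 'Ba\u00f1os\u2019)]/text()\""
fixed = straighten_quotes(pasted)
# fixed now starts and ends with a plain " and uses a plain ' after Baños.
```

In practice it is usually enough to retype the quotes by hand. A helper like this is mainly useful when snippets are copied from a word processor or a web page that auto-formats quotes.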

Converting the entire XPath expression to a unicode string (note the u prefix), as suggested in the second link, and fixing the two quotes pointed out above should fix both errors. Below is a quick test using lxml (which Scrapy uses under the hood):

>>> from lxml import etree
>>> root = etree.fromstring('<root/>')
>>> root.xpath(u".//div[1][contains(., 'Baños')]/text()")
[]
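
The empty list only shows that the expression is now accepted, since the `<root/>` element has no `div` to match. Below is a small sketch of the same expression returning text from a made-up document. The markup and the "Baños: 2" text are assumptions for illustration, not taken from the page being scraped:

```python
# -*- coding: utf-8 -*-
from lxml import etree

# Made-up markup for illustration only.
root = etree.fromstring(u"<root><div>Baños: 2</div><div>Other</div></root>")

# Unicode XPath expression delimited by plain ASCII quote characters.
matches = root.xpath(u".//div[1][contains(., 'Baños')]/text()")

# Roughly what extract_first() does in Scrapy: first match, or None.
first = matches[0] if matches else None
```

Here `matches` holds the single text node of the first `div`, and `first` is that string. On Python 3 every string literal is already Unicode, so the `u` prefix is optional there.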