qwert-e qwert-e - 1 year ago 78
Python Question

Union of node and function on node in XPath

I am using Scrapy to crawl some webpages. I want to write an XPath query that will, within a parent <div>, append a couple of characters of text to any child <a> nodes, while extracting the div's own text normally. Essentially it is like a normal query, just written with | and calling the concat() function on the descendants (which, if they exist, will be <a> nodes).

These all return a value:

  1. my_div.xpath('div[@class="my_class"]/text()').extract()

  2. my_div.xpath('concat(\'@\', div[@class="my_class"]/a/text())').extract()

  3. my_div.xpath('div[@class="my_class"]/text() | div[@class="my_class"]/a/text()').extract()

However, attempting to combine (1) and (2) above in the format of (3):

my_div.xpath('div[@class="my_class"]/text() | concat(\'@\', div[@class="my_class"]/a/text())').extract()

results in the following error:

ValueError: XPath error: Invalid type in div[@class="my_class"]/text() | concat('@', div[@class="my_class"]/a/text())

How do I get XPath to recognize the union of a node with a function called on a node?

Answer Source

I think it doesn't work because concat() doesn't return a node-set (it returns a string), and | can only combine node-sets, i.e. the results of path expressions:

By using the | operator in an XPath expression you can select several paths.

as per http://www.w3schools.com/xsl/xpath_syntax.asp

Why not just split it into two? Generally you use ItemLoaders with your spider. So you can simply add as many paths and/or values as you like.

mil = MyItemLoader(response=response)
mil.add_xpath('name', 'xpath1')
mil.add_xpath('name', 'xpath2')
# {'name': ['values_of_xpath1', 'values_of_xpath2']}
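
The same "split it into two queries" idea can be sketched without Scrapy. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree instead of Scrapy selectors, the sample markup is made up, and own_text_nodes is a hypothetical helper. The '@' prefix is added in Python, once per <a>, rather than with concat() inside the XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup standing in for the crawled page.
html = '<div><div class="my_class">Hello <a href="#">alice</a> and <a href="#">bob</a></div></div>'
my_div = ET.fromstring(html)


def own_text_nodes(elem):
    """Return the element's direct text nodes, similar to the XPath step text()."""
    texts = [elem.text] + [child.tail for child in elem]
    return [t for t in texts if t]


# Query 1: the div's own text nodes.
div_texts = [t for d in my_div.findall('div[@class="my_class"]') for t in own_text_nodes(d)]

# Query 2: the text of each <a> child, prefixed in Python instead of via concat().
link_texts = ['@' + (a.text or '') for a in my_div.findall('div[@class="my_class"]/a')]

combined = div_texts + link_texts
print(combined)
```

Note that the two result lists are simply concatenated, so the values no longer appear in document order. Prefixing in Python also handles every <a>, whereas concat() applied to a node-set only uses its first node.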

If you want to preserve tree order you can try:

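nodes = my_div.xpath('div[@class="my_class"]')
text = []
for node in nodes:
    # Collect the div's own text first, then the text of its <a> child.
    text.append(node.xpath('text()').extract_first(default=''))
    text.append(node.xpath('a/text()').extract_first(default=''))
# Joining with '@' puts the marker in front of the link text.
text = '@'.join(text)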

You can probably simplify it with a list comprehension, but you get the idea: extract the nodes, then iterate through them to pull out both values.
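
For a div that mixes several text runs and several links, a tree-order walk might look like the sketch below. It is an assumption-laden illustration, not the Scrapy API: it swaps in the standard library's xml.etree.ElementTree, the sample markup is invented, and text_with_prefixed_links is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def text_with_prefixed_links(div, prefix='@'):
    """Return the div's text in document order, prefixing the text of each <a> child."""
    parts = [div.text or '']
    for child in div:
        if child.tag == 'a':
            parts.append(prefix + (child.text or ''))
        # child.tail is the div-level text that follows this child element.
        parts.append(child.tail or '')
    return ''.join(parts)


# Hypothetical sample markup standing in for the crawled page.
html = '<div><div class="my_class">Hello <a href="#">alice</a> and <a href="#">bob</a>!</div></div>'
my_div = ET.fromstring(html)

results = [text_with_prefixed_links(d) for d in my_div.findall('div[@class="my_class"]')]
print(results)  # ['Hello @alice and @bob!']
```

Each matching div produces one string, so the prefix lands directly in front of every link text and the surrounding text keeps its original position.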