user3185563 - 3 years ago
Python Question

Using normalize-space with Scrapy

Below is a mock-up of a document I'm working on:

<div>
<h4>Area</h4>
<span class="aclass"> </span>
<span class="bclass">
<strong>Address:</strong>
10 Downing Street

London

SW1
</span>
</div>


I'm getting the address like this:

response.xpath(u".//h4[. = 'Area']/following-sibling::span[contains(.,'Address:')]/text()").extract()


which returns

[u'\r\n \t', u'\r\n 10 Downing Street\r\n\r\n London \r\n \r\n SW1\r\n ']


I'm trying to clean that up with normalize-space. I've tried putting it in every location I could think of, but it either tells me there's a syntax error, or returns an empty string.

Updating to add that I'm trying to get this working without changing the selector too much. I have similar cases which don't have the <strong> tag, for example. The selector is overcomplicated in the example I've prepared here, but in the live version, I have to take that rather convoluted route to get to the address.

Regarding the possible duplicate: following the advice there, I added /normalize-space(.) to the end of the expression, giving this:

response.xpath(u".//h4[. = 'Area']/following-sibling::span[contains(.,'Address:')]/text()/normalize-space(.)").extract()


That produces a "ValueError: Invalid XPath:" error.

Answer
"normalize-space(//strong[contains(text(), 'Address:')]/following-sibling::node())"
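A note on why the attempt in the question fails, with a possible fallback. Scrapy's selectors evaluate XPath 1.0 (through lxml), where a function call such as normalize-space(.) cannot be used as a step inside a path. That is why normalize-space has to wrap the whole expression, as in the answer. If the selector needs to stay exactly as it is, the whitespace could instead be collapsed in Python after extraction. The sketch below takes the list shown in the question as its input. normalize_space is a hypothetical helper name, not something Scrapy provides.

```python
# Strings exactly as returned by the original selector in the question.
extracted = [
    u'\r\n \t',
    u'\r\n 10 Downing Street\r\n\r\n London \r\n \r\n SW1\r\n ',
]


def normalize_space(text):
    """Trim the ends and collapse each run of whitespace to one space.

    Roughly what XPath's normalize-space() does. Note that str.split()
    also treats other Unicode whitespace (e.g. non-breaking spaces) as
    separators, which the XPath function does not.
    """
    return u' '.join(text.split())


# Normalize every fragment, then drop the ones that were whitespace only.
address_parts = [normalize_space(t) for t in extracted]
address_parts = [t for t in address_parts if t]

print(address_parts)
```

In a spider, extracted would be the result of the unchanged response.xpath(...).extract() call from the question. Because the cleanup happens after extraction, it should not matter whether the span contains a <strong> tag.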