xie xie - 1 month ago 6
CSS Question

How to use XPath and CSS selectors inside a for loop

I am a beginner and want to use the Scrapy framework to scrape some data, but I have run into trouble:

Html A:

<ul class="tip" id="tip1">
<li id="tip1_0">
<a href="http://***" title="***" target="_self">***
</a>
</li>
<li id="tip1_1">
<a href="http://***" title="***" target="_self">***
</a>
</li>
<li id="tip1_2">
<a href="http://***" title="***" target="_self">***
</a>
</li>
</ul>


I use:

f = response.xpath("//*[@id='tip1']//li/a/@href | //*[@id='tip1']//li/a/@title").extract()


This gives me f as a list, which I then convert into a dict: dict(name0=f[0], value0=f[1], name1=f[2], value1=f[3], and so on). Is there an easier way?

Html B:

<div class="info">
<a target="_blank" href="***" title="***">
</a>
</div>
<div class="info">
<a target="_blank" href="***" title="***">
</a>
</div>
<div class="info">
<a target="_blank" href="***" title="***">
</a>
</div>


In this case:

file = response.xpath('//div[@class="info"]')
for line in file:
    f = line.xpath('/a/@href').extract()
    d = line.xpath('/a/@title').extract()


But it does not work: it just returns f = [] and d = []. I am confused. How can I solve this problem? Thanks a lot.

Answer

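Your inner expressions start with /, which makes them absolute: they are evaluated from the document root rather than from the current div, so they match nothing. Make them context-specific by prepending a dot: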

f = line.xpath('./a/@href').extract()
d = line.xpath('./a/@title').extract()

Or point your outer expression at the a elements and read @href and @title directly:

file = response.xpath('//div[@class="info"]/a')
for line in file:
    f = line.xpath('@href').extract_first()
    d = line.xpath('@title').extract_first()

Also note the use of the extract_first() method, which returns the first match as a string (or None if nothing matched) instead of a list.