Agus Sanjaya Agus Sanjaya - 1 month ago 16
Python Question

Using XPath Following to get element from XML

I have an XML like the following

<li class="expandSubItem">
<span class="expandSubLink">Popular Neighborhoods</span>
<ul class="secondSubNav" style="top:-0.125em;">
<li class="subItem">
<a class="subLink" href="/Hotels-g187147-zfn7236765-Paris_Ile_de_France-Hotels.html">Quartier Latin Hotels</a>
</li>
</ul>
</li>
<li class="expandSubItem">
<span class="expandSubLink">Popular Paris Categories</span>
<ul class="secondSubNav" style="top:-0.125em;">
<li class="subItem">
<a class="subLink" href="/HotelsList-Paris-Cheap-Hotels-zfp10420.html">Paris Cheap Hotels</a>
</li>
</ul>
</li>


I want to get all links under "Popular Paris Categories". I used something like this "//li//a/@href/following::span[text()='Popular Singapore Categories']", but it gave no results. Any idea how to get the correct result? Here is the snippet of the python code that I wrote.

t_url = 'https://www.tripadvisor.com/Tourism-g187147-Paris_Ile_de_France-Vacations.html'
page = requests.get(t_url, timeout=30)
tree = html.fromstring(page.content)

links = tree.xpath('//li[span="Popular Paris Categories"]//a/@href')
print links

Answer

This is one possible way :

//li[normalize-space(span)="Popular Paris Categories"]//a/@href

Notice how normalize-space() is used to remove trailing space from the span content. This is the reason why the XPath I suggested initially in the comment didn't work for your actual HTML.