Difender - 7 months ago
Python Question

XPath parses the whole page when I specify not to

I'm parsing websites using Python and XPath.

What I'm trying to do is extract the href from the <a> element.


Here is what the XML (page) looks like:

<div id="posts">
  <div align="center">
    <table>
      <tbody>
        <tr>
          <td>
          <td>
            <a href="test01">
        <tr>
          <td>
        <tr>
          <td>
  <div align="center">
    <table>
      <tbody>
        <tr>
          <td>
          <td>
            <a href="test01">
        <tr>
          <td>
        <tr>
          <td>


And here's the code I wrote:

posts = page.xpath("//div[@id='posts']/div[@align='center']")
for post in posts:
    print(post.xpath("//table/tr[1]/td[2]/a/@href"))


But the problem is that I end up with every href under posts, and not just the single one from each post.


What am I doing wrong?

Answer

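An XPath expression that starts with a / character is absolute: it is evaluated from the document root node, no matter which node you call xpath() on. That is why //table/... matches every table in the page on each iteration. To make the expression relative to the context node, you need to put a . before the /.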

So your code should be:

posts = page.xpath("//div[@id='posts']/div[@align='center']")
for post in posts:
    # The leading "." anchors the search at the current post
    print(post.xpath(".//table/tr[1]/td[2]/a/@href"))
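
To see the difference side by side, here is a minimal, self-contained sketch. It assumes the page is parsed with lxml (the question does not say which library `page` comes from), and the `SAMPLE` markup is hypothetical: it is modelled on the question, but with closing tags, two distinct hrefs, and no `<tbody>`, since a `table/tr` step would not match if the parsed tree contained one.

```python
from lxml import html

# Hypothetical markup modelled on the question. The two links have
# different hrefs so the effect of the leading "." is visible.
SAMPLE = """
<html><body>
<div id="posts">
  <div align="center">
    <table>
      <tr><td>first</td><td><a href="test01">link</a></td></tr>
      <tr><td>other row</td></tr>
    </table>
  </div>
  <div align="center">
    <table>
      <tr><td>second</td><td><a href="test02">link</a></td></tr>
      <tr><td>other row</td></tr>
    </table>
  </div>
</div>
</body></html>
"""

page = html.fromstring(SAMPLE)
posts = page.xpath("//div[@id='posts']/div[@align='center']")

# Absolute path: evaluated from the document root for every post.
absolute = [post.xpath("//table/tr[1]/td[2]/a/@href") for post in posts]

# Relative path: evaluated from the current post only.
relative = [post.xpath(".//table/tr[1]/td[2]/a/@href") for post in posts]

print(absolute)
print(relative)
```

With this sample, the absolute version returns both hrefs for every post, while the relative version returns only the href that belongs to the post being iterated.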