Catalin Besleaga - 7 months ago
HTML Question

Python 2: Scrape HTML with XPath

Consider an HTML page with three tables in it.

I want to loop through each table and, at the same time, print something if the content corresponds to what I'm looking for.

I need to keep track of the table I'm at.

As you can see in the code below, I have the page variable, which is an HTML string.

I can return the content of all the tables at once (as a list).

I'd like to loop through them.

from __future__ import print_function  # make print() a function on Python 2
from lxml import html
import requests
from bs4 import BeautifulSoup

page = """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>cv</title>
</head>
<body>

<table>
<tr>
<td>table1 td1</td>
<td>table1 td2</td>
</tr>
</table>

<table>
<tr>
<td>table2 td1</td>
<td>table2 td2</td>
</tr>
</table>

<table>
<tr>
<td>table3 td1</td>
<td>table3 td2</td>
</tr>
</table>

</body>
</html>
"""

soup = str(BeautifulSoup(page, 'html.parser'))

tree = html.fromstring(soup)

tds = tree.xpath('//table/tr/td/text()')

for td in tds:
    print(td + '\n')

print('Ready !!')

Answer

You mean you need to process each table on its own?

for table in tree.xpath(".//table"):
    print("---  new table: ---")
    for td in table.xpath(".//td"):
        print(td.text_content())  # print the cell's text, not the element object
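
To also keep track of which table you are in, one option is to number the tables with enumerate while looping. The following is only a sketch under some assumptions: the page string is a shortened stand-in for the one in the question, and find_cells and its wanted parameter are hypothetical names made up for this illustration, not part of lxml.

```python
from lxml import html

# Shortened stand-in for the page string from the question.
page = """
<html><body>
<table><tr><td>table1 td1</td><td>table1 td2</td></tr></table>
<table><tr><td>table2 td1</td><td>table2 td2</td></tr></table>
<table><tr><td>table3 td1</td><td>table3 td2</td></tr></table>
</body></html>
"""


def find_cells(page_source, wanted):
    """Return (table_number, cell_text) pairs for cells whose text equals wanted.

    Hypothetical helper for illustration; table numbers start at 1.
    """
    tree = html.fromstring(page_source)
    matches = []
    # enumerate() supplies the running table number alongside each <table>.
    for table_number, table in enumerate(tree.xpath("//table"), start=1):
        # The leading "." keeps the search inside the current table only.
        for td in table.xpath(".//td"):
            text = td.text_content().strip()
            if text == wanted:
                matches.append((table_number, text))
    return matches


for table_number, text in find_cells(page, "table2 td1"):
    print("found %r in table %d" % (text, table_number))
```

The relative path ".//td" matters here: a path starting with "//" would search the whole document again from every table, so the table number would no longer tell you where the cell came from.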