John John - 1 month ago 6

Python - Read table

In Python, using the lxml library, how do I read the td values of an HTML table? I tried selecting the table with XPath, but I can't find the right expression to return the td values. Thanks everyone, I appreciate it.

from glob import glob
from lxml import html

# Scan the html/ subdirectory of the current directory and scrape each HTML file
for fileName in glob('html/*.*'):
    with open(fileName) as page:
        tree = html.fromstring(page.read())
    # This selects the <table> elements themselves, not their cell values
    tables = tree.xpath('//table')
    print("Tables:", tables)


page.html

<table style="width:100%">
<tr>
<th>Id</th>
<th>Name</th>
<th>Age</th>
</tr>
<tr>
<td>1</td>
<td>Smith</td>
<td>50</td>
</tr>
<tr>
<td>2</td>
<td>Jackson</td>
<td>94</td>
</tr>
<tr>
<td>3</td>
<td>Miller</td>
<td>43</td>
</tr>
</table>

Answer

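Instead of selecting the table element, select the text nodes inside each td with the XPath //tr/td//text():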

>>> from lxml import html
>>> page = """<table style="width:100%">
  <tr>
    <th>Id</th>
    <th>Name</th>
    <th>Age</th>
  </tr>
  <tr>
    <td>1</td>
    <td>Smith</td>
    <td>50</td>
  </tr>
  <tr>
    <td>2</td>
    <td>Jackson</td>
    <td>94</td>
  </tr>
  <tr>
    <td>3</td>
    <td>Miller</td>
    <td>43</td>
  </tr>
</table>"""
>>> tree = html.fromstring(page)
>>> tree.xpath('//tr/td//text()')

output:

['1', 'Smith', '50', '2', 'Jackson', '94', '3', 'Miller', '43']
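The flat list above loses the row boundaries, and //text() silently skips empty cells. A minimal sketch that keeps the values grouped per row and pairs each one with its th header, assuming each file holds a single table whose th cells are the column names. The names read_rows and PAGE are hypothetical, introduced only for this example; text_content() is a method of lxml.html elements.

```python
from glob import glob

from lxml import html

# Sample table from page.html, inlined so the sketch runs on its own.
PAGE = """<table style="width:100%">
  <tr><th>Id</th><th>Name</th><th>Age</th></tr>
  <tr><td>1</td><td>Smith</td><td>50</td></tr>
  <tr><td>2</td><td>Jackson</td><td>94</td></tr>
  <tr><td>3</td><td>Miller</td><td>43</td></tr>
</table>"""


def read_rows(tree):
    """Return one dict per data row, keyed by the <th> header texts.

    Assumes one table per document: th cells from every table are collected.
    """
    headers = [th.text_content().strip() for th in tree.xpath('//tr/th')]
    rows = []
    # //tr[td] keeps only rows that contain at least one td cell
    for tr in tree.xpath('//tr[td]'):
        # text_content() includes text nested in tags such as <b>,
        # and yields '' for an empty cell instead of skipping it
        cells = [td.text_content().strip() for td in tr.xpath('td')]
        rows.append(dict(zip(headers, cells)))
    return rows


if __name__ == "__main__":
    print(read_rows(html.fromstring(PAGE)))

    # The same function applied to every file under html/, as in the question
    for file_name in glob('html/*.*'):
        with open(file_name) as page:
            tree = html.fromstring(page.read())
        print(file_name, read_rows(tree))
```

For the sample page this prints [{'Id': '1', 'Name': 'Smith', 'Age': '50'}, {'Id': '2', 'Name': 'Jackson', 'Age': '94'}, {'Id': '3', 'Name': 'Miller', 'Age': '43'}]. If you only need one column, a positional XPath such as //tr/td[2]/text() returns just the names.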