Shanky Shanky - 2 years ago 172
Python Question

Get all td content inside tbody of tr in python using lxml

I can read the header values of the HTML table below with lxml, but when I try to parse the contents of the td elements inside the tr rows of tbody using XPath, I get an empty result because the data is generated dynamically.
Below is my Python code with the output I am getting.
How can I get the values?

<table id="datatabl" class="display compact cell-border dataTable no-footer" role="grid" aria-describedby="datatabl_info">
<tr role="row">
<th class="dweek sorting_desc" tabindex="0" aria-controls="datatabl" rowspan="1" colspan="1" style="width: 106px;" aria-label="Week: activate to sort column ascending" aria-sort="descending">Week</th>
<th class="dnone sorting" tabindex="0" aria-controls="datatabl" rowspan="1" colspan="1" style="width: 100px;" aria-label="None: activate to sort column ascending">None</th>

<tr class="odd" role="row">
<td class="sorting_1">2016-05-03</td>
<tr class="even" role="row">
<td class="sorting_1">2016-04-26</td>

My Python code

from lxml import etree
import urllib

web = urllib.urlopen("")
s = web.read()

html = etree.HTML(s)

## Get the 'thead' element
tr_nodes = html.xpath('//table[@id="datatabl"]/thead')
print tr_nodes

## 'th' is inside first 'tr'
header = [i[0].text for i in tr_nodes[0].xpath("tr")]
print header

## tbody
tr_nodes_content = html.xpath('//table[@id="datatabl"]/tbody')
print tr_nodes_content

td_content = [[td[0].text for td in tr.xpath('td')] for tr in tr_nodes_content[0]]
print td_content

output in terminal:

[<Element thead at 0xb6b250ac>]
[<Element tbody at 0xb6ad20cc>]

Answer Source

This will get the data from the AJAX request in JSON format:

import json
from pprint import pprint as pp

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36',
    'Content-Type': 'application/json',
    'Referer': '',
    'X-Requested-With': 'XMLHttpRequest',
}

# Request parameters, serialized as the JSON body of the POST
data = json.dumps({'area': 'conus', 'type': 'conus', 'statstype': '1'})

# AJAX endpoint the page posts to
ajax = ""
cont = requests.post(ajax, data=data, headers=headers)
pp(cont.json())


A snippet of the output:

{u'd': [{u'D0': 33.89,
         u'D1': 14.56,
         u'D2': 5.46,
         u'D3': 3.44,
         u'D4': 1.11,
         u'Date': u'2016-05-03',
         u'FileDate': u'20160503',
         u'None': 66.11,
         u'ReleaseID': 890,
         u'__type': u'DroughtMonitorData.DmData'},
        {u'D0': 39.64,
         u'D1': 15.38,
         u'D2': 5.89,
         u'D3': 3.44,
         u'D4': 1.11,
         u'Date': u'2016-04-26',
         u'FileDate': u'20160426',
         u'None': 60.36,
         u'ReleaseID': 889,
         u'__type': u'DroughtMonitorData.DmData'},
        {u'D0': 39.28,
         u'D1': 15.44,
         u'D2': 5.94,
         u'D3': 3.44,
         u'D4': 1.11,
         u'Date': u'2016-04-19',
         u'FileDate': u'20160419',
         u'None': 60.72,
         u'ReleaseID': 888,
         u'__type': u'DroughtMonitorData.DmData'},
        {u'D0': 39.2,
         u'D1': 17.75,
         u'D2': 6.1,
         u'D3': 3.76,
         u'D4': 1.71,
         u'Date': u'2016-04-12',
         u'FileDate': u'20160412',
         u'None': 60.8,
         u'ReleaseID': 887,
         u'__type': u'DroughtMonitorData.DmData'},
        {u'D0': 37.86,
         u'D1': 16.71,
         u'D2': 5.95,
         u'D3': 3.76,
         u'D4': 1.71,
         u'Date': u'2016-04-05',
         u'FileDate': u'20160405',
         u'None': 62.14,
         u'ReleaseID': 886,
         u'__type': u'DroughtMonitorData.DmData'},

You can get all the data you want from the JSON returned. If you print(len(cont.json()["d"])) you will see that 853 rows are returned, so you actually seem to get all the data from the 35 pages in one go. Even if you did parse the page, you would still have to do it 34 more times; getting the JSON from the AJAX request makes it easy to parse, and it all comes from a single POST.
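
As a rough sketch of turning that JSON back into table rows, something like the following might work. The names rows_from_payload and sample are made up for illustration, and the assumption that the table's "Week" column corresponds to the "Date" key is only inferred from the matching values (2016-05-03, 2016-04-26) in the question's HTML. The sample holds two records copied from the output snippet above, so it runs without a network call.

```python
# Two records copied from the output snippet above (trimmed to three keys);
# a real response would carry many more records and keys.
sample = {
    "d": [
        {"Date": "2016-05-03", "None": 66.11, "D0": 33.89},
        {"Date": "2016-04-26", "None": 60.36, "D0": 39.64},
    ]
}


def rows_from_payload(payload, columns=("Date", "None")):
    """Return one list per record, holding the requested keys in order.

    Hypothetical helper: assumes the records sit under the "d" key,
    as they do in the output shown above. Missing keys become None.
    """
    return [[record.get(col) for col in columns] for record in payload["d"]]


rows = rows_from_payload(sample)
for row in rows:
    print(row)
```

With a live response you would presumably pass cont.json() instead of sample, and widen columns to include "D0" through "D4" if you need them.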
