loremIpsum1771 - 3 months ago
Bash Question

Xpath exposes text node in dev console but not in python shell

I'm writing a web scraper that is supposed to scrape data from the rows of an HTML table. I can expose all of the text inside the table's rows by using this XPath in Firebug:

$x('.//*[@class="statistics"]/tbody/tr/th/a/text()')
Running this shows the full set of text nodes in the table.

I based this XPath on a similar one that I had previously used for this site, which also returns all of the desired text nodes:
'.//*[@class="productionsEvent"]/text()'
For some reason, when I try to print the text from the rows of the statistics table in the Python shell after simply requesting the HTML, I get an empty list. Why might the XPath not be working in the shell?

Answer

This is because of the tbody element: the browser inserts it when it builds the DOM, so it is not present in the HTML you get when you download the page via urllib2 or requests:

>>> import requests
>>> from lxml.html import fromstring
>>> 
>>> url = "https://www.federalreserve.gov/releases/h10/hist/"
>>> response = requests.get(url)
>>> root = fromstring(response.content)
>>> root.xpath('.//*[@class="statistics"]/tbody/tr/th/a/text()')  # with tbody
[]
>>> root.xpath('.//*[@class="statistics"]//tr/th/a/text()')  # without tbody
['Australia', 'Brazil', 'Canada', 'China, P.R.', 'Denmark', 'EMU member countries', 'Greece', 'Hong Kong', 'India', 'Japan', 'Malaysia', 'Mexico', 'New Zealand', 'Norway', 'Singapore', 'South Africa', 'South Korea', '\r\n        ', 'Sri Lanka', 'Sweden', 'Switzerland', 'Taiwan', 'Thailand', 'United Kingdom', 'Venezuela']
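The same effect can be reproduced offline. The following is only a minimal sketch under stated assumptions: the two markup strings are made up for illustration (they are not the real page), and the standard library's xml.etree.ElementTree stands in for lxml so that nothing has to be installed or downloaded. ElementTree's path syntax has no text(), so the sketch selects the a elements and reads their .text instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the raw server response: no <tbody>.
RAW_HTML = """
<html><body>
  <table class="statistics">
    <tr><th><a href="#">Australia</a></th></tr>
    <tr><th><a href="#">Brazil</a></th></tr>
  </table>
</body></html>
"""

# Hypothetical markup standing in for the browser's DOM, where <tbody>
# has been inserted around the rows.
BROWSER_DOM = """
<html><body>
  <table class="statistics"><tbody>
    <tr><th><a href="#">Australia</a></th></tr>
    <tr><th><a href="#">Brazil</a></th></tr>
  </tbody></table>
</body></html>
"""

WITH_TBODY = './/*[@class="statistics"]/tbody/tr/th/a'
WITHOUT_TBODY = './/*[@class="statistics"]//tr/th/a'


def link_texts(markup, path):
    """Return the text of every <a> element matched by path."""
    root = ET.fromstring(markup.strip())
    return [a.text for a in root.findall(path)]


# The tbody-dependent path only matches the browser-style markup.
print(link_texts(RAW_HTML, WITH_TBODY))
print(link_texts(BROWSER_DOM, WITH_TBODY))

# The //tr path matches rows at any depth, so it works on both.
print(link_texts(RAW_HTML, WITHOUT_TBODY))
print(link_texts(BROWSER_DOM, WITHOUT_TBODY))
```

Because //tr does not care whether a tbody sits between the table and its rows, the second path is the safer choice when an expression worked out in the browser console has to run against downloaded HTML.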