Borealis · 4 years ago · 125
Python Question

How to extract a single element from webpage?

I want to extract a single value, as text, from the following webpage:

Cascade River Rustic Campground

Specifically, I'm after the "4" value that follows the "No. of Sites" text (see screenshot).


Using Chrome's developer tools, I was able to isolate the XPath:

//*[@id="act_1"]/div[1]/table/tbody/tr/td[2]


The following code yields an empty list:

import urllib2
from lxml import etree

url = "https://www.fs.usda.gov/recarea/superior/recreation/camping-cabins/recarea/?recid=36913&actid=29"

response = urllib2.urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
x = tree.xpath('//*[@id="act_1"]/div[1]/table/tbody/tr/td[2]')
print x


The expected output should be:

>>> print x
['4']


How can I extract a single element (i.e. "4") from a web page?

Answer

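This XPath works for me. Note that it has no tbody: Chrome inserts tbody elements into tables when it builds the DOM, but the raw HTML served by the site does not contain them, and lxml's HTML parser does not add them, so in lxml's tree the tr elements are direct children of table. Also use text() to extract the text from a node: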

x = tree.xpath('//*[@id="act_1"]/div[1]/table/tr/td[2]/text()')

print(x[0].strip())
# 4
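For Python 3 readers, here is a minimal sketch of the same approach, assuming the page still has the structure shown above. urllib2 became urllib.request in Python 3. The function names extract_site_count and fetch_site_count are hypothetical, and the `table//tr` step is a variation on the answer's XPath (not the answer's exact expression): `//` matches rows whether or not a tbody wrapper is present, so the same XPath works against both the raw HTML and Chrome's DOM.

```python
# Python 3 sketch: urllib2 from the question is urllib.request in Python 3.
from urllib.request import urlopen

from lxml import html

URL = ("https://www.fs.usda.gov/recarea/superior/recreation/camping-cabins/"
       "recarea/?recid=36913&actid=29")

# Assumption: "table//tr" (instead of "table/tr") so the expression matches
# rows with or without an enclosing <tbody> element.
SITES_XPATH = '//*[@id="act_1"]/div[1]/table//tr/td[2]/text()'


def extract_site_count(page):
    """Return the stripped text of the first matching cell, or '' if none.

    `page` may be str or bytes (lxml.html.fromstring accepts both).
    """
    tree = html.fromstring(page)
    matches = tree.xpath(SITES_XPATH)  # list of text nodes
    return matches[0].strip() if matches else ""


def fetch_site_count(url=URL):
    """Download the page and extract the "No. of Sites" value."""
    with urlopen(url) as response:
        # Pass raw bytes so lxml can detect the page's encoding itself.
        return extract_site_count(response.read())
```

Usage: `print(fetch_site_count())` should print the site count if the page layout is unchanged; if the site has been redesigned, extract_site_count returns an empty string rather than raising an IndexError.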