I am looking to extract a single value as text from the following webpage.
Cascade River Rustic Campground
Specifically, I'm after the "4" value after the "No. of Sites" text (see screenshot)
I've been able to isolate the XPath using Chrome, and I'm using it as follows:
import urllib2
from lxml import etree
url = "https://www.fs.usda.gov/recarea/superior/recreation/camping-cabins/recarea/?recid=36913&actid=29"
response = urllib2.urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
x = tree.xpath('//*[@id="act_1"]/div/table/tbody/tr/td')
>>> print x
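The XPath below works for me. Note that it has no tbody: Chrome typically adds tbody elements to the DOM even when the HTML source has none, so an XPath copied from the browser may not match the document that lxml parses. Use
text() to extract the text from a node. xpath() returns a list, so take the first item before stripping it: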
x = tree.xpath('//*[@id="act_1"]/div/table/tr/td/text()')
print(x[0].strip())  # 4
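To see the effect of tbody without hitting the network, here is a minimal sketch that runs both XPaths against a small inline document. SAMPLE_HTML and the first_text helper are hypothetical and only imitate the structure implied by the XPath; the real page's markup may differ.

```python
from lxml import etree

# Hypothetical snippet imitating the page's table; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<div id="act_1">
  <div>
    <table>
      <tr><th>No. of Sites</th><td> 4 </td></tr>
    </table>
  </div>
</div>
</body></html>
"""


def first_text(tree, xpath):
    """Return the first text match for xpath, stripped, or None if nothing matches."""
    matches = tree.xpath(xpath)
    return matches[0].strip() if matches else None


# lxml's HTML parser keeps the table as written and does not add a tbody.
tree = etree.fromstring(SAMPLE_HTML, etree.HTMLParser())

# XPath as copied from the browser, including the tbody step.
with_tbody = first_text(tree, '//*[@id="act_1"]/div/table/tbody/tr/td/text()')
# Same XPath with the tbody step removed.
without_tbody = first_text(tree, '//*[@id="act_1"]/div/table/tr/td/text()')

print(with_tbody)     # None
print(without_tbody)  # 4
```

Returning None for an empty result avoids an IndexError if the page layout changes. On Python 3, urllib.request.urlopen takes the place of urllib2.urlopen when fetching the live page.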