Avery Lipsit - 3 months ago
Python Question

Trying to get lxml to print a specific number in Python

I'm trying to get lxml to print the number selected in this screenshot, using Python:
http://imgur.com/a/joeql

The code I have isn't much, but here it is:

from lxml import html
import requests


page = requests.get('https://www.pathofexile.com/forum/view-thread/1703834')
tree = html.fromstring(page.content)

winner = tree.xpath(//*[@id="eventView0"]/div[3]/table/tbody/tr[1]/td[7])

print,winner

Answer

The syntax error you see occurs because the XPath expression is not enclosed in quotes. (Separately, print,winner will not print the result; use print(winner).) Fix the XPath call:

winner = tree.xpath('//*[@id="eventView0"]/div[3]/table/tbody/tr[1]/td[7]')

The actual problem is that the table content is generated dynamically by JavaScript that runs in the browser. requests only downloads the raw HTML and does not execute scripts, so the table rows are not present in the response. Instead, you can locate the script tag that holds the desired data as a JSON object, extract the JSON string, and load it into a Python data structure via json.loads():

import json
import re

from lxml import html
import requests


page = requests.get('https://www.pathofexile.com/forum/view-thread/1703834')
tree = html.fromstring(page.content)

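# the ladder data is embedded in an inline script as "var json = {...},"
script = tree.xpath('//script[contains(., "var json")]/text()')[0]
# capture the object literal that ends the "var json" line
obj_string = re.search(r"var json = (\{.*?\}),$", script, re.MULTILINE).group(1)
obj = json.loads(obj_string)

# print the account name of each ladder entry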
entries = obj['ladder']['entries']
for entry in entries:
    print(entry['account']['name'])

This prints the account names (just as proof that it works):

Havoc6
Steelmage
Olecgolec
...
Anafobia
nokieka2
HoGji
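
To try the extraction step without hitting the network, here is a minimal offline sketch of the same regex-plus-json.loads() approach. The SAMPLE_SCRIPT text and the extract_ladder_names helper are made up for illustration; the sample only assumes the real script has the same 'ladder' / 'entries' / 'account' / 'name' layout used above, with the whole object on a single line ending in a comma. Unlike the snippet above, it raises a clear error when the assignment is not found instead of failing on .group():

```python
import json
import re

# Hypothetical stand-in for the text of the page's inline <script> tag.
# The real page content is not reproduced here.
SAMPLE_SCRIPT = """
$(function() {
    var json = {"ladder": {"entries": [{"account": {"name": "Havoc6"}}, {"account": {"name": "Steelmage"}}]}},
        other = 1;
});
"""


def extract_ladder_names(script_text):
    """Return the account names found in a 'var json = {...},' assignment."""
    # Same pattern as in the answer: with re.MULTILINE, "$" matches at the
    # end of each line, so the lazy group grows until a "}" that is followed
    # by a comma at the end of the line.
    match = re.search(r"var json = (\{.*?\}),$", script_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'var json = {...},' assignment found")
    obj = json.loads(match.group(1))
    return [entry["account"]["name"] for entry in obj["ladder"]["entries"]]


names = extract_ladder_names(SAMPLE_SCRIPT)
print(names)
```

Once obj is loaded, any other field of an entry can be read the same way as entry['account']['name']; which key holds the number from the screenshot would need to be checked against the actual JSON.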