sof_dff, 4 years ago
Python Question

Why do I see only one div element being returned?

Say I go to this page:

https://www.reddit.com/r/starcraft/

I save its source code, then use lxml in the following way:

from lxml import etree

tree = etree.parse('redditsample.html', parser=etree.HTMLParser())
tree.xpath('//div')


And this is what I get:

[<Element div at 0x7f185ac9f908>]


Why do I get only one element? If you look at the page source, there are many more div elements. Why aren't they parsed?

Thanks.

Answer

Check whether the redditsample.html file you saved really contains the same page as https://www.reddit.com/r/starcraft/.

Reddit enforces rate limiting, so if you've run your script several times, you may have hit it. In that case your saved redditsample.html file might contain only a Reddit message saying you've been rate limited and should try your request again later.
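A quick way to check what the saved file actually contains is to count its div tags and read its title with a second, independent parser. The sketch below uses the standard library's html.parser instead of lxml, so it is a cross-check rather than the method from the question; the names DivCounter and inspect_html are hypothetical helpers introduced here for illustration.

```python
from html.parser import HTMLParser


class DivCounter(HTMLParser):
    """Count <div> start tags and capture the <title> text of an HTML document."""

    def __init__(self):
        super().__init__()
        self.div_count = 0
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <DIV> and <div> both match here.
        if tag == "div":
            self.div_count += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only accumulate text that appears inside <title>...</title>.
        if self._in_title:
            self.title += data


def inspect_html(html_text):
    """Return (number of <div> start tags, stripped page title) for an HTML string."""
    counter = DivCounter()
    counter.feed(html_text)
    counter.close()
    return counter.div_count, counter.title.strip()
```

Something like inspect_html(open('redditsample.html', encoding='utf-8', errors='replace').read()) should then show whether you saved the full subreddit page (many divs, the subreddit's title) or a short error page.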

When the URL is requested without hitting the rate limit, .xpath('//div') returns 429 nodes:

>>> import requests
>>> from lxml import etree
>>> resp = requests.get('https://www.reddit.com/r/starcraft/')
>>> len(etree.fromstring(resp.content, parser=etree.HTMLParser()).xpath('//div'))
429
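To avoid silently saving an error page, the fetch can fail loudly when the server answers with HTTP 429 (Too Many Requests). The sketch below swaps in the standard library's urllib.request for requests so it has no third-party dependency; fetch_html, the opener parameter (there only so the function can be tested without network access), and the User-Agent string are hypothetical placeholders, and sending a descriptive User-Agent is an assumption about what helps with Reddit, not something the answer states.

```python
import urllib.error
import urllib.request


def fetch_html(url, opener=urllib.request.urlopen, user_agent="div-count-example/0.1"):
    """Return the raw response body, raising RuntimeError if the server rate-limits us.

    `user_agent` is a placeholder value; `opener` defaults to urllib's urlopen.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(req, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 429:
            # Surface the server's Retry-After hint, if it sent one.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            raise RuntimeError(
                f"Rate limited (HTTP 429); Retry-After={retry_after!r}"
            ) from err
        raise  # Any other HTTP error is re-raised unchanged.
```

The returned bytes can then be parsed exactly as in the answer, with etree.fromstring(html, parser=etree.HTMLParser()), or written to redditsample.html only once the request has succeeded.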