Python Question

Why do I see only one div element being returned?

Say, I go to this page

Save its source code. Then use

in the following way:

tree = etree.parse('redditsample.html', parser=etree.HTMLParser())

And this is what I get:

[<Element div at 0x7f185ac9f908>]

Why do I get only one element? Anyone who looks at the page source will see that it contains many more div elements. Why aren't they parsed?


Answer Source

Check if the redditsample.html file you saved is the same as

Reddit enforces rate limiting, so if you've run your script multiple times, you may have hit that limit. In that case, your saved redditsample.html file might contain only a Reddit message saying that you hit the rate limit and need to retry your request later.
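One way to test that theory is to count the div start tags in the saved file with a parser other than lxml and look at the page title. The following is only a minimal sketch using the standard library's html.parser in place of lxml; the names DivCounter and inspect_html and the sample markup are made up for illustration, and the real text of a rate-limit page may differ.

```python
from html.parser import HTMLParser


class DivCounter(HTMLParser):
    """Counts <div> start tags and captures the text of the page <title>."""

    def __init__(self):
        super().__init__()
        self.div_count = 0
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_count += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inspect_html(html_text):
    """Return (number of <div> start tags, page title) for an HTML string."""
    counter = DivCounter()
    counter.feed(html_text)
    counter.close()
    return counter.div_count, counter.title.strip()


# Hypothetical stand-in for an error page; the actual content of a
# rate-limit response is an assumption, not taken from the question.
sample = (
    "<html><head><title>Too Many Requests</title></head>"
    "<body><div>try again later</div></body></html>"
)
print(inspect_html(sample))
```

To run it on the saved file, pass in the file's text, for example inspect_html(open('redditsample.html', encoding='utf-8', errors='replace').read()). If this also reports a single div and an error-like title, the saved file is the problem rather than the parser.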

When I request that URL without hitting the rate limit, .xpath('//div') returns 429 nodes:

>>> len(etree.fromstring(requests.get('')
...     .content, parser=etree.HTMLParser()).xpath('//div'))
