I am using the Scrapy shell on the page "Pittsburgh Steelers at New England Patriots - September 10th, 2015" to pull individual team stats. For example, I want to pull the total yards for the away team (464); inspecting the element and copying its XPath yields an XPath into the team_stats table.
The contents of the "Team Stats" table are present in the loaded page; however, they are commented out, so an XPath that targets the table directly matches nothing in the response.
One solution is to extract the comment that contains the team statistics, parse that comment text as HTML, and pull the data from the result.
The XPath //div[@id="all_team_stats"]//comment() selects the comment that contains the required table.
After you extract the comment you can feed it into a new selector just like Markus mentioned in his comment:
new_selector = Selector(text=extracted_text)
On this new selector you can again use .xpath(), as you would on the response.
Removing the comment delimiters is easy: the extracted text is a string, so you strip them from its beginning and its end. HTML comments start with <!-- and end with -->, and you need to feed the text between these markers to the new selector.
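The delimiter stripping described above can be sketched in plain Python. The helper name strip_comment_delimiters and the sample string are made up for illustration; the only assumption is that the comment arrives as a single string that still carries its delimiters.

```python
def strip_comment_delimiters(comment: str) -> str:
    """Return the text between '<!--' and '-->', trimmed of whitespace."""
    text = comment.strip()
    # Only slice when both delimiters are actually present.
    if text.startswith("<!--") and text.endswith("-->"):
        text = text[4:-3]  # "<!--" is 4 characters, "-->" is 3
    return text.strip()


# Hypothetical comment text, shaped like the one wrapping the stats table.
sample = '<!--\n<table id="team_stats"></table>\n-->'
inner = strip_comment_delimiters(sample)
print(inner)
```

Checking for the delimiters before slicing means a string that is not a comment is returned unchanged instead of losing its first and last characters.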
Extending the example from above:
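from scrapy.selector import Selector

# extract_first() returns the comment as one string; extract() returns a list, which cannot be sliced as text
extracted_text = response.xpath('//div[@id="all_team_stats"]//comment()').extract_first()
# Drop the leading "<!--" (4 characters) and the trailing "-->" (3 characters)
new_selector = Selector(text=extracted_text[4:-3].strip())
new_selector.xpath('//*[@id="team_stats"]/tbody/tr/td').extract()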