Jack Jack - 4 years ago 87
HTML Question

Scrapy uses wrong encoding, adds extra html tags to JSON from webpage

So I want to use Scrapy to get Puerto Rico board game data.

The data looks like the following

{ "data": {
"label":"<div class=\"iblock\">\u262f &ge; 75%<\/div>"

However, the response.text object in Scrapy uses a different encoding and adds some extra html tags:

{"data": {
"label": "<div class="\&quot;iblock\&quot;">\u262f ≥ 75%&lt;\/div&gt;"

As a result, when I try to parse the json into a python object:

responseJSON = json.loads(response.xpath("/html/body/text").extract_first())

I get the following error:

ValueError: end is out of bounds

How can I get Scrapy to return a correctly encoded response with no extra HTML tags?

Answer Source

This is a JSON response, so there is no need to use XPath.

Tested in scrapy shell
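
As a rough illustration of the idea (not the exact shell session), the body can be handed to json.loads directly instead of going through response.xpath, which runs the text through an HTML parser and rewrites the markup inside the JSON string. In the sketch below, raw_body is a stand-in for response.text, and the closing braces are an assumption added so the truncated sample from the question is valid JSON.

```python
import json
from html import unescape

# Stand-in for response.text inside a spider callback or scrapy shell.
# Assumption: the closing braces are added so the truncated sample parses.
raw_body = r'{ "data": { "label": "<div class=\"iblock\">\u262f &ge; 75%<\/div>" } }'

# Parse the raw JSON text; no XPath or HTML parsing is involved.
payload = json.loads(raw_body)
label = payload["data"]["label"]

# Optional: decode HTML entities such as &ge; inside the label.
label_text = unescape(label)

print(label)
print(label_text)
```

In a spider this would be json.loads(response.text); the unescape step is only needed if the entities inside the label should be turned into plain characters.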

