
Scrapy stops after a bad request

I don't know if it's relevant or not, but I use the inline-requests library.

If I make a request to a site's API and it returns Bad Request (400), the crawler just stops. How do I make it continue?

In the example below I'm checking the sell price of a book with ISBN '0046594062994'. Because they don't have this book, the API returns a Bad Request (you can try entering the URL below). It works just fine with the books they do have.


EDIT: I found out this is a known issue with inline-requests:

"Middlewares can drop or ignore non-200 status responses causing the callback to not continue its execution. This can be overcome by using the flag handle_httpstatus_all. See the httperror middleware documentation."

doc: https://doc.scrapy.org/en/latest/topics/spider-middleware.html#scrapy.spidermiddlewares.httperror.HttpErrorMiddleware

I tried to do what the doc says but didn't manage to make it work.
What am I doing wrong? See the added line in my example code below.

Example code:

response2 = yield scrapy.Request("https://api.bookscouter.com/v3/prices/sell/0046594062994.json")
response2.meta['handle_httpstatus_all'] = True  # <-- the added line
jsonresponse = loads(response2.body)

Answer

You need to pass the meta to the request itself, as below:

response2 = yield scrapy.Request("https://api.bookscouter.com/v3/prices/sell/0046594062994.json", meta={'handle_httpstatus_all': True})


jsonresponse = loads(response2.body)

Now that you are setting handle_httpstatus_all, every status code, including 301 and 302 redirects, will also be handed over to your callback.

So you should check the status:

if response2.status == 200:
   jsonresponse = loads(response2.body)
else:
   print("do something else")