Micky S - 1 year ago
Python Question

JSON in POST request works in HttpRequester but not in Python Requests

I'm stuck web scraping a page using Python. The following request, sent from HttpRequester (in Mozilla Firefox), gives me the right response:

POST https://www.hpe.com/h20195/v2/Library.aspx/LoadMore
Content-Type: application/json
{"sort": "csdisplayorder", "hdnOffset": "1", "uniqueRequestId": "d6da6a30bdeb4d77b0e607a6b688de1e", "test": "", "titleSearch": "false", "facets": "wildcatsearchcategory#HPE,cshierarchycategory#No,csdocumenttype#41,csproducttype#18964"}
-- response --
200 OK
Cache-Control: private, max-age=0
Content-Length: 13701
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Sat, 28 May 2016 04:12:57 GMT
Connection: keep-alive

The exact same operation in Python 2.7.1 using Requests fails with an error. Here is the code snippet:

import requests

jsonContent = {"sort": "csdisplayorder", "hdnOffset": "1", "uniqueRequestId": "d6da6a30bdeb4d77b0e607a6b688de1e", "test": "", "titleSearch": "false", "facets": "wildcatsearchcategory#HPE,cshierarchycategory#No,csdocumenttype#41,csproducttype#18964"}

catResponse = requests.post('https://www.hpe.com/h20195/v2/Library.aspx/LoadMore', json=jsonContent)
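A quick way to see what that call actually puts on the wire is to let Requests build the request without sending it. This is a sketch for Python 3 with a current Requests release: `Session.prepare_request` merges in the session's default headers the same way `requests.post` does internally, and nothing goes over the network.

```python
import json

import requests

URL = "https://www.hpe.com/h20195/v2/Library.aspx/LoadMore"
payload = {"sort": "csdisplayorder", "hdnOffset": "1",
           "uniqueRequestId": "d6da6a30bdeb4d77b0e607a6b688de1e", "test": "",
           "titleSearch": "false",
           "facets": "wildcatsearchcategory#HPE,cshierarchycategory#No,csdocumenttype#41,csproducttype#18964"}

# Build the request exactly as requests.post() would, but do not send it.
session = requests.Session()
prepared = session.prepare_request(requests.Request("POST", URL, json=payload))

# Every header that would be sent (a fresh session has no cookies yet).
for name, value in prepared.headers.items():
    print("%s: %s" % (name, value))

# The JSON body produced by json=, decoded back to confirm it round-trips.
text = prepared.body.decode("utf-8")
print(text)
body = json.loads(text)
print(body == payload)  # → True
```

Comparing that header list with the browser's request is a good first check: a fresh session sends no Referer and no Cookie header at all.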

The following is the error that I get:

{"Message":"Value cannot be null.\r\nParameter name: source","StackTrace":" at System.Linq.Enumerable.Contains[TSource](IEnumerable`1 source, TSource value, IEqualityComparer`1 comparer)\r\n

More information: the POST request I'm trying to reproduce is fired when I:

  1. open this web page: https://www.hpe.com/h20195/v2/Library.aspx?doctype=41&doccompany=HPE&footer=41&filter_doctype=no&filter_doclang=no&country=&filter_country=no&cc=us&lc=en&status=A&filter_status=rw#doctype-41&doccompany-HPE&prodtype_oid-18964&status-a&sortorder-csdisplayorder&teasers-off&isRetired-false&isRHParentNode-false&titleCheck-false

  2. click the grey "Load more" button at the end of the page

I captured the exact request headers and response from the browser and tried to replicate the request in Postman, in Python, and in HttpRequester (Mozilla).

Postman and Python both return the same error shown above, but HttpRequester works even though I don't set any headers myself.

Can anyone think of an explanation for this?

Answer

If both Postman and requests receive an error, then HttpRequester is sending more context than it shows you. There are several headers I'd expect to be set almost always, including User-Agent and Content-Length, and they are missing from the request shown here.
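One systematic way to spot what is missing is to diff the header names the browser sent against the ones your script sends. A minimal sketch; the `missing_headers` helper is hypothetical, and the sample header values are made up for illustration, not what hpe.com actually sends (copy the real ones from your browser's network panel).

```python
def missing_headers(browser_headers, script_headers):
    """Names the browser sent that the script did not; HTTP header names are case-insensitive."""
    sent = {name.lower() for name in script_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Hypothetical capture from the browser's network panel.
browser = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0",
    "Content-Type": "application/json; charset=utf-8",
    "Referer": "https://www.hpe.com/h20195/v2/Library.aspx",
    "Cookie": "ASP.NET_SessionId=example",
    "X-Requested-With": "XMLHttpRequest",
}
# Roughly what a bare requests.post(url, json=...) sends (placeholder version string).
script = {
    "user-agent": "python-requests/x.y.z",
    "content-type": "application/json",
    "accept": "*/*",
}

print(missing_headers(browser, script))  # → ['Cookie', 'Referer', 'X-Requested-With']
```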

The usual suspects are cookies (look for Set-Cookie headers in earlier responses, and preserve them by using a requests.Session() object), the User-Agent header, and perhaps a Referer header (the HTTP header name really is spelled that way). Do also look at other headers, such as anything starting with Accept.
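The cookie and header suggestions can be combined in a single `requests.Session`. This is a minimal sketch, assuming the page URL from the question is where the cookies are set and is also the right `Referer`; the `User-Agent` string and the `Accept` value are hypothetical placeholders to replace with what your browser actually sent. The `#...` fragment is left off the URL because browsers never send fragments to the server.

```python
import requests

# Page from the question, without the #fragment (fragments never reach the server).
PAGE_URL = ("https://www.hpe.com/h20195/v2/Library.aspx?doctype=41&doccompany=HPE"
            "&footer=41&filter_doctype=no&filter_doclang=no&country=&filter_country=no"
            "&cc=us&lc=en&status=A&filter_status=rw")
LOAD_MORE_URL = "https://www.hpe.com/h20195/v2/Library.aspx/LoadMore"

# Hypothetical browser User-Agent; copy the real one from your browser.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0"


def make_session():
    """Session that keeps cookies between requests and sends browser-like headers."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": BROWSER_UA,
        "Referer": PAGE_URL,
        # Assumed value; use the Accept header your browser sent for the XHR.
        "Accept": "application/json, text/javascript, */*; q=0.01",
    })
    return session


def load_more(session, payload):
    """Mimic the browser: load the page first, then replay the "Load more" POST."""
    # Any Set-Cookie headers from this response are stored in session.cookies.
    session.get(PAGE_URL)
    # json= serialises the payload and sets Content-Type: application/json.
    response = session.post(LOAD_MORE_URL, json=payload)
    response.raise_for_status()
    return response.json()
```

Because the same session object is reused, cookies from the GET are sent automatically with the POST; if the error persists, add the remaining captured headers to `session.headers` one at a time.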

Have HttpRequester post to http://httpbin.org/post instead, for example, and inspect the returned JSON, which tells you which headers were sent. This won't include cookies (those are domain-specific), but any other header could be something the server looks for. If cookies don't help, try adding such headers one by one.
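If you'd rather not depend on httpbin.org, the same check can be done offline with a tiny local echo server that reports the headers and body it received. This is a swapped-in technique, not part of httpbin: a minimal Python 3 sketch (`http.server` does not exist in Python 2) using only the standard library plus Requests, where `EchoHandler` and `echo_request` are hypothetical names.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class EchoHandler(BaseHTTPRequestHandler):
    """Reply to any POST with the headers and body the client sent, as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        reply = json.dumps({"headers": dict(self.headers), "data": body}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def echo_request(payload):
    """POST payload with json= (as in the question) and return what the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0 picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        session = requests.Session()
        session.trust_env = False  # ignore proxy environment variables; stay on 127.0.0.1
        url = "http://127.0.0.1:%d/post" % server.server_port
        return session.post(url, json=payload).json()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    seen = echo_request({"sort": "csdisplayorder", "hdnOffset": "1"})
    for name, value in sorted(seen["headers"].items()):
        print("%s: %s" % (name, value))
```

Point HttpRequester or Postman at the same local address and you can compare the two header lists side by side.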