meowsqueak meowsqueak - 5 months ago 11
Python Question

Requests - determine parameterised url prior to issuing request, for inclusion in Referer header

I am writing a Python 2.7 script using Requests to automate access to a particular website. The website has a requirement that a Referer header matching the request URL is provided, for "security reasons". The URL is built up from a number of items in a

params
dict, passed to requests.post().

Is there a way to determine what the URL that Requests will use is, prior to making the request, so that the Referer header can be set to this correct value? Let's assume that I have a lot of parameters:

params = { 'param1' : value1, 'param2' : value2, # ... etc
}

base_url = "http://example.com"
headers = { 'Referer' : url } # but what is 'url' to be?
requests.post(base_url, params=params, headers=headers) # fails as Referer does not match final url


I suppose one workaround is to issue the request and see what the URL is, after the fact. However there are two problems with this - 1. it adds significant overhead to the execution time of the script, as there will be a lot of such requests, and 2. it's not actually a useful workaround because the server actually redirects the request to another URL, so reading it afterwards doesn't give the correct Referer value.

I'd like to note that I have this script working with urllib/urllib2, and I am attempting to write it with Requests to see whether it is possible and perhaps simpler. It's not a complicated process the script has to follow, but it may perhaps be slightly beyond the scope of Requests. That's fine, I'd just like to confirm that this is the case.

Answer

I think I found a solution, based on Prepared Requests. The concept is that Session.prepare_request() will do everything to prepare the request except send it, which allows my script to then read the prepared request's url, which now includes the parameters whose order are determined by the dict order. Then it can set the Referer header appropriately and then issue the original request.

params = {'param1' : value1, 'param2' : value2, # ... etc
         }
url = "http://example.com"

# Referer must be correct
# To determine correct Referer url, prepare a request without actually sending it
req = requests.Request('POST', url, params=params)
prepped = session.prepare_request(req)
#r = session.send(prepped)   # don't actually send it

# add the Referer header by examining the prepared url
headers = { 'Referer': prepped.url }

# now send normally
r = session.post(url, params=params, data=data, headers=headers)
Comments