gr1zzly be4r - 7 months ago
Python Question

Python Requests Not Returning Same Header as Browser Request/cURL

I'm looking to write a script that can automatically download .zip files from the Bureau of Transportation Statistics Carrier Website, but I'm having trouble getting the same response headers that I can see in Chrome when I download the zip file. I'm looking to get a response header that looks like this:

HTTP/1.1 302 Object moved
Cache-Control: private
Content-Length: 183
Content-Type: text/html
Location: http://tsdata.bts.gov/103627300_T_T100_SEGMENT_ALL_CARRIER.zip
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Date: Thu, 21 Apr 2016 15:56:31 GMT


However, when calling requests.post(url, data=params, headers=headers) with the same information that I can see in the Chrome network inspector, I am getting the following response:

>>> res.headers
{'Cache-Control': 'private', 'Content-Length': '262', 'Content-Type': 'text/html', 'X-Powered-By': 'ASP.NET', 'Date': 'Thu, 21 Apr 2016 20:16:26 GMT', 'Server': 'Microsoft-IIS/8.5'}


It has pretty much everything, except that it's missing the Location key that I need in order to download the .zip file with all of the data I want. Also, the Content-Length value is different, but I'm not sure if that's an issue.
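One thing that may be worth checking here: requests follows redirects by default, so res.headers describes only the last response it received, while any earlier responses are kept in res.history. The helper below is a minimal sketch for listing every hop; the name redirect_chain is made up for illustration, and it assumes it is handed the object returned by requests.post.

```python
def redirect_chain(response):
    """Return (status_code, Location) for every hop, oldest first.

    `response` is assumed to be a requests.Response; the function name is
    made up for this illustration and is not part of the original post.
    """
    # response.history holds the intermediate (redirect) responses, if any;
    # the response object itself is the final hop.
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.headers.get("Location")) for hop in hops]


# Usage sketch, with res being the object returned by requests.post(...):
#   for status, location in redirect_chain(res):
#       print(status, location)
```

If the list has a single entry, no redirect was followed, and the response being inspected came straight from the POST.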

I think that my issue has something to do with the fact that when you click "Download" on the page, it actually sends two requests that I can see in the Chrome network console. The first request is a POST request that yields an HTTP response of 302 with the Location in the response header. The second request is a GET request to the URL specified in the Location value of the response header.
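The two-request flow described above can be reproduced by hand by turning off automatic redirect handling. This is a minimal sketch and not a verified fix for this particular site: the function name fetch_via_redirect is made up for illustration, and session is assumed to be a requests.Session (or any object with the same post/get interface).

```python
from urllib.parse import urljoin


def fetch_via_redirect(session, url, form_data, headers=None):
    """POST the form, read Location from the redirect response, then GET it.

    `session` is assumed to be a requests.Session. This helper is
    hypothetical and not part of the original post. Returns the body of
    the second response as bytes.
    """
    # Step 1: send the POST without following the redirect, so the 302
    # response and its Location header stay visible.
    first = session.post(url, data=form_data, headers=headers,
                         allow_redirects=False)
    location = first.headers.get("Location")
    if first.status_code not in (301, 302, 303, 307, 308) or not location:
        raise RuntimeError(
            "expected a redirect with a Location header, got status %s"
            % first.status_code
        )

    # Step 2: issue the GET that the browser sends as its second request.
    # urljoin also copes with a relative Location value.
    second = session.get(urljoin(url, location))
    second.raise_for_status()
    return second.content


# Usage sketch (needs the third-party requests package and network access):
#   import requests
#   with requests.Session() as s:
#       data = fetch_via_redirect(s, url, form_data, headers=headers)
```

Using a session rather than bare requests.post/requests.get means any cookie set by the first response is sent along with the second request.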

Should I really be sending two requests here? Why am I not getting the same response headers using requests as I do in the browser? FWIW I used curl -X POST -d /*my data*/ and got back this in my terminal:

<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a HREF="http://tsdata.bts.gov/103714760_T_T100_SEGMENT_ALL_CARRIER.zip">here</a>.</body>
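The body that curl printed carries the same target URL as the Location header, so it could serve as a fallback when only the body is at hand. A rough sketch using the standard library's HTML parser follows; the class and function names (FirstLinkParser, first_link) are made up for illustration, and the sample body is the output shown above.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Collects the href of the first <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so HREF arrives
        # here as "href".
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


def first_link(html_text):
    """Return the first link target found in html_text, or None."""
    parser = FirstLinkParser()
    parser.feed(html_text)
    return parser.href


body = (
    '<head><title>Object moved</title></head>\n'
    '<body><h1>Object Moved</h1>This object may be found '
    '<a HREF="http://tsdata.bts.gov/103714760_T_T100_SEGMENT_ALL_CARRIER.zip">'
    'here</a>.</body>'
)
print(first_link(body))
```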


Really appreciate any help!

Answer

I was able to download the zip file that I was looking for by using almost all of the headers that I could see in the Google Chrome web console. My headers looked like this:

{'Connection': 'keep-alive', 'Cache-Control': 'max-age=0', 'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293', 'Origin': 'http://www.transtats.bts.gov', 'Upgrade-Insecure-Requests': '1', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36', 'Cookie': 'ASPSESSIONIDQADBBRTA=CMKGLHMDDJIECMNGLMDPOKHC', 'Accept-Language': 'en-US,en;q=0.8', 'Accept-Encoding': 'gzip, deflate', 'Content-Type': 'application/x-www-form-urlencoded'}

And then I just wrote:

res = requests.post(url, data=form_data, headers=headers)

where form_data was copied from the "Form Data" section of the Chrome console. Since requests follows redirects by default, res is the response to the final GET rather than the 302 itself. Once I had that response, I used the zipfile and io modules to parse the content stored in res, like this:

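import zipfile, io
# Wrap the response bytes in a file-like object and extract into the current directory
zipfile.ZipFile(io.BytesIO(res.content)).extractall()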

and then the file was in the directory where I ran the Python code.
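One detail worth spelling out: opening an archive with zipfile.ZipFile does not write anything to disk by itself; it is extractall (or extract) that creates the files. Here is a small self-contained sketch of that step. The helper name extract_zip_bytes is made up, and the tiny in-memory archive with its sample.csv contents is invented data standing in for res.content.

```python
import io
import os
import tempfile
import zipfile


def extract_zip_bytes(payload, target_dir):
    """Extract a zip archive held in memory and return its member names."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Build a small archive in memory to stand in for res.content
# (the file name and contents are invented for this example).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "YEAR,CARRIER\n2016,XX\n")
payload = buffer.getvalue()

# Extract into a temporary directory and confirm the file landed on disk.
with tempfile.TemporaryDirectory() as tmp:
    names = extract_zip_bytes(payload, tmp)
    extracted = os.path.exists(os.path.join(tmp, "sample.csv"))
```

Passing "." as target_dir would put the files in the directory the script is run from.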

Thanks to the users who answered on this thread.