Rakesh_K - 1 month ago
Python Question

urllib.request.urlretrieve not downloading files over HTTPS

The URL below is a download link for a text file.
If I paste the URL into Firefox, it downloads the actual contents, i.e. the text file. But when I use urlretrieve, it gives me an HTML source file instead.

>>> import urllib
>>> down_link='URL' #URL is a ***HTTPS*** link to download .txt file
>>> file=urllib.request.urlretrieve(down_link)


This is the output I get:

>>>
('C:\\Users\\rakesh.j.kulkarni\\AppData\\Local\\Temp\\tmps7559wgi', <http.client.HTTPMessage object at 0x03A3C610>)


When I open the file, I get an HTML source file which, when opened in a browser, turns out to be the same website's login form.

So I had to come up with an alternative process for the time being, until the problem gets resolved:

subprocess.Popen([r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe", down_link])


I then go to downloads and work on the file.

Answer

First off, you should import urllib.request, and not just urllib (in Python 3).
Also, urlretrieve returns a (filename, headers) tuple, and that tuple is what you are seeing as output. Nothing wrong with that. As a quick fix, try:

In [1]: import urllib.request

In [2]: down_link = "http://vignette3.wikia.nocookie.net/shipoffools/images/4/42/Surprised_Luffy.jpg/revision/latest?cb=20120921134043"

In [3]: path_to_save = "../luffy.jpg"

In [4]: urllib.request.urlretrieve(down_link, path_to_save)
Out[4]: ('../luffy.jpg', <http.client.HTTPMessage at 0x47f6af0>)

This will work just fine, saving the image where you want. If you don't specify path_to_save, that's fine too: the file is still downloaded, just to a temporary file in the temp directory. In your case that would be the C:\Users\rakesh.j.kulkarni\AppData\Local\Temp\ folder.
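Since urlretrieve hands back the response headers along with the path, you can use them to notice when the server sent a web page (such as a login form) instead of the file. Here is a minimal sketch. The helper name retrieve_checked is made up for illustration, and treating a text/html Content-Type as "not the file I wanted" is an assumption that fits a .txt download, not a general rule:

```python
import urllib.request


def retrieve_checked(url, filename=None):
    """Download url with urlretrieve and fail loudly if the server sent HTML."""
    # urlretrieve returns (local_path, headers); headers is the message
    # object shown in the output above.
    path, headers = urllib.request.urlretrieve(url, filename)
    # get_content_type() gives the MIME type without parameters,
    # e.g. "text/html" or "text/plain".
    content_type = headers.get_content_type()
    if content_type == "text/html":
        # Assumption: we expected a plain file, so HTML most likely means
        # a login or error page came back instead.
        raise ValueError(f"expected a file but got an HTML page, saved at {path}")
    return path
```

You would call it as retrieve_checked(down_link, path_to_save). It does not log you in, it only tells you early that what was saved is not the file you asked for.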

If you hit an HTTPS-related error or some other problem, there is a cleaner way of doing it: read the response with urlopen and write it to a file on your computer yourself:

In [5]: import urllib.request as req

In [6]: down_link = "https://vignette3.wikia.nocookie.net/shipoffools/images/4/42/Surprised_Luffy.jpg/revision/latest?cb=20120921134043"

In [7]: fname = "../luffy.jpg"

In [8]: with req.urlopen(down_link) as d, open(fname, "wb") as opfile:
   ...:     data = d.read()
   ...:     opfile.write(data)
   ...:

NOTE: d.read() loads the whole response into memory before anything is written, so this method can take some time on big downloads, but it works just fine for normal small files.
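If the file is large, one possible variation is to stream the response to disk in chunks rather than calling read() once. This is only a sketch: the function name download and the 64 KiB chunk size are arbitrary choices for illustration, not something from the question:

```python
import shutil
import urllib.request


def download(url, fname, chunk_size=64 * 1024):
    """Stream the response body to fname in chunks instead of one big read()."""
    with urllib.request.urlopen(url) as response, open(fname, "wb") as out:
        # copyfileobj reads chunk_size bytes at a time, so memory use stays
        # roughly constant regardless of the file size.
        shutil.copyfileobj(response, out, chunk_size)
    return fname
```

Usage is the same as before, e.g. download(down_link, fname).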


JavaScript download/redirect: if the download is triggered by a JavaScript or PHP script, using subprocess to open the link in a browser isn't very flexible, since you have to hard-code the path of the browser. Instead, you can use the existing webbrowser module, which automatically detects the system's default browser and opens the URL in it.

import webbrowser
url = ...
webbrowser.open(url, autoraise=True) # normal
webbrowser.open_new(url)             # new window
webbrowser.open_new_tab(url)         # new tab