Darshil Chauhan - 6 months ago
Python Question

How to get size of a file from Webpage in BeautifulSoup

I am using BeautifulSoup in Python.

I want to get the size of a downloadable file from a webpage. For example, this page has a link to download a txt file (by clicking on "save"). How can I get the size (in bytes) of that file, preferably without downloading it?

If there is no option in BeautifulSoup, then please suggest other options within and outside of Python.

AKS
Answer

Using the requests package, you can send a HEAD request to the URL that serves the text file and check the Content-Length header in the response. A HEAD request returns only the headers, so the file itself is not downloaded:

>>> import requests
>>> url = "http://cancer.jpl.nasa.gov/fmprod/data?refIndex=0&productID=02965767-873d-11e5-a4ea-252aa26bb9af"
>>> res = requests.head(url)
>>> res.headers
{'content-length': '944', 'content-disposition': 'attachment; filename="Lab001_A_R03.txt"', 'server': 'Apache-Coyote/1.1', 'connection': 'close', 'date': 'Thu, 19 May 2016 05:04:45 GMT', 'content-type': 'text/plain; charset=UTF-8'}
>>> int(res.headers['content-length'])
944

As you can see, the size (944 bytes) is the same as the one mentioned on the page.
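
If you would rather not depend on requests, the same HEAD-request idea can be sketched with the standard library alone. This is only a rough sketch, not a definitive implementation: the function names `parse_content_length` and `remote_file_size` are made up for illustration, and it assumes the server answers HEAD requests and actually sends a Content-Length header. Some servers do not, in which case the function returns None.

```python
from urllib.request import Request, urlopen


def parse_content_length(value):
    """Turn a raw Content-Length header value into an int.

    Returns None when the header is missing or is not a valid integer.
    """
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def remote_file_size(url, timeout=10):
    """Send a HEAD request and return the reported size in bytes, or None.

    Hypothetical helper: only the response headers are read, so the file
    body is not downloaded.
    """
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as res:
        # Header lookup on the response is case-insensitive.
        return parse_content_length(res.headers.get("Content-Length"))
```

You would call it as `remote_file_size(url)` with the download URL from the page. A None result means the server did not report a size, and you would then have to fall back to downloading the file (or part of it) to measure it.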