jony jony - 6 months ago 49
Python Question

Decoding urllib.request response

I'm getting this response when I open this url:

from urllib.request import Request, urlopen

r = Request(r'')
h = urlopen(r).readline()



What encoding is this?
Is there a way to decode it using only the standard library?

Thank you in advance for any insight on this matter!

PS: It seems to be gzip.


It's gzip compressed HTML, as you suspected.
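If you want to confirm this yourself, one quick check is to look at the first two bytes of the body: a gzip stream always starts with the magic bytes 0x1f 0x8b. The sketch below is only an illustration; looks_like_gzip and sample are hypothetical names, and the sample data is generated locally rather than taken from the URL in the question.

```python
import gzip

# Every gzip stream begins with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Locally generated sample standing in for a compressed response body.
sample = gzip.compress(b"<html></html>")
print(looks_like_gzip(sample))
print(looks_like_gzip(b"<html></html>"))
```

This only tells you the body is gzip data; the Content-Encoding response header is still the proper way to decide how to decode it.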

Rather than using urllib, use requests, which will decompress the response for you:

import requests

r = requests.get('')

You can install it with pip install requests, and never look back.

If you really must restrict yourself to the standard library, then decompress it with the gzip module (the code here uses the Python 2 module names urllib2 and cStringIO):

import gzip
import urllib2
from cStringIO import StringIO

f = urllib2.urlopen('')

# how to determine the content encoding
content_encoding = f.headers.get('Content-Encoding')

# how to decompress gzip data
if content_encoding == 'gzip':
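    # wrap the raw bytes in a file-like object so GzipFile can read them
    gz = gzip.GzipFile(fileobj=StringIO(f.read()))
    response = gz.read()
else:
    # body was not compressed; use it as-is
    response = f.read()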
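Since the question uses urllib.request, here is a rough Python 3 sketch of the same idea. It assumes the server labels compressed bodies with Content-Encoding: gzip and falls back to UTF-8 when no charset is declared. decode_body and fetch_text are hypothetical helper names, not part of any library.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding, charset="utf-8"):
    """Decompress raw bytes if they are gzip-encoded, then decode to str."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_text(url):
    """Fetch url and return its body as text (hypothetical helper)."""
    # Tell the server we accept gzip; it may still reply uncompressed.
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        content_encoding = response.headers.get("Content-Encoding")
        # Assumption: fall back to UTF-8 if the server declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return decode_body(response.read(), content_encoding, charset)
```

Call fetch_text with your own URL. decode_body is kept separate so the decompression step can be tried on bytes you already have, without a network request.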