krlmlr krlmlr - 2 months ago 22
Python Question

Transparent decompression of stream

How do I read a file in Python that may or may not be gzip-compressed?

My current code

with gzip.open("file.xml") as f:
xml.sax.parse(f, reader)


works with
.xml.gz
files but not with
.xml
files:

...
File "/usr/lib/python3.5/gzip.py", line 409, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)


Is there a substitute for the
gzip.open()
call that always returns an uncompressed stream based on file contents and/or file extension?

The answers to the related question would solve my problem, but I'm looking for a packaged solution that doesn't involve any extra code.

Answer

Just use the function defined in this answer to a related question:

import gzip

def opener(filename):
    f = open(filename, 'rb')
    if (f.read(2) == '\x1f\x8b'):
        f.seek(0)
        return gzip.GzipFile(fileobj=f)
    else:
        f.seek(0)
        return f

You can also extend it to support other file formats.