George George - 1 year ago 43
Python Question

reading compressed data gives different results

I am using genfromtxt in order to read data.

genfromtxt is supposed to work for .gz files as well, but it doesn't seem to.

Using an uncompressed file (not .gz):

import numpy as np

f = open('file', 'r')
con = np.genfromtxt(f, dtype=str)

print con
print type(con)

The file contents are:


and the output of the above code is:

['@HWI' 'ABCDE' '+' '@HWI7' 'EFSA' '+' '???=AF' 'GTEY@JF' 'GVTAWM']
<type 'numpy.ndarray'>

If I use the same code with the above file compressed as a .gz file, the output is:

[ "\x1f\x8b\x08\x08\x1b4\x8eW\x00\x03file\x00Sp\xf0\x08\xf7\xe4R\x00\x02G'g\x17W0K\x1bL\x82$\xcc\xc1,W\xb7`G$"
'{{{[G70\xd3=\xc45\xd2\xc1' '\xca\x0e' 'q' '\xf7\x05' '\x06\x07\xc2P']
<type 'numpy.ndarray'>

The problem is that I want to perform some calculations later, and I can't with the data in this form.

I also tried (for the .gz version):

import gzip

with gzip.open('file.gz', 'r') as f:
    con = np.array([f.read()])

print con
print type(con)

which gives:

[ ' @HWI\n ABCDE\n +\n @HWI7\n EFSA\n +\n ???=AF\n GTEY@JF\n GVTAWM']
<type 'numpy.ndarray'>

which is closer to the original result but still doesn't work (I can't move on with the calculations).

How can I get the same result as with the uncompressed file?

Answer Source

Why don't you use genfromtxt with the file object from gzip.open?

import gzip
import numpy
with gzip.open('file.gz') as f:
    print(numpy.genfromtxt(f, dtype=str))
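
To see why the plain open call in the question produced binary garbage, here is a minimal standard-library sketch. It assumes Python 3, and the sample text is made up to mirror the tokens in the question's output. It writes a small gzip file, then reads it back once with the built-in open and once with gzip.open. The raw stream begins with the gzip magic bytes \x1f\x8b, the same bytes that start the garbled output in the question.

```python
import gzip
import os
import tempfile

# Hypothetical sample data mirroring the tokens shown in the question.
text = "@HWI\nABCDE\n+\n@HWI7\nEFSA\n+\n???=AF\nGTEY@JF\nGVTAWM\n"

tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "file.gz")

# Write the sample as a gzip-compressed text file.
with gzip.open(gz_path, "wt") as f:
    f.write(text)

# Built-in open() does not decompress: it returns the raw gzip stream,
# which starts with the magic bytes 0x1f 0x8b.
with open(gz_path, "rb") as f:
    raw = f.read()

# gzip.open() decompresses transparently; mode "rt" yields str, not bytes.
with gzip.open(gz_path, "rt") as f:
    tokens = f.read().split()

print(raw[:2])
print(tokens)
```

Splitting the decompressed text on whitespace gives one token per line here, which is the same shape as the array genfromtxt returned for the uncompressed file.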


numpy uses predefined file openers for .gz and .bz2 files. You can check the configuration like this:

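import numpy.lib._datasource as DS
DS._file_openers._load()  # private API; names may differ between numpy versions
print(DS._file_openers._file_openers)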

On my machine this shows handlers for bz2 and gz files:

{'.bz2': <type 'bz2.BZ2File'>, None: <built-in function open>, '.gz': <function open at 0x7efca562a6e0>}

Since the handler for gz files is actually gzip.open, it seems strange that numpy doesn't use it on your machine.
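
One thing worth checking: as far as I know, numpy only picks an opener by extension when it is given a filename. A file object that was already opened with the built-in open is read as-is. If you would rather not depend on that, a small fallback is to detect gzip yourself from the first two bytes. The sketch below is standard library only and assumes Python 3. The helper names open_text and read_tokens are made up for illustration, and the sample data is hypothetical.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text(path):
    """Open path for text reading, decompressing if it looks like gzip."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "r")


def read_tokens(path):
    """Return the whitespace-separated tokens of a plain or gzipped file."""
    with open_text(path) as f:
        return f.read().split()


# Demo with hypothetical data: the same text stored plain and compressed.
sample = "@HWI\nABCDE\n+\n@HWI7\nEFSA\n+\n"
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "file")
gz_path = os.path.join(tmpdir, "file.gz")

with open(plain_path, "w") as f:
    f.write(sample)
with gzip.open(gz_path, "wt") as f:
    f.write(sample)

plain_tokens = read_tokens(plain_path)
gz_tokens = read_tokens(gz_path)
print(plain_tokens == gz_tokens)
```

Both calls return the same list, so the rest of the calculation does not need to know whether the input was compressed. The list can then be wrapped in a numpy array if you need one.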