George George - 4 months ago 9
Python Question

Reading compressed data gives different results

I am using numpy's genfromtxt to read data.

genfromtxt is supposed to work with .gz files as well, but it seems it doesn't.

Using plain data (not .gz files):

import numpy as np

with open('file', 'r') as f:
    con = np.genfromtxt(f, dtype=str)

print(con)
print(type(con))


The file contents are:

@HWI
ABCDE
+
@HWI7
EFSA
+
???=AF
GTEY@JF
GVTAWM


The output of the above code is:

['@HWI' 'ABCDE' '+' '@HWI7' 'EFSA' '+' '???=AF' 'GTEY@JF' 'GVTAWM']
<type 'numpy.ndarray'>


If I use the same code with the above file compressed as a .gz file, the output is:

[ "\x1f\x8b\x08\x08\x1b4\x8eW\x00\x03file\x00Sp\xf0\x08\xf7\xe4R\x00\x02G'g\x17W0K\x1bL\x82$\xcc\xc1,W\xb7`G$"
'{{{[G70\xd3=\xc45\xd2\xc1' '\xca\x0e' 'q' '\xf7\x05' '\x06\x07\xc2P']
<type 'numpy.ndarray'>


The problem is that I want to perform some calculations on the data later, and I can't do that with this output.

I also tried (for the .gz version):

import gzip
import numpy as np

with gzip.open('file.gz', 'r') as f:
    con = np.array([f.read()])

print(con)
print(type(con))


which gives:

[ ' @HWI\n ABCDE\n +\n @HWI7\n EFSA\n +\n ???=AF\n GTEY@JF\n GVTAWM']
<type 'numpy.ndarray'>


which is closer to the original result, but the whole file ends up as a single string instead of one element per line, so I still can't move on with the calculations.

How can I get the same result as with the uncompressed file?

Answer

Why don't you use genfromtxt with the file object from gzip.open()?

    import gzip
    import numpy

    with gzip.open('file.gz') as f:
        print(numpy.genfromtxt(f, dtype=str))
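A self-contained sketch of the same idea, assuming Python 3 and NumPy 1.14 or later (for the `encoding` argument). It writes the question's sample data to a file named `file.gz` purely for illustration, then reads it back. Under Python 3, gzip.open defaults to binary mode, so opening with "rt" hands genfromtxt decoded text lines rather than bytes.

```python
import gzip

import numpy as np

# Illustration only: write the question's sample data to "file.gz".
sample = "@HWI\nABCDE\n+\n@HWI7\nEFSA\n+\n???=AF\nGTEY@JF\nGVTAWM\n"
with gzip.open("file.gz", "wt") as out:
    out.write(sample)

# Open in text mode ("rt") so genfromtxt receives decompressed str lines
# instead of raw bytes (gzip.open defaults to "rb" in Python 3).
with gzip.open("file.gz", "rt") as f:
    con = np.genfromtxt(f, dtype=str, encoding="utf-8")

print(con)
print(type(con))
```

The result is a one-dimensional array with one element per line, matching what the uncompressed file produced.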

EDIT

numpy uses predefined file openers for .gz and .bz2 files. You can check the configuration like this:

# numpy.lib._datasource is a private module, so this may differ between numpy versions
import numpy.lib._datasource as DS
DS._file_openers._load()
print(DS._file_openers._file_openers)

On my machine this shows handlers for bz2 and gz files:

{'.bz2': <type 'bz2.BZ2File'>, None: <built-in function open>, '.gz': <function open at 0x7efca562a6e0>}

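Since the handler for gz files is actually gzip.open, it seems strange that numpy doesn't use it on your machine. Note, however, that these openers are only selected when genfromtxt is given a file name; in the question, the file is first opened with the built-in open(), so numpy never sees the .gz extension and reads the compressed bytes as if they were text.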
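To let numpy's own .gz opener do the decompression, you can pass the file name straight to genfromtxt instead of an already-open file object. A minimal sketch, assuming Python 3 and NumPy 1.14 or later; the file `file.gz` is written here from the question's sample data for illustration only.

```python
import gzip

import numpy as np

# Illustration only: create "file.gz" holding the question's sample data.
lines = ["@HWI", "ABCDE", "+", "@HWI7", "EFSA", "+", "???=AF", "GTEY@JF", "GVTAWM"]
with gzip.open("file.gz", "wt") as out:
    out.write("\n".join(lines) + "\n")

# Pass the *file name*, not a file object: numpy picks its .gz opener
# from the extension and decompresses the data itself.
con = np.genfromtxt("file.gz", dtype=str, encoding="utf-8")
print(con)
```

The genfromtxt documentation states that a file name ending in .gz or .bz2 is decompressed first, which is why this path avoids the garbled output from the question.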
