KrunalParmar - 4 years ago
Python Question

Read .tar.gz file in Python

I have a 25 GB text file, so I compressed it to .tar.gz and it became 450 MB. Now I want to read that file from Python and process the text data. For this I followed the approach from a related question, but in my case the code doesn't work. The code is as follows:

import tarfile
import numpy as np

tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
    f = tar.extractfile(member)
    content = f.read()
    Data = np.loadtxt(content)


The error is as follows:

Traceback (most recent call last):
  File "dataExtPlot.py", line 21, in <module>
    content = f.read()
AttributeError: 'NoneType' object has no attribute 'read'


Also, is there any other way to do this?

Answer

The docs say that extractfile() returns None if the member is not a regular file or a link (for example, a directory entry).
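To see where the None comes from, here is a small sketch (assuming Python 3 and only the standard library) that builds a throwaway archive in memory holding one directory and one text file, then calls extractfile() on each member. The member names `data` and `data/values.txt` and the helper functions are hypothetical, chosen only for illustration.

```python
import io
import tarfile


def build_sample_archive():
    """Build a small .tar.gz in memory with one directory and one text file
    (hypothetical names, for illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        dir_info = tarfile.TarInfo("data")
        dir_info.type = tarfile.DIRTYPE  # a directory entry, not a regular file
        tar.addfile(dir_info)

        payload = b"1 2\n3 4\n"
        file_info = tarfile.TarInfo("data/values.txt")
        file_info.size = len(payload)
        tar.addfile(file_info, io.BytesIO(payload))
    buf.seek(0)  # rewind so the archive can be read back
    return buf


def describe_members(fileobj):
    """Return (name, extractfile_returned_none) for each member of a .tar.gz."""
    results = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            results.append((member.name, tar.extractfile(member) is None))
    return results


print(describe_members(build_sample_archive()))
# → [('data', True), ('data/values.txt', False)]
```

The directory entry is the member that makes `f.read()` fail in the question's loop.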

One possible solution is to skip over the None results:

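import tarfile

with tarfile.open("filename.tar.gz", "r:gz") as tar:
    for member in tar.getmembers():
        f = tar.extractfile(member)  # None for directories and other non-regular members
        if f is not None:
            content = f.read()  # bytes of this member; process it here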