mgokhanbakal mgokhanbakal - 1 month ago 5
Python Question

Loading a very large txt file and taking transpose

I have a tab separated .txt file that keeps numbers as matrix. Number of lines is 904,652 and the number of columns is 26,600 (tab separated). The total size of the file is around 48 GB. I need to load this file as matrix and take the transpose of the matrix to extract training and testing data. I am using Python, pandas, and

sklearn
packages. I behave 500GB memory server but it is not enough to load it with pandas package. Could anyone help me about my problem?

The loading code part is below:

def open_with_pandas_read_csv(filename):
df = pandas.read_csv(filename, sep=csv_delimiter, header=None)
data = df.values
return data

Answer

I found a solution (I believe there are still more efficient and logical ones) on stackoverflow. np.fromfile() method loads huge files more efficiently more than np.loadtxt() and np.genfromtxt() and even pandas.read_csv(). It took just around 274 GB without any modification or compression. I thank everyone who tried to help me on this issue.

Comments