I have a tab-separated .txt file that stores numbers as a matrix. The file has 904,652 lines and 26,600 tab-separated columns, and its total size is around 48 GB. I need to load this file as a matrix and take its transpose to extract training and testing data. I am using Python and pandas:
import pandas
csv_delimiter = "\t"  # the file is tab separated
df = pandas.read_csv(filename, sep=csv_delimiter, header=None)
data = df.values
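For reference, a minimal sketch of this pandas approach on a tiny in-memory stand-in for the real 48 GB file (the contents and the `float32` dtype are illustrative assumptions; an explicit narrow dtype roughly halves peak memory compared to the default `float64`, provided the values fit in single precision):

```python
import io

import numpy as np
import pandas as pd

# Tiny stand-in for the real tab-separated file (contents illustrative).
tsv = "1\t2\t3\n4\t5\t6\n"

# dtype=np.float32 is an assumption: it halves memory vs. the default
# float64, but only if the data survives single precision.
df = pd.read_csv(io.StringIO(tsv), sep="\t", header=None, dtype=np.float32)
data = df.values     # 2 x 3 float32 ndarray
transposed = data.T  # transposing a NumPy array returns a view, not a copy

print(transposed.shape)
```

Note that `data.T` is a zero-copy view, so the transpose itself adds no memory pressure; the expensive part is parsing the text file in the first place.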
I found a solution on Stack Overflow (I believe there are still more efficient and logical ones).
The np.fromfile() method loads huge files more efficiently than
np.genfromtxt() and even
pandas.read_csv(). Loading took only around 274 GB of memory, without any modification or compression of the data. I thank everyone who tried to help me with this issue.
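A minimal sketch of the np.fromfile() approach, demonstrated on a small generated file (the file path and the 2x3 shape are illustrative; for the real data you would reshape to (904652, 26600)). A separator of a single space tells NumPy to treat any run of whitespace as a delimiter, which covers both the tabs between columns and the newlines between rows:

```python
import os
import tempfile

import numpy as np

# Small demo file standing in for the real 904,652 x 26,600 matrix
# (path and contents are illustrative assumptions).
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
path = os.path.join(tempfile.mkdtemp(), "matrix.txt")
with open(path, "w") as f:
    for r in rows:
        f.write("\t".join(str(x) for x in r) + "\n")

# sep=' ' makes np.fromfile parse text, matching any run of whitespace,
# so tabs and newlines are both consumed as delimiters.
flat = np.fromfile(path, sep=" ")   # 1-D float64 array of all values
data = flat.reshape(2, 3)           # restore the known matrix dimensions
transposed = data.T                 # transpose is a view, no extra copy

print(transposed.shape)
```

Because np.fromfile() returns a flat array, you must know the matrix dimensions in advance to reshape it; both reshape() and .T return views, so the only large allocation is the parsed array itself.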