Rafael Frinhani - 1 month ago
Python Question

How to get a numpy ndarray of integers from a file with header?

I have a plain text file (.txt) with the following content.

Matrix Header.
6 11
0 1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 1 0 1 1 1 1
1 1 1 1 0 0 1 1 1 1 1
0 0 0 0 1 1 1 0 0 0 0
1 1 1 0 0 1 1 1 1 1 1
1 0 0 1 1 1 1 0 1 1 0

6 rows, 11 columns


I need to obtain a numpy ndarray of integers like the one below:

[[0 1 1 1 1 1 1 1 1 1 1]
 [1 0 1 1 1 1 0 1 1 1 1]
 [1 1 1 1 0 0 1 1 1 1 1]
 [0 0 0 0 1 1 1 0 0 0 0]
 [1 1 1 0 0 1 1 1 1 1 1]
 [1 0 0 1 1 1 1 0 1 1 0]]


I tried the following strategy:

import pandas
import numpy
data = pandas.read_table(path, skiprows= 2)
data = data.values
print(data)


But the resulting ndarray isn't in the correct format: each row is a single string instead of a row of integers.

[['0 1 1 1 1 1 1 1 1 1 1 ']
['1 0 1 1 1 1 0 1 1 1 1 ']
['1 1 1 1 0 0 1 1 1 1 1 ']
['0 0 0 0 1 1 1 0 0 0 0 ']
['1 1 1 0 0 1 1 1 1 1 1 ']
['1 0 0 1 1 1 1 0 1 1 0 ']]


Can anybody help me?

Answer

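The text after the matrix ("6 rows, 11 columns") can cause an error if the whole file is parsed. To avoid it, read the row count from the second line and pass it to numpy.genfromtxt as the max_rows argument, so that only the matrix rows are read. For example,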

In [25]: import numpy as np

In [26]: with open(filename, 'rb') as f:
    ...:     f.readline()  # skip the header
    ...:     nrows, ncols = [int(field) for field in f.readline().split()]
    ...:     data = np.genfromtxt(f, dtype=int, max_rows=nrows)
    ...:     

In [27]: data
Out[27]: 
array([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1],
       [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1],
       [1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0]])

(I opened the file in binary mode to avoid a bytes/str problem that genfromtxt has in Python 3.)
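
For anyone who wants to try this outside an interactive session, here is a minimal self-contained sketch of the same approach. The helper name read_matrix and the temporary sample file are illustrative assumptions, not part of the original question; the file content is copied from the question. The final reshape is an extra check I added: it raises an error if the data does not match the declared "rows cols" line.

```python
import os
import tempfile

import numpy as np

# Sample content copied from the question, including the trailing text line.
SAMPLE = """Matrix Header.
6 11
0 1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 1 0 1 1 1 1
1 1 1 1 0 0 1 1 1 1 1
0 0 0 0 1 1 1 0 0 0 0
1 1 1 0 0 1 1 1 1 1 1
1 0 0 1 1 1 1 0 1 1 0

6 rows, 11 columns
"""


def read_matrix(path):
    """Hypothetical helper: read the integer matrix described by the file."""
    with open(path, "rb") as f:
        f.readline()  # skip the free-text header line
        # Second line holds the declared shape: "<rows> <cols>".
        nrows, ncols = (int(field) for field in f.readline().split())
        # Stop after nrows rows so the trailing text is never parsed.
        data = np.genfromtxt(f, dtype=int, max_rows=nrows)
    # Raises ValueError if the element count does not match the declared shape.
    return data.reshape(nrows, ncols)


# Write the sample to a temporary file, read it back, then clean up.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w") as out:
        out.write(SAMPLE)
    data = read_matrix(path)
finally:
    os.remove(path)

print(data.shape)
print(data)
```

The reshape also keeps the result two-dimensional when the file declares a single row or a single column, where genfromtxt would otherwise return a 1-D array.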