Sil Sil - 28 days ago 17
Python Question

Transform a Matrix Market matrix into a pandas DataFrame in Python

I have a Matrix Market file, which I need to use for text analysis.

The file has the following structure:

%%MatrixMarket matrix coordinate integer general
2000 5000 23000
1 4300 1
1 2200 1
1 3000 1
1 600 1


The values on the second line indicate the number of rows, the number of columns, and the total number of non-zero values in the matrix. Every line after this contains 3 values:


  • the row (indexed from 1), which represents my text document;

  • the column (indexed from 1), which represents a word;

  • the term frequency.
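
To make the indexing concrete, here is a minimal sketch that writes a tiny file in the same coordinate format and reads it back with scipy.io.mmread. The file name tiny.mtx and its 3 x 4 contents are made up for illustration and are not the data from the question. mmread loads the entries into a 0-based sparse matrix, so adding 1 to the stored row and column positions recovers the 1-based numbers from the file.

```python
import os
import tempfile

from scipy.io import mmread

# Hypothetical miniature file: 3 documents, 4 words, 3 non-zero entries.
MTX_TEXT = """%%MatrixMarket matrix coordinate integer general
3 4 3
1 4 1
1 2 2
3 1 5
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.mtx")
    with open(path, "w") as fh:
        fh.write(MTX_TEXT)
    # mmread loads everything into memory as a sparse matrix in COO format.
    coo = mmread(path).tocoo()

shape = coo.shape

# In memory the positions are 0-based; add 1 to get back the file's numbering.
triples = sorted(
    zip((coo.row + 1).tolist(), (coo.col + 1).tolist(), coo.data.tolist())
)
print(shape)
print(triples)
```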



As suggested in many posts, I read this file using scipy.io.mmread and the SciPy API for dealing with sparse data structures.

In particular, I used the following code:

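import pandas as pd
from scipy.io import mmread

Matrix = mmread('file_name.mtx')  # sparse matrix in coordinate format
B = Matrix.todense()              # dense copy of the whole matrix
df = pd.DataFrame(B)
print(df.head())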


However, this code gives me a data frame whose rows and columns are indexed from 0:

   0  1  2  3  4  5  6  7  8  9  ...  4872  \
0  1  0  1  0  0  0  0  0  1  0  ...     0
1  0  0  0  0  0  0  0  0  0  0  ...     0
2  0  0  0  0  0  0  0  0  0  0  ...     0
3  1  0  1  0  0  0  0  0  1  0  ...     0
4  0  0  1  0  0  0  0  0  0  0  ...     0


Ideally, the result would preserve the format of the original Matrix Market file, with rows and columns indexed from 1.

Any ideas on how to correct my code?

Thanks!

Answer

You can specify the index and column labels when you construct the DataFrame:

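import pandas as pd
from scipy.io import mmread

Matrix = mmread('file_name.mtx')
B = Matrix.todense()
# Label rows and columns 1..n instead of the default 0..n-1
df = pd.DataFrame(B, index=range(1, B.shape[0] + 1), columns=range(1, B.shape[1] + 1))
print(df.iloc[:5, :5])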

   1  2  3  4  5
1  0  0  0  0  0
2  0  0  0  0  0
3  0  0  0  0  0
4  0  0  0  0  0
5  0  0  0  0  0
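
If the dense copy made by todense() is a concern for a matrix of this size, one possible variation is to keep the data sparse with pandas' DataFrame.sparse.from_spmatrix and pass the same 1-based labels. This is only a sketch: the small coo_matrix below is a made-up stand-in for the result of mmread('file_name.mtx'), and it assumes a pandas version that provides the sparse accessor (0.25 or later).

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical stand-in for mmread('file_name.mtx'): 3 documents, 4 words.
rows = np.array([0, 0, 2])
cols = np.array([3, 1, 0])
vals = np.array([1, 2, 5])
matrix = coo_matrix((vals, (rows, cols)), shape=(3, 4))

# Build a sparse-backed DataFrame and label rows and columns from 1.
df_sparse = pd.DataFrame.sparse.from_spmatrix(
    matrix,
    index=pd.RangeIndex(1, matrix.shape[0] + 1),
    columns=pd.RangeIndex(1, matrix.shape[1] + 1),
)
print(df_sparse)
```

With these labels, df_sparse.loc[1, 4] looks up document 1 and word 4, using the same numbers that appear in the original file.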