user3426358 user3426358 - 12 days ago 6
Python Question

Add two matrices having different dimensions, labels and retain labels in sum matrix

I have two square matrices as two csv files. Each matrix has a row and column label. Each row, column label detects relation score between two companies.
Both the marices have a few labels (companies) which are missing in the other. Also, the dimensions of both matrices are different. ( One is 427 x 427, other is 467 x 467 )

Ex: In the matrix below, the numbers in bold are labels (company serial numbers).

0 is the score between (10010, 10010) .
0.43967 is the score between (1013, 10010) and so on.


enter image description here

I need to add these two matrices in such a way that:
In the resulting matrix ( sum of the two matrix), no label (row or column labels) is lost. If label exist in A and not in B, the resulting matrix should have a row & column entry (label and values) for it and vice versa for B.

Example:
Note that {a, b ... e} are labels. X & Y are matrices.
a b c d a e
a 1.2 1.3 1.4 1.5 a 9.1 9.2
X= b 2.1 2.2 2.3 2.4 Y= e 8.1 8.2
c 3.3 3.4 3.5 3.6
d 4.2 4.3 4.4 4.5

a b c d e
a 1.2+9.1 1.3 1.4 1.5 9.2
X+Y= b 2.1 2.2 2.3 2.4 0
c 3.3 3.4 3.5 3.6 0
d 4.2 4.3 4.4 4.5 0
e 8.1 0 0 0 8.2


I had posted a similar question but the answer given did not take into account the row labels and made the resulting matrix rectangular. The resulting matrix must be square too.

I am lost at how this should be done in python/ pandas or any other python libraries. Can someone help me with this?

The matrix files are here.

Answer

Assuming that the columns and indices labels are the same values, if you take the union of the columns and reindex both dfs and fill the NaN with 0 prior to adding then you get the result you desire.

if the column and index labels are not the same you can just make 2 arrays of the combined labels:

all_cols = x.columns | y.columns
all_indices = x.index | y.index

and reindex.

With your example data:

In [11]:
all_cols = x.columns | y.columns
x = x.reindex(index=all_cols, columns=all_cols).fillna(0)
y = y.reindex(index=all_cols, columns=all_cols).fillna(0)
x+y

Out[11]:
      a    b    c    d    e
a  10.3  1.3  1.4  1.5  9.2
b   2.1  2.2  2.3  2.4  0.0
c   3.3  3.4  3.5  3.6  0.0
d   4.2  4.3  4.4  4.5  0.0
e   8.1  0.0  0.0  0.0  8.2

Actually you can just reindex the result of add:

In [13]:
x.add(y, fill_value=0).reindex(index=all_cols,columns=all_cols).fillna(0)

Out[13]:
      a    b    c    d    e
a  10.3  1.3  1.4  1.5  9.2
b   2.1  2.2  2.3  2.4  0.0
c   3.3  3.4  3.5  3.6  0.0
d   4.2  4.3  4.4  4.5  0.0
e   8.1  0.0  0.0  0.0  8.2

EDIT

regarding your real data, you need to specify that the first unnamed column is the index:

X= pd.read_csv('30203_Transpose.csv', index_col=0)
Y= pd.read_csv('62599_1999psCSV.csv', index_col=0)

Additionally your index is not the same as your columns so these need to be aggregated separately

all_cols = X.columns | Y.columns
all_indices = X.index | Y.index

then you can add:

X.add(Y, fill_value=0).reindex(index=all_cols,columns=all_cols).fillna(0)