user3426358 - 1 year ago 127
Python Question

# Add two matrices having different dimensions, labels and retain labels in sum matrix

I have two square matrices as two csv files. Each matrix has a row and column label. Each row, column label detects relation score between two companies.
Both the marices have a few labels (companies) which are missing in the other. Also, the dimensions of both matrices are different. ( One is 427 x 427, other is 467 x 467 )

Ex: In the matrix below, the numbers in bold are labels (company serial numbers).

``````0 is the score between (10010, 10010) .
0.43967 is the score between (1013, 10010) and so on.
``````

I need to add these two matrices in such a way that:
In the resulting matrix ( sum of the two matrix), no label (row or column labels) is lost. If label exist in A and not in B, the resulting matrix should have a row & column entry (label and values) for it and vice versa for B.

``````Example:
Note that {a, b ... e} are labels. X & Y are matrices.
a   b   c   d               a   e
a   1.2 1.3 1.4 1.5         a   9.1 9.2
X=  b   2.1 2.2 2.3 2.4     Y=  e   8.1 8.2
c   3.3 3.4 3.5 3.6
d   4.2 4.3 4.4 4.5

a       b       c       d       e
a   1.2+9.1 1.3     1.4     1.5     9.2
X+Y=    b   2.1     2.2     2.3     2.4     0
c   3.3     3.4     3.5     3.6     0
d   4.2     4.3     4.4     4.5     0
e   8.1     0       0       0       8.2
``````

I had posted a similar question but the answer given did not take into account the row labels and made the resulting matrix rectangular. The resulting matrix must be square too.

I am lost at how this should be done in python/ pandas or any other python libraries. Can someone help me with this?

The matrix files are here.

Assuming that the columns and indices labels are the same values, if you take the union of the columns and reindex both dfs and fill the `NaN` with `0` prior to adding then you get the result you desire.

if the column and index labels are not the same you can just make 2 arrays of the combined labels:

``````all_cols = x.columns | y.columns
all_indices = x.index | y.index
``````

and `reindex`.

``````In [11]:
all_cols = x.columns | y.columns
x = x.reindex(index=all_cols, columns=all_cols).fillna(0)
y = y.reindex(index=all_cols, columns=all_cols).fillna(0)
x+y

Out[11]:
a    b    c    d    e
a  10.3  1.3  1.4  1.5  9.2
b   2.1  2.2  2.3  2.4  0.0
c   3.3  3.4  3.5  3.6  0.0
d   4.2  4.3  4.4  4.5  0.0
e   8.1  0.0  0.0  0.0  8.2
``````

Actually you can just `reindex` the result of `add`:

``````In [13]:

Out[13]:
a    b    c    d    e
a  10.3  1.3  1.4  1.5  9.2
b   2.1  2.2  2.3  2.4  0.0
c   3.3  3.4  3.5  3.6  0.0
d   4.2  4.3  4.4  4.5  0.0
e   8.1  0.0  0.0  0.0  8.2
``````

EDIT

regarding your real data, you need to specify that the first unnamed column is the index:

``````X= pd.read_csv('30203_Transpose.csv', index_col=0)
``````all_cols = X.columns | Y.columns
then you can `add`:
``````X.add(Y, fill_value=0).reindex(index=all_cols,columns=all_cols).fillna(0)