user3426358 - 11 months ago 74

Python Question

I have two square matrices as two csv files. Each matrix has a row and column label. Each row, column label detects relation score between two companies.

Both the marices have a few labels (companies) which are missing in the other. Also, the dimensions of both matrices are different. ( One is 427 x 427, other is 467 x 467 )

Ex: In the matrix below, the numbers in bold are labels (company serial numbers).

`0 is the score between (10010, 10010) .`

0.43967 is the score between (1013, 10010) and so on.

I need to add these two matrices in such a way that:

In the resulting matrix ( sum of the two matrix), no label (row or column labels) is lost.

`Example:`

Note that {a, b ... e} are labels. X & Y are matrices.

a b c d a e

a 1.2 1.3 1.4 1.5 a 9.1 9.2

X= b 2.1 2.2 2.3 2.4 Y= e 8.1 8.2

c 3.3 3.4 3.5 3.6

d 4.2 4.3 4.4 4.5

a b c d e

a 1.2+9.1 1.3 1.4 1.5 9.2

X+Y= b 2.1 2.2 2.3 2.4 0

c 3.3 3.4 3.5 3.6 0

d 4.2 4.3 4.4 4.5 0

e 8.1 0 0 0 8.2

I had posted a similar question but the answer given did not take into account the row labels and made the resulting matrix rectangular.

I am lost at how this should be done in python/ pandas or any other python libraries. Can someone help me with this?

The matrix files are here.

Answer Source

Assuming that the columns and indices labels are the same values, if you take the union of the columns and reindex both dfs and fill the `NaN`

with `0`

prior to adding then you get the result you desire.

if the column and index labels are not the same you can just make 2 arrays of the combined labels:

```
all_cols = x.columns | y.columns
all_indices = x.index | y.index
```

and `reindex`

.

With your example data:

```
In [11]:
all_cols = x.columns | y.columns
x = x.reindex(index=all_cols, columns=all_cols).fillna(0)
y = y.reindex(index=all_cols, columns=all_cols).fillna(0)
x+y
Out[11]:
a b c d e
a 10.3 1.3 1.4 1.5 9.2
b 2.1 2.2 2.3 2.4 0.0
c 3.3 3.4 3.5 3.6 0.0
d 4.2 4.3 4.4 4.5 0.0
e 8.1 0.0 0.0 0.0 8.2
```

Actually you can just `reindex`

the result of `add`

:

```
In [13]:
x.add(y, fill_value=0).reindex(index=all_cols,columns=all_cols).fillna(0)
Out[13]:
a b c d e
a 10.3 1.3 1.4 1.5 9.2
b 2.1 2.2 2.3 2.4 0.0
c 3.3 3.4 3.5 3.6 0.0
d 4.2 4.3 4.4 4.5 0.0
e 8.1 0.0 0.0 0.0 8.2
```

**EDIT**

regarding your real data, you need to specify that the first unnamed column is the index:

```
X= pd.read_csv('30203_Transpose.csv', index_col=0)
Y= pd.read_csv('62599_1999psCSV.csv', index_col=0)
```

Additionally your index is not the same as your columns so these need to be aggregated separately

```
all_cols = X.columns | Y.columns
all_indices = X.index | Y.index
```

then you can `add`

:

```
X.add(Y, fill_value=0).reindex(index=all_cols,columns=all_cols).fillna(0)
```