user1717931 - 2 months ago 55

Python Question

I have two sparse matrices (created with `sklearn`'s `HashingVectorizer`).

Here is an example:

`Xa = [-0.57735027 -0.57735027 0.57735027 -0.57735027 -0.57735027 0.57735027 0.5 0.5 -0.5 0.5 0.5 -0.5 0.5 0.5 -0.5 0.5 -0.5 0.5 0.5 -0.5 0.5 0.5]`

`Xb = [-0.57735027 -0.57735027 0.57735027 -0.57735027 0.57735027 0.57735027 -0.5 0.5 0.5 0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 0.5 0.5]`

Both `Xa` and `Xb` are of type `<class 'scipy.sparse.csr.csr_matrix'>`, with shapes `Xa.shape = (6, 1048576)` and `Xb.shape = (5, 1048576)`.

Calling `X = hstack((Xa, Xb))` raises:

File "/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.py", line 464, in hstack

return bmat([blocks], format=format, dtype=dtype)

File "/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.py", line 581, in bmat

'row dimensions' % i)

ValueError: blocks[0,:] has incompatible row dimensions

Is there a way to stack the sparse-matrices despite their irregular dimensions? Maybe with some padding?

I have looked into these posts:

- Concatenate sparse matrices in Python using SciPy/Numpy
- Is there an efficient way of concatenating scipy.sparse matrices?.

Answer

You can pad it with an empty sparse matrix.

You want to **horizontally stack** the matrices, so you need to pad the smaller matrix until it has the **same number of rows** as the larger one. To do that, **vertically stack** it with an empty matrix of shape `(difference in number of rows, number of columns of original matrix)`.

Like this:

```
from scipy.sparse import csr_matrix, hstack, vstack

# Create two empty sparse matrices for the demo
Xa = csr_matrix((4, 4))
Xb = csr_matrix((3, 5))

# Difference in the number of rows between Xa and Xb
# (assumes Xa has at least as many rows as Xb)
diff_n_rows = Xa.shape[0] - Xb.shape[0]

# Pad Xb with empty rows so that it has as many rows as Xa
Xb_new = vstack((Xb, csr_matrix((diff_n_rows, Xb.shape[1]))))

X = hstack((Xa, Xb_new))
X
```

Which results in:

```
<4x9 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in COOrdinate format>
```