chebyshev - 2 months ago
Python Question

sparse hstack and weird dtype conversion error

While working with some text data, I'm trying to join a NumPy array (from a pandas Series) to a CSR matrix.

I've done the following:

from scipy import sparse

# create a compatible sparse matrix from my np.array
# sparse.csr_matrix(X['link'].values) returns a matrix of shape (1, 7395)
# transpose that to get shape (7395, 1)

X = sparse.csr_matrix(X['link'].values.transpose)


#bodies is a sparse.csr_matrix with shape (7395, 20000)

bodies = sparse.hstack((bodies,X))


However, this gives the error

no supported conversion for types: (dtype('O'),)

I'm not sure what this means. How do I get around it?

Thanks.

Answer

Here's Saullo Castro's comment cast as an answer:

x = np.arange(12).reshape(1,12)  # ndarray
sparse.csr_matrix(x)
Out[14]: <1x12 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>

x.transpose   # function, not ndarray
Out[15]: <function transpose>  

X = sparse.csr_matrix(x.transpose)
TypeError: no supported conversion for types: (dtype('O'),)

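The error occurs before hstack is ever called: the code is trying to make a sparse matrix from a function (the method x.transpose) rather than from an ndarray. NumPy can only hold that method in an object array, dtype('O'), and sparse matrices do not support object dtype, hence the message. The mistake was omitting the () that actually calls transpose.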
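To see where the 'O' in the message comes from, here is a small sketch. It assumes, as a rough description, that the sparse constructor first converts non-sparse input to a NumPy array; the check below only shows what NumPy itself does with the un-called method, not SciPy's internal code path.

```python
import numpy as np

x = np.arange(12).reshape(1, 12)

# Without (), x.transpose is a bound method, not an array.
print(callable(x.transpose), isinstance(x.transpose, np.ndarray))  # True False

# np.asarray wraps the method object in a 0-d array of dtype object ('O'),
# the dtype named in the error message.
wrapped = np.asarray(x.transpose)
print(wrapped.shape, wrapped.dtype)  # () object

# Calling the method (or using .T) gives a real (12, 1) integer array.
col = x.transpose()
print(col.shape, np.array_equal(col, x.T))  # (12, 1) True
```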

# x.transpose() == x.T   # ndarray

sparse.csr_matrix(x.transpose())
Out[17]: <12x1 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>

sparse.csr_matrix(x.T)
Out[18]: <12x1 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>


X = sparse.csr_matrix(x.T)
bodies = sparse.rand(12,3,format='csr',density=.1)
sparse.hstack((bodies,X))
Out[32]: <12x4 sparse matrix of type '<type 'numpy.float64'>'
with 14 stored elements in COOrdinate format>

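csr_matrix works fine when it is given the transposed array itself (x.transpose() or x.T) rather than the method, and hstack then joins the two matrices as expected.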
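One more thing worth checking for the original question: X['link'].values from a pandas Series is 1-D, and transposing a 1-D array is a no-op, so even with the () the result would still be a (1, 7395) row. A minimal sketch of one way to get a column instead, assuming a small hypothetical stand-in array in place of X['link'].values and a random matrix in place of bodies; reshape(-1, 1) is used here, though transposing the sparse matrix (sparse.csr_matrix(values).T) would also work.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins: `values` plays the role of X['link'].values (1-D),
# `bodies` plays the role of the (n_rows, n_terms) CSR text matrix.
values = np.array([0, 1, 0, 1, 1])
bodies = sparse.random(5, 20, density=0.1, format="csr")

print(values.transpose().shape)  # (5,)  transposing a 1-D array changes nothing

# Reshape to an explicit column vector. astype(float) is an extra guard in case
# the real column has object dtype (an assumption about the data).
col = sparse.csr_matrix(values.reshape(-1, 1).astype(float))
print(col.shape)  # (5, 1)

# format="csr" keeps the result in CSR instead of hstack's default COO output.
combined = sparse.hstack((bodies, col), format="csr")
print(combined.shape, combined.format)  # (5, 21) csr
```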