I have a sparse matrix in csr_matrix format. For each row I need to subtract the row mean from the nonzero elements, where the mean is computed over the number of nonzero elements in the row (instead of the full row length).
I found a fast way to compute the row means with the following code:
# M is a csr_matrix
sums = np.squeeze(np.asarray(M.sum(1)))  # sum of the nonzero elements, for each row
counts = np.diff(M.indptr)               # count of the nonzero elements, for each row
# for the i-th row the mean is just sums[i] / float(counts[i])
M = M.tolil()
for i in xrange(M.shape[0]):
    for j in M.getrow(i).nonzero()[1]:   # nonzero() returns (rows, cols); we want the column indices
        M[i, j] -= sums[i] / float(counts[i])
This one is tricky. I think I have it. The basic idea is that we try to get a diagonal matrix with the means on the diagonal, and a matrix that is like M, but has ones at the nonzero data locations in M. Then we multiply those and subtract the product from M. Here goes...
>>> import numpy as np
>>> import scipy.sparse as sp
>>> a = sp.csr_matrix([[1., 0., 2.], [1., 2., 3.]])
>>> a.todense()
matrix([[ 1.,  0.,  2.],
        [ 1.,  2.,  3.]])
>>> tot = np.array(a.sum(axis=1)).squeeze()
>>> tot
array([ 3.,  6.])
>>> cts = np.diff(a.indptr)
>>> cts
array([2, 3], dtype=int32)
>>> mu = tot/cts
>>> mu
array([ 1.5,  2. ])
>>> d = sp.diags(mu, 0)
>>> d.todense()
matrix([[ 1.5,  0. ],
        [ 0. ,  2. ]])
>>> b = a.copy()
>>> b.data = np.ones_like(b.data)
>>> b.todense()
matrix([[ 1.,  0.,  1.],
        [ 1.,  1.,  1.]])
>>> (d * b).todense()
matrix([[ 1.5,  0. ,  1.5],
        [ 2. ,  2. ,  2. ]])
>>> (a - d*b).todense()
matrix([[-0.5,  0. ,  0.5],
        [-1. ,  0. ,  1. ]])
Good Luck! Hope that helps.
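Edit: for reference, the same result can be obtained without building the diagonal matrix at all, by expanding the per-row means with np.repeat so they line up with M.data in CSR order. This is a sketch, not tested beyond small examples; the function name is mine, and it assumes M stores no explicit zeros (an explicit zero would count as a nonzero entry here, same as in the loop in the question) and that duplicate entries have been summed:

```python
import numpy as np
import scipy.sparse as sp

def subtract_row_means(M):
    """Return a copy of sparse matrix M where each row's mean over its
    nonzero entries has been subtracted from those entries."""
    M = M.tocsr().copy()
    counts = np.diff(M.indptr)                       # nonzeros per row
    sums = np.asarray(M.sum(axis=1)).ravel()         # row sums
    # guard against empty rows (count 0): leave their mean at 0
    means = np.divide(sums, counts,
                      out=np.zeros_like(sums, dtype=float),
                      where=counts > 0)
    # repeat each row's mean once per stored entry in that row, so the
    # result has the same layout as M.data in CSR order
    M.data = M.data - np.repeat(means, counts)
    return M
```

On the example above, `subtract_row_means(a).todense()` gives the same `[[-0.5, 0., 0.5], [-1., 0., 1.]]` as `(a - d*b).todense()`, and it avoids both the slow lil loop and the intermediate `d * b` product.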