thepolina - 4 months ago 32

Python Question

I have a sparse matrix:

`from scipy.sparse import *`

`M = csr_matrix((data_np, (rows_np, columns_np)))`

Then I cluster it like this:

`from sklearn.cluster import KMeans`

`km = KMeans(n_clusters=n, init='random', max_iter=100, n_init=1, verbose=1)`

`km.fit(M)`

My question is very basic: how do I print the clustering result without any extra information? I don't care about plotting or distances. I just need the clustered rows to look like this:

Cluster 1

row 1

row 2

row 3

Cluster 2

row 4

row 20

row 1000

...

How can I get it? Excuse me for this question.

Answer

Time to help myself. After

```
km.fit(M)
```

we run

```
labels = km.predict(M)
```

which returns **labels**, a `numpy.ndarray` with one element per row of `M`. Each element is the number of the cluster that the corresponding row belongs to.
For example, if the first element is 5, the first row belongs to cluster 5.
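To make the shape of `labels` concrete, here is a minimal sketch on made-up data. The toy values in `data_np`, `rows_np` and `columns_np`, the explicit `shape`, and `random_state=0` are my assumptions for illustration and are not from the original problem. As far as I know, the fitted estimator also keeps the training assignments in `km.labels_`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

# Toy data (assumed for illustration): 6 rows, 2 columns.
data_np = np.array([1.0, 1.1, 0.9, 10.0, 10.2, 9.8])
rows_np = np.array([0, 1, 2, 3, 4, 5])
columns_np = np.array([0, 0, 0, 1, 1, 1])
M = csr_matrix((data_np, (rows_np, columns_np)), shape=(6, 2))

km = KMeans(n_clusters=2, init='random', max_iter=100, n_init=1, random_state=0)
km.fit(M)
labels = km.predict(M)

print(labels.shape)      # one label per row of M
print(labels)            # cluster numbers are 0-based and arbitrary
print(km.labels_.shape)  # assignments stored by fit()
```

Note that the cluster numbers start at 0 and carry no meaning beyond "same number, same cluster".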
Let's put the rows into a dictionary of lists of the form `{cluster_number: [row1, row2, row3], ...}`:

```
# row_dict maps a row index to what the row actually stands for
# (in my case, Russian words)
clusters = {}
for n, item in enumerate(labels):
    if item in clusters:
        clusters[item].append(row_dict[n])
    else:
        clusters[item] = [row_dict[n]]
```
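The same grouping can probably be written a bit more compactly with `collections.defaultdict`. This is only a sketch: the helper name `group_rows_by_label` and the sample `labels` and `row_names` are hypothetical, standing in for the real `labels` array and `row_dict`.

```python
from collections import defaultdict

def group_rows_by_label(labels, row_names):
    """Return {cluster_number: [row_name, ...]}, keeping the original row order."""
    clusters = defaultdict(list)
    for row_index, label in enumerate(labels):
        # int() turns a numpy integer into a plain Python int key
        clusters[int(label)].append(row_names[row_index])
    return dict(clusters)

# Hypothetical labels and row names, for illustration only
labels = [1, 0, 1, 2, 0]
row_names = ["row 1", "row 2", "row 3", "row 4", "row 5"]
clusters = group_rows_by_label(labels, row_names)
print(clusters)
```

With `defaultdict(list)` a missing key starts as an empty list, so the `if ... else` branch is no longer needed.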

and print the result

```
for item in clusters:
    print("Cluster", item)
    for i in clusters[item]:
        print(i)
```
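If row numbers are enough (as in the question) and there is no separate mapping of rows to names, one alternative is to let NumPy pick out the row indices per cluster. A minimal sketch, where the helper `rows_per_cluster` and the sample `labels` array are hypothetical:

```python
import numpy as np

def rows_per_cluster(labels):
    """Return {cluster_number: [1-based row numbers]} for an array of labels."""
    labels = np.asarray(labels)
    return {
        int(c): (np.flatnonzero(labels == c) + 1).tolist()
        for c in np.unique(labels)
    }

# Hypothetical labels, standing in for the result of km.predict(M)
labels = np.array([1, 0, 1, 2, 0])

result = rows_per_cluster(labels)
for cluster_number in sorted(result):
    print("Cluster", cluster_number)
    for row_number in result[cluster_number]:
        print("row", row_number)
```

The `+ 1` only converts 0-based array indices into the 1-based row numbers used in the question; drop it to get indices that can be used directly on `M`.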