ed3203 - 5 months ago
Python Question

sklearn GMM classification prediction (component assignment) order

I'm trying to classify cells into populations. When I use:

gmix = mixture.GMM(n_components=3, covariance_type='full')

The means output, from the code below, changes in order, unless I set:

print("gmix.means \n", gmix.means_)
palette = {0: 'r', 1: 'g', 2: 'b'}  # any other label falls back to 'm'
colors = [palette.get(i, 'm') for i in gmix.predict(samples)]

I would like the classes sorted by the X-axis mean (the first item of each class), i.e.:

[[ 3.25492404e+02 2.88403293e-02]
[ 3.73942908e+02 3.25283512e-02]
[ 5.92577646e+02 4.40595768e-02]]

So in the code above, red would always be 325, green 373 and blue 592. At the moment, I don't think anything is sorting the output.

I tried:

gmix.means_ = np.sort(gmix.means_, axis = 0)

But then the gmix.covars_ and gmix.weights_ also need to be sorted accordingly, which is where I'm stuck!

Many thanks!

Edit 4/5/16:

Thanks for the help and steering me in the right direction. Here is my poorly written but working version:

sort_indices = gmix.means_.argsort(axis = 0)
order = sort_indices[:, 0]
print('\norder:', order)
gmix.means_ = gmix.means_[order,:]

gmix.covars_ = gmix.covars_[order, :]
print ("\n sorted gmix.covars \n", gmix.covars_)

print ("\n\nori gmix.weights \n", gmix.weights_)
w = np.split(gmix.weights_,3)
w = np.asarray(w)
w = np.ravel(w[order,:])
gmix.weights_ = w


This is basically a matrix/vector indexing problem. I'm probably being too verbose here, but it should be just two lines to sort your matrices.

Clustering algorithms in general (GMM in your case) are not guaranteed to label the clusters in the same order every time, nor are they guaranteed to give you the same clusters every time, unless you fix the initial conditions.

If you want the clusters sorted by the X-coordinate of their means, you will probably need to do this yourself. This involves 2 steps, just like you mentioned in your question:

a) Sort the means and get the indices
b) Use the indices to reorder your means

This can be done simply as follows:

a) Do an argsort on your means

>>> import numpy as np
>>> means = np.array([[1, 2], [4, 3], [2, 6]])
>>> sort_indices = means.argsort(axis=0)
>>> sort_indices
array([[0, 0],
       [2, 1],
       [1, 2]])

Your order would be the first column of the argsorted array:

>>> order = sort_indices[:,0]
>>> order
array([0, 2, 1])

b) Now, use this order to reorder your means.

>>> sorted_m = means[order,:]
>>> sorted_m
array([[1, 2],
       [2, 6],
       [4, 3]])

The same goes for your covariances. Let us create a dummy covariance matrix:

>>> c = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])
>>> c
array([[9, 8, 7],
       [6, 5, 4],
       [3, 2, 1]])

Now reindex c. An easy way is to apply the order to both the rows and the columns:

>>> sorted_c = c[order,:][:, order]
>>> sorted_c
array([[9, 7, 8],
       [3, 1, 2],
       [6, 4, 5]])

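As you can see, the rows and columns are rearranged according to our new order. Note that this applies to a single square matrix indexed by component on both axes. If gmix.covars_ was fitted with covariance_type='full', it holds one matrix per component, with shape (n_components, n_features, n_features), so reordering the first axis alone (gmix.covars_[order]) is enough.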

There you have it, both your means and covariances sorted.
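
Putting the steps together, here is a minimal sketch of the whole reordering on plain NumPy arrays. The arrays are made-up stand-ins shaped like means_, covars_ and weights_ for 3 components, 2 features and covariance_type='full'; they are not output from a real fit. It also assumes you only care about the first column of the means, so the order is taken with np.argsort on that column directly:

```python
import numpy as np

# Hypothetical parameters for 3 components and 2 features,
# shaped like GMM attributes with covariance_type='full'.
means = np.array([[592.6, 0.044],
                  [325.5, 0.029],
                  [373.9, 0.033]])
covars = np.stack([np.eye(2) * k for k in (3.0, 1.0, 2.0)])  # shape (3, 2, 2)
weights = np.array([0.2, 0.5, 0.3])

# Component indices ordered by the first coordinate of their means.
order = np.argsort(means[:, 0])

# Apply the same permutation along the component axis of every array.
sorted_means = means[order]
sorted_covars = covars[order]
sorted_weights = weights[order]

print(order)
print(sorted_means)
print(sorted_weights)
```

The point of the design is that one permutation is computed once and reused for every per-component array, so the mean, covariance and weight of each component stay together.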

You may need to relabel your original labels as well, for which you can use the answer here: Fast replacement of values in a numpy array
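
For the relabeling itself, one possible approach (a sketch, not necessarily what the linked answer does) is to invert the permutation and use it as a lookup table. The names labels and new_index_of are made up for illustration, and labels stands for predictions made before the components were reordered:

```python
import numpy as np

# Hypothetical inputs: `order` lists the old component indices in sorted
# order, and `labels` are predictions made before the reordering.
order = np.array([1, 2, 0])
labels = np.array([0, 1, 1, 2, 0, 2])

# Invert the permutation: new_index_of[old_label] gives the new label.
new_index_of = np.empty_like(order)
new_index_of[order] = np.arange(len(order))

# Fancy indexing maps every old label to its new label in one step.
new_labels = new_index_of[labels]
print(new_labels)
```

If you reorder means_, covars_ and weights_ on the model before calling predict, the predictions should already come out in the new order, so this step is only needed for labels computed earlier.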