I'm trying to classify cells into populations. When I use:
gmix = mixture.GMM(n_components=3, covariance_type='full')
gmix.fit(samples)
print("gmix.means \n", gmix.means_)
colors = ['r' if i == 0 else ('g' if i == 1 else ('b' if i == 2 else 'm')) for i in gmix.predict(samples)]
[[ 3.25492404e+02 2.88403293e-02]
[ 3.73942908e+02 3.25283512e-02]
[ 5.92577646e+02 4.40595768e-02]]
gmix.means_ = np.sort(gmix.means_, axis = 0)
sort_indices = gmix.means_.argsort(axis = 0)
order = sort_indices[:, 0]
gmix.means_ = gmix.means_[order,:]
gmix.covars_ = gmix.covars_[order, :]
print ("\n sorted gmix.covars \n", gmix.covars_)
print ("\n\nori gmix.weights \n", gmix.weights_)
w = np.split(gmix.weights_,3)
w = np.asarray(w)
w = np.ravel(w[order,:])
gmix.weights_ = w
This is basically a matrix/vector indexing problem. I'm probably being too verbose here, but it should be just two lines to sort your matrices.
Clustering algorithms in general (GMM in your case) are not guaranteed to label the clusters in the same order every time, nor are they guaranteed to give you the same clusters every time, unless you fix the initial conditions.
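To illustrate what "fixing the initial conditions" can look like, here is a minimal sketch. It assumes a scikit-learn release that provides `GaussianMixture` (not the older `mixture.GMM` from the question), and the data, the seed value and the names `fit_a`/`fit_b` are made up for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data: three well-separated blobs (illustrative only).
rng = np.random.default_rng(0)
samples = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(100, 2))
    for center in ([0.0, 0.0], [5.0, 5.0], [10.0, 0.0])
])

# Same seed and same data give the same initialization,
# so both fits end up with identical parameters and labels.
fit_a = GaussianMixture(n_components=3, covariance_type='full', random_state=42).fit(samples)
fit_b = GaussianMixture(n_components=3, covariance_type='full', random_state=42).fit(samples)

same_means = np.allclose(fit_a.means_, fit_b.means_)
same_labels = np.array_equal(fit_a.predict(samples), fit_b.predict(samples))
print(same_means, same_labels)
```

A fixed seed only makes the run repeatable. It does not make component 0 the leftmost cluster, which is why the explicit sort is still needed.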
If you want the clusters sorted by the X-coordinate of their means, you will probably need to do this yourself. This involves two steps, just like you mentioned in your question:
a) Sort the means and get the indices
b) Use the indices to extract your means out
This can be done simply as follows:
a) Do an argsort on your means:
>>> means = np.array(np.mat('1, 2; 4, 3; 2, 6'))
>>> sort_indices = means.argsort(axis=0)
>>> sort_indices
array([[0, 0],
       [2, 1],
       [1, 2]])
Your order would be the first column of the argsorted array:
>>> order = sort_indices[:,0]
>>> order
array([0, 2, 1])
b) Now, we will use this 'order' to reorder your means.
>>> sorted_m = means[order,:]
>>> sorted_m
array([[1, 2],
       [2, 6],
       [4, 3]])
As for your covariances, let us create a dummy covariance matrix:
>>> c = np.array(np.mat('9, 8, 7; 6, 5, 4; 3, 2, 1'))
>>> c
array([[9, 8, 7],
       [6, 5, 4],
       [3, 2, 1]])
Now reindex c. An easy way is to apply the order to the rows first and then to the columns:
>>> sorted_c = c[order,:][:, order]
>>> sorted_c
array([[9, 7, 8],
       [3, 1, 2],
       [6, 4, 5]])
As you can see, the rows and columns are rearranged according to our new order.
There you have it, both your means and covariances sorted.
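One caveat, sketched here under an assumption: if your model stores one full covariance matrix per component, i.e. a 3-D array of shape (n_components, n_features, n_features), then the order should be applied along the first axis only, and the weights need the same treatment. The numbers and the names `covars` and `weights` below are made up for illustration.

```python
import numpy as np

# Hypothetical fitted parameters for 3 components in 2-D.
means = np.array([[1.0, 2.0], [4.0, 3.0], [2.0, 6.0]])
covars = np.stack([np.eye(2) * s for s in (0.1, 0.4, 0.2)])  # shape (3, 2, 2)
weights = np.array([0.5, 0.2, 0.3])

# Order the components by the first coordinate of their means.
order = np.argsort(means[:, 0])

# Apply the same permutation along axis 0 of every per-component array.
sorted_means = means[order]
sorted_covars = covars[order]
sorted_weights = weights[order]

print(order)           # [0 2 1]
print(sorted_weights)  # [0.5 0.3 0.2]
```

Here `np.argsort(means[:, 0])` gives the same result as taking the first column of `means.argsort(axis=0)`. Each component keeps its own covariance matrix intact and only its position in the stack changes.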
You may need to relabel your original labels as well, for which you can use the answer here: Fast replacement of values in a numpy array
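If you would rather not go through a generic replace, one possible approach (a sketch, not necessarily what the linked answer does) is to invert the permutation and use it as a lookup table. The `labels` array and the name `new_index` are hypothetical.

```python
import numpy as np

order = np.array([0, 2, 1])          # order[new] = old component index
labels = np.array([0, 1, 2, 1, 0])   # hypothetical predictions using old indices

# Invert the permutation so that new_index[old] = new component index.
new_index = np.empty_like(order)
new_index[order] = np.arange(order.size)

# Fancy indexing maps every old label to its new label in one step.
relabeled = new_index[labels]
print(relabeled)  # [0 2 1 2 0]
```

Alternatively, if you reorder the model's parameters in place before calling predict, the predicted labels should already follow the new order and no relabeling is needed.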