Crystal Crystal - 1 year ago 59
Python Question

Modify kmeans alghoritm for 1d array where order matters

I want to find groups in one dimensional array where order/position matters. I tried to use numpys kmeans2 but it works only when i have numbers in increasing order.
I have to maximize average difference between neigbour sub-arrays

For example: if I have array

and i want to get 4 groups the result should be something like
[1,2,2], [8,9], [0,0,0], [1,1,1]

Is there a way to do it in better then O(n^k)

answer: I ended up with modiied dendrogram, where I merge neigbors only.


K-means is about minimizing the least squares. Among it's largest drawbacks (there are many) is that you need to know k. Why do you want to inherit this drawback?

Instead of hacking k-means into not ignoring the order, why don't you instead look at time series segmentation and change detection approaches that are much more appropriate for this problem?

E.g. split your time series if abs(x[i] - x[-1]) > stddev where stddev is the standard deviation of your data set. Or the standard deviation of the last 10 samples (in above series, the standard deviation is about 3, so it would split as [1,2,2], [8,9], [0,0,0,1,1,1] because the change 0 to 1 is not significant.