Crystal - 9 months ago 46

Python Question

I want to find groups in one dimensional array where order/position matters. I tried to use numpys kmeans2 but it works only when i have numbers in increasing order.

I have to maximize average difference between neigbour sub-arrays

For example: if I have array

`[1,2,2,8,9,0,0,0,1,1,1]`

`[1,2,2], [8,9], [0,0,0], [1,1,1]`

Is there a way to do it in better then O(n^k)

answer: I ended up with modiied dendrogram, where I merge neigbors only.

Answer

K-means is about minimizing the least squares. Among it's largest drawbacks (there are many) is that you need to know k. Why do you want to inherit this drawback?

Instead of hacking k-means into not ignoring the order, why don't you instead look at **time series segmentation** and **change detection** approaches that are much more appropriate for this problem?

E.g. split your time series if `abs(x[i] - x[-1]) > stddev`

where `stddev`

is the standard deviation of your data set. Or the standard deviation of the last 10 samples (in above series, the standard deviation is about 3, so it would split as `[1,2,2], [8,9], [0,0,0,1,1,1]`

because the change 0 to 1 is not significant.