nathanbeagle, 4 years ago

# R Univariate Clustering by Group

I am trying to find a method to cluster univariate data by group. For example, the data below has two failure codes (a and b), with six data points for each code. As the plot shows, each failure code has two distinct clusters of failure times. Doing this by hand is easy enough, but I can't figure out how to do it for a larger data set (~100K rows and ~30 codes). I would like the end result to give me the medoid of each cluster and the number of observations in that cluster.

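```r
library(ggplot2)

# Two failure codes, six failure times (ttf) each
failure <- rep(c("a", "b"), each = 6)
ttf <- c(1, 1.5, 2, 5, 5.5, 6, 8, 8.5, 9, 14, 14.5, 15)
data <- data.frame(failure, ttf)

# Failure time by code: each code shows two separate groups of points
ggplot(data, aes(x = failure, y = ttf)) + geom_point()

# Target cluster centers, written out by hand
results <- data.frame(failure = c("a", "b"), m1 = c(1.5, 8.5), m2 = c(5.5, 14.5))
```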

Ideally, the end result would look something like the table below.

```
failure  m1   m1count  m2    m2count
a        1.5  3        5.5   3
b        8.5  3        14.5  3
```

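This will do what you want, assuming two clusters per failure group. You can change the number of clusters in the `kmeans()` call inside `tapply`, but that number then applies to every failure group. Note that `kmeans` returns cluster means (`centers`), not medoids; for this example data the two happen to be the same.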

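```r
# Run k-means with 2 clusters separately on each failure code's ttf values.
# kmeans() uses random starting centers, so cluster order can differ between runs.
res2 <- tapply(data$ttf, INDEX = data$failure, function(x) kmeans(x, 2))
# One data frame per code: cluster centers (means) and cluster sizes
res3 <- lapply(names(res2), function(x)
  data.frame(failure = x, Centers = res2[[x]]$centers, Size = res2[[x]]$size))
# Stack the per-code data frames into one
res3 <- do.call(rbind, res3)

res3
#>    failure Centers Size
#> 1        a     5.5    3
#> 2        a     1.5    3
#> 11       b    14.5    3
#> 21       b     8.5    3
```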
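To get the wide layout asked for in the question (`m1`, `m1count`, `m2`, `m2count`), the long result of the answer above can be reshaped with base R's `reshape()`. This is a minimal sketch under the same two-clusters-per-code assumption. It rebuilds the example data so it runs on its own, and it adds `set.seed()` and `nstart = 10` (neither is in the original answer) so the k-means step is reproducible and less sensitive to a bad random start. Within each code, clusters are numbered by increasing center, so `m1` is always the smaller one.

```r
# Example data from the question
failure <- rep(c("a", "b"), each = 6)
ttf <- c(1, 1.5, 2, 5, 5.5, 6, 8, 8.5, 9, 14, 14.5, 15)
data <- data.frame(failure, ttf)

# Long-format k-means result, as in the answer (seed and nstart are additions)
set.seed(1)
res2 <- tapply(data$ttf, INDEX = data$failure, function(x) kmeans(x, 2, nstart = 10))
res3 <- do.call(rbind, lapply(names(res2), function(x)
  data.frame(failure = x, Centers = res2[[x]]$centers, Size = res2[[x]]$size)))

# Sort clusters by center within each code, then number them 1, 2, ...
res3 <- res3[order(res3$failure, res3$Centers), ]
res3$cluster <- ave(res3$Centers, res3$failure, FUN = seq_along)

# Long -> wide: columns Centers.1, Size.1, Centers.2, Size.2
wide <- reshape(res3, idvar = "failure", timevar = "cluster",
                v.names = c("Centers", "Size"), direction = "wide")

# Rename to the layout from the question: m1, m1count, m2, m2count
names(wide) <- sub("^Centers\\.([0-9]+)$", "m\\1", names(wide))
names(wide) <- sub("^Size\\.([0-9]+)$", "m\\1count", names(wide))
rownames(wide) <- NULL

wide
```

Because the column names are built from the cluster number, the same code produces `m3`, `m3count`, and so on if `kmeans` is called with more clusters.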
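Since the question asks for medoids rather than means, a different technique may fit better: partitioning around medoids (PAM), available as `pam()` in the `cluster` package (a recommended package that ships with R). The sketch below is not the answer's method. It runs PAM separately on each failure code, assuming k = 2 for every code; `medoids_by_group` is a hypothetical helper name. Each medoid is an actual observation, and clusters are sorted by medoid value.

```r
library(cluster)  # pam(): partitioning around medoids

# Example data from the question
failure <- rep(c("a", "b"), each = 6)
ttf <- c(1, 1.5, 2, 5, 5.5, 6, 8, 8.5, 9, 14, 14.5, 15)
data <- data.frame(failure, ttf)

k <- 2  # assumption: the same number of clusters for every failure code

# Hypothetical helper: cluster one code's ttf values with PAM and return
# c(medoid1, count1, medoid2, count2, ...), with clusters sorted by medoid
medoids_by_group <- function(x, k) {
  fit <- pam(x, k)
  med <- x[fit$id.med]                         # medoids are actual observations
  size <- tabulate(fit$clustering, nbins = k)  # observations assigned to each medoid
  o <- order(med)
  c(rbind(med[o], size[o]))                    # interleave medoid and count
}

groups <- split(data$ttf, data$failure)
rows <- lapply(groups, medoids_by_group, k = k)
results <- data.frame(failure = names(groups), do.call(rbind, rows), row.names = NULL)
names(results)[-1] <- paste0("m", rep(seq_len(k), each = 2), c("", "count"))

results
```

For the example data this should reproduce the table from the question. `pam()` works from all pairwise distances within a group, so for codes with many thousands of rows `cluster::clara()`, a sampling-based version of PAM, may be more practical.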