VDFerreira - 2 months ago
R Question

r - plot outliers in kmeans plot

I am trying to run an analysis using k-means.

I have a dataset:

head(data)

tstamp               elementid    value      hours
2016-09-15 15:20:28  IN_TEMP      25.12237   15
2016-09-15 15:20:29  IN_TEMP      25.44952   15
2016-09-15 15:20:29  IN_TEMP      25.53550   15
2016-09-15 15:20:39  IN_PRESSURE  101.40683  15
2016-09-15 15:20:49  IN_TEMP      25.94596   15
2016-09-15 15:20:49  IN_TEMP      25.38742   15

So I ran this:

dataCluster<- kmeans(data[, 3:4], 2, nstart = 20)
dataCluster$cluster <- as.factor(dataCluster$cluster)
levels(dataCluster$cluster) <- c("IN_TEMP", "IN_PRESSURE")

ggplot(data, aes(value, hours, color = dataCluster$cluster)) + geom_point()


And the result is:

[image: scatter plot of value vs. hours, colored by cluster]

That looks fine to me, but when I run:

table(dataCluster$cluster, data$elementid)

              IN_PRESSURE  IN_TEMP
IN_TEMP                 0      953
IN_PRESSURE           508       44

I have 44 values in the 2nd cluster that are actually IN_TEMP values (which belong to the 1st cluster).

Can I paint these 44 values with the color of the 1st cluster (red)?

Thanks for your help.
Greetings

Answer

If I understood correctly, you don't want to color by the cluster label; you want to color by the variable elementid. You can simply use the following:

ggplot(data, aes(value, hours, color = elementid)) + geom_point()
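If you still want to see the k-means assignment in the same plot, one option is to keep color for elementid and map the cluster to point shape. The following is only a minimal sketch: the data frame is made-up toy data standing in for the question's `data`, and the names `fit` and `p` are my own, not from the question.

```r
library(ggplot2)

# Toy data standing in for the question's `data` (all values are made up).
set.seed(1)
data <- data.frame(
  elementid = rep(c("IN_TEMP", "IN_PRESSURE"), times = c(60, 40)),
  value     = c(rnorm(60, mean = 25, sd = 1), rnorm(40, mean = 101, sd = 1)),
  hours     = sample(0:23, 100, replace = TRUE)
)

# Same clustering call as in the question, on the value and hours columns.
fit <- kmeans(data[, c("value", "hours")], centers = 2, nstart = 20)

# Store the assignment as a column instead of referencing fit$cluster inside aes().
data$cluster <- factor(fit$cluster)

# Color shows the true label, shape shows the k-means cluster.
p <- ggplot(data, aes(value, hours, color = elementid, shape = cluster)) +
  geom_point()
print(p)
```

Points whose color and shape disagree with the rest of their group are the ones k-means placed in the "other" cluster.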

Does that help?
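One more thing to keep in mind: k-means numbers its clusters arbitrarily, so assigning levels by hand as c("IN_TEMP", "IN_PRESSURE") only works if cluster 1 happens to be the temperature cluster. A hedged sketch of an alternative is to name each cluster after the elementid that occurs most often in it, and then flag the rows where the two disagree. Again this uses made-up toy data, and the column names `cluster_label` and `mismatch` are hypothetical.

```r
# Toy data standing in for the question's `data` (all values are made up).
set.seed(1)
data <- data.frame(
  elementid = rep(c("IN_TEMP", "IN_PRESSURE"), times = c(60, 40)),
  value     = c(rnorm(60, mean = 25, sd = 1), rnorm(40, mean = 101, sd = 1)),
  hours     = sample(0:23, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

fit <- kmeans(data[, c("value", "hours")], centers = 2, nstart = 20)

# Cross-tabulate cluster number (rows) against the true label (columns).
tab <- table(cluster = fit$cluster, elementid = data$elementid)

# For each cluster, pick the label that occurs most often in it.
majority <- colnames(tab)[apply(tab, 1, which.max)]

# Translate the cluster numbers into those labels and flag disagreements.
data$cluster_label <- majority[fit$cluster]
data$mismatch <- data$cluster_label != data$elementid

table(data$cluster_label, data$elementid)
sum(data$mismatch)
```

The `mismatch` column can then be mapped to an aesthetic (for example shape or alpha) in ggplot to highlight the points in question.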
