Kerri Colman - 11 months ago 47

R Question

I am trying to make a KS (Kolmogorov-Smirnov) plot in R, and all seems to be going well, except that I can only use colour, and not line type, to distinguish the two samples.

I have tried the following:

    sample1 <- SD13009
    sample2 <- SD13009PB
    group <- c(rep("sample1", length(sample1)), rep("sample2", length(sample2)))
    dat <- data.frame(KSD = c(sample1, sample2), group = group)
    cdf1 <- ecdf(sample1)
    cdf2 <- ecdf(sample2)
    minMax <- seq(min(sample1, sample2), max(sample1, sample2), length.out = length(sample1))
    x0 <- minMax[which(abs(cdf1(minMax) - cdf2(minMax)) == max(abs(cdf1(minMax) - cdf2(minMax))))]
    y0 <- cdf1(x0)
    y1 <- cdf2(x0)

    # Attempt 1
    plot <- ggplot(dat, aes(x = KSD, group = group, colour = group, linetype = group)) +
      stat_ecdf(size = 1) +
      mytheme + xlab("mm") + scale_x_continuous(limits = c(0, 1)) +
      ylab("Cumulative Distribution") +
      # geom_line(aes(group = group, size = 1)) +
      geom_segment(aes(x = x0[1], y = y0[1], xend = x0[1], yend = y1[1]),
                   linetype = "dashed", color = "red") +
      geom_point(aes(x = x0[1], y = y0[1]), color = "red", size = 1) +
      geom_point(aes(x = x0[1], y = y1[1]), color = "red", size = 1) +
      ggtitle("K-S Test: Sample 1 / Sample 2")

    # Attempt 2
    cdf <- ggplot(dat, aes(x = KSD, group = group, linetype = group)) +
      stat_ecdf(aes(linetype = group)) +
      coord_cartesian(xlim = c(0, 0.8)) +
      geom_segment(aes(x = x0[1], y = y0[1], xend = x0[1], yend = y1[1]),
                   linetype = "dashed", color = "red") +
      geom_point(aes(x = x0[1], y = y0[1]), color = "red", size = 1) +
      geom_point(aes(x = x0[1], y = y1[1]), color = "red", size = 1) +
      ggtitle("K-S Test: Sample 1 / Sample 2")

This is what I get:

Answer

I cannot reproduce this with the following code:

```
library(ggplot2)

# Make two random samples
sample1 <- rnorm(1000)
sample2 <- rnorm(1000, 2, 2)
group <- c(rep("sample1", length(sample1)), rep("sample2", length(sample2)))
dat <- data.frame(KSD = c(sample1, sample2), group = group)

# Empirical CDFs and the grid point where they differ most
cdf1 <- ecdf(sample1)
cdf2 <- ecdf(sample2)
minMax <- seq(min(sample1, sample2), max(sample1, sample2), length.out = length(sample1))
x0 <- minMax[which(abs(cdf1(minMax) - cdf2(minMax)) == max(abs(cdf1(minMax) - cdf2(minMax))))]
y0 <- cdf1(x0)
y1 <- cdf2(x0)

# Colour and linetype are both mapped to group
ggplot(dat, aes(x = KSD, group = group, colour = group, linetype = group)) +
  stat_ecdf(size = 1) +
  xlab("mm") +
  ylab("Cumulative Distribution") +
  geom_segment(aes(x = x0[1], y = y0[1], xend = x0[1], yend = y1[1]),
               linetype = "dashed", color = "red") +
  geom_point(aes(x = x0[1], y = y0[1]), color = "red", size = 1) +
  geom_point(aes(x = x0[1], y = y1[1]), color = "red", size = 1) +
  ggtitle("K-S Test: Sample 1 / Sample 2")
```

It seems that in your plot the two curves lie so close together that you cannot see that they use different linetypes, but they do.
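If you want to confirm this on your own data, one option is to look at what ggplot2 actually draws, and to pick linetypes that stay distinguishable when the curves nearly overlap. The following is only a sketch under assumptions: the two samples are simulated (a mean shift of 0.05 was chosen just to make the curves almost coincide), the seed is arbitrary, and `"solid"` / `"longdash"` are merely one possible pairing.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Two deliberately similar samples, so the ECDF curves nearly overlap
# (simulated stand-ins, not the data from the question)
sample1 <- rnorm(1000)
sample2 <- rnorm(1000, mean = 0.05)
dat <- data.frame(
  KSD   = c(sample1, sample2),
  group = rep(c("sample1", "sample2"),
              times = c(length(sample1), length(sample2)))
)

p <- ggplot(dat, aes(x = KSD, colour = group, linetype = group)) +
  stat_ecdf() +  # default line width; thick lines tend to hide the dash pattern
  scale_linetype_manual(values = c(sample1 = "solid", sample2 = "longdash")) +
  xlab("mm") +
  ylab("Cumulative Distribution")

# Inspect the data of the first layer as ggplot2 will draw it:
# there should be one linetype per group
built <- layer_data(p, 1)
linetypes_used <- unique(built$linetype)
print(linetypes_used)

print(p)
```

If `linetypes_used` contains two values, the mapping is working and the issue is purely visual. In that case a thinner line, a more contrasting pair of linetypes, or zooming in with `coord_cartesian()` may make the difference easier to see.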