tesseracT - 1 year ago
R Question

Finding variance of a subset of data from a scatterplot

I have a scatterplot of x versus y. I have drawn an abline down the middle of the plot. I want to calculate the variance of the points on the left of the abline and I want to calculate the variance of the points on the right of the abline. This is most likely a relatively simple problem, but I'm struggling to find a solution. Any advice is appreciated. Thanks in advance.

x = rnorm(100,mean=12,sd=2)
y = rnorm(100,mean=20,sd=5)
data = as.data.frame(cbind(x,y))

Answer

Assuming your vertical line is at x = 12, i.e. abline(v = 12), your data points (x, y) are split into two groups: x < 12 (left of the line) and x >= 12 (on or right of the line). The variance of y in each group is then simply:

var(y[x < 12])
var(y[x >= 12])

But we can also use a single call to tapply:

tapply(y, x < 12, FUN = var)
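As a small sketch of what that call returns: tapply gives back a vector named by the grouping values ("FALSE" and "TRUE" above), so labelling the two sides makes the result easier to read. The seed and the "left"/"right" labels are my own additions for illustration and are not part of the question.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

# Label each point by the side of the vertical line x = 12 it falls on
side <- ifelse(x < 12, "left", "right")

# One variance of y per side, returned as a named vector
v <- tapply(y, side, FUN = var)
print(v)

v[["left"]]   # same value as var(y[x < 12])
v[["right"]]  # same value as var(y[x >= 12])
```

If you want the spread of x on each side as well, the same grouping works with tapply(x, side, FUN = var).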

More generally, if you have a line y = a * x + b, where a is the slope and b is the intercept, your data points (x, y) are split into two groups: y < a * x + b (below the line) and y >= a * x + b (on or above the line), so you can use:

tapply(y, y < a * x + b, FUN = var)
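To make the general case concrete, here is a sketch with a made-up line. The slope and intercept values are placeholders I chose so the line passes through the middle of the simulated data, and the seed is assumed. One thing to watch: in abline(a, b) the argument a is the intercept and b is the slope, which is the opposite of the notation used above, so the sketch uses descriptive names instead.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

slope <- 1.5    # hypothetical value ("a" in the text above)
intercept <- 2  # hypothetical value ("b" in the text above)

plot(x, y)
abline(a = intercept, b = slope)  # abline's a is the intercept, b the slope

# TRUE = point lies below the line, FALSE = on or above it
below <- y < slope * x + intercept

# Variance of y within each group, named "FALSE" and "TRUE"
res <- tapply(y, below, FUN = var)
print(res)
```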
