tesseracT - 17 days ago 5x
R Question

# Finding variance of a subset of data from a scatterplot

I have a scatterplot of x versus y. I have drawn an abline down the middle of the plot. I want to calculate the variance of the points on the left of the abline and I want to calculate the variance of the points on the right of the abline. This is most likely a relatively simple problem, but I'm struggling to find a solution. Any advice is appreciated. Thanks in advance.

``````
set.seed(1)  # make the simulated data reproducible
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)
data <- data.frame(x, y)
plot(x = x, y = y, type = "p")
abline(v = 12, col = "red")  # vertical line at x = 12
``````

In your sample code the vertical line is at `x = 12` (`abline(v = 12)`), so your data points `(x, y)` are split into two groups: `x < 12` (left of the line) and `x >= 12` (right of the line). The variance of `y` on each side is then straightforward:

``````
var(y[x < 12])
var(y[x >= 12])
``````

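We can also do this with a single call to `tapply`, which returns one variance per group, named `FALSE` (for `x >= 12`) and `TRUE` (for `x < 12`):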

``````
tapply(y, x < 12, FUN = var)
``````
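If the `TRUE`/`FALSE` labels are hard to read, or if you want the variance of `x` as well as `y` on each side, something like the following may help. It is only a sketch: the `side` factor and the names `var_y` and `var_xy` are made up for illustration, and the data are simulated the same way as in the question.

```r
set.seed(42)
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

# Label each point by which side of the vertical line x = 12 it falls on
# (`side` is a hypothetical helper name, not part of the original code)
side <- factor(x < 12, levels = c(TRUE, FALSE), labels = c("left", "right"))

# Variance of y within each side, with readable group names
var_y <- tapply(y, side, FUN = var)

# Variance of both x and y within each side, returned as a data frame
var_xy <- aggregate(cbind(x, y) ~ side,
                    data = data.frame(x, y, side),
                    FUN = var)
```

Here `var_y["left"]` should match `var(y[x < 12])`, and `var_xy` has one row per side with columns `x` and `y`.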

More generally, if you have a line `y = a * x + b`, where `a` is the slope and `b` is the intercept, your data points `(x, y)` are split into two groups: `y < a * x + b` (below the line) and `y >= a * x + b` (above or on the line), so you can use:

``````
tapply(y, y < a * x + b, FUN = var)
``````
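As a rough, self-contained illustration of the sloped-line case, here is a sketch with simulated data. The values of `a` and `b` are arbitrary assumptions chosen so that the line passes through the middle of the cloud, and `below` and `var_by_side` are just illustrative names.

```r
set.seed(42)
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

# Hypothetical slope and intercept, picked only for this example
a <- 1.5
b <- 2

# TRUE for points strictly below the line y = a * x + b
below <- y < a * x + b

# Variance of y for each group; names are "FALSE" (above or on) and "TRUE" (below)
var_by_side <- tapply(y, below, FUN = var)
```

If you also draw this line with `abline()`, note that its arguments are named the other way round: `abline(a = , b = )` takes the intercept as `a` and the slope as `b`, so in the notation above the call would be `abline(a = b, b = a)`.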