tesseracT - 5 months ago

R Question

I have a scatterplot of x versus y, and I have drawn a vertical line down the middle of it with `abline`. I want to calculate the variance of the points to the left of the line and, separately, the variance of the points to the right of it. This is probably a simple problem, but I'm struggling to find a solution. Any advice is appreciated. Thanks in advance.

```
x = rnorm(100, mean = 12, sd = 2)
y = rnorm(100, mean = 20, sd = 5)
data = as.data.frame(cbind(x, y))
plot(x = x, y = y, type = "p")
abline(v = 12, col = "red")
```

Answer

In your sample code you draw a vertical line at `x = 12` (`abline(v = 12)`), so your data points `(x, y)` are split into two groups, `x < 12` and `x >= 12`. It is then straightforward to compute the variance of `y` within each group:

```
var(y[x < 12])
var(y[x >= 12])
```
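A minimal reproducible sketch of the same two-subset approach, assuming you fix the random seed so the numbers can be compared between runs (the `set.seed(1)` call and the `left`, `var_left` and `var_right` names are illustrative additions, not part of the question). Because `x < 12` and `x >= 12` are complementary, every point falls into exactly one group.

```r
# Fix the seed so the simulated sample (and hence the variances) is
# reproducible; the value 1 is arbitrary.
set.seed(1)
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

# Logical index: TRUE for points to the left of the vertical line x = 12
left <- x < 12

# Variance of y on each side; !left is the same as x >= 12 here
var_left  <- var(y[left])
var_right <- var(y[!left])

c(left = var_left, right = var_right)
```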

We can also get both variances with a single call to `tapply`, which groups `y` by the logical vector `x < 12`:

```
tapply(y, x < 12, FUN = var)
```
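Note that `tapply` names its result by the grouping values, `FALSE` and `TRUE`, so the first element is the right-hand group. A small variation, using a hypothetical `side` label vector, gives readable names instead; the result should match the two separate `var` calls.

```r
set.seed(1)
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

# Label each point by its side of the vertical line x = 12
side <- ifelse(x < 12, "left", "right")

# One variance per label; groups are ordered alphabetically ("left", "right")
v <- tapply(y, side, FUN = var)
v
```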

More generally, if you have a line `y = a * x + b`, where `a` is the slope and `b` is the intercept, your data points `(x, y)` are split into two groups: `y < a * x + b` (below the line) and `y >= a * x + b` (on or above the line), so you can use:

```
tapply(y, y < a * x + b, FUN = var)
```
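A sketch for a sloped line, using hypothetical values `slope = 0.5` and `intercept = 14`, chosen so the line passes through the centre of the simulated cloud (0.5 * 12 + 14 = 20). Be aware that base R's `abline()` uses `a` for the intercept and `b` for the slope, the reverse of the naming above, so naming the variables `slope` and `intercept` avoids mixing them up.

```r
set.seed(1)
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

# Hypothetical line y = slope * x + intercept
slope     <- 0.5
intercept <- 14

plot(x, y)
# In abline(), a is the intercept and b is the slope
abline(a = intercept, b = slope, col = "red")

# TRUE for points strictly below the line, FALSE for points on or above it
below <- y < slope * x + intercept
v <- tapply(y, below, FUN = var)
v
```

Here `v["TRUE"]` is the variance of `y` below the line and `v["FALSE"]` the variance on or above it.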