JSH - 5 months ago 38

R Question

I am trying to extend the answer to the question "R: filtering data and calculating correlation".

To obtain the correlation between temperature and humidity for each month of the year (1 = January), we would have to repeat the following call 12 times, once per month:

`cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])`

Is there any way to do each month automatically?

In my case I have more than 30 groups (species rather than months) for which I would like to test correlations. I just wanted to know whether there is a faster way than doing them one by one.

Thank you!

Answer

```
cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])
```

gives you a `2 x 2` correlation matrix rather than a single number. I bet you want a single number for each `Month`, so use

```
## cor(Temp, Humidity | Month)
with(airquality, mapply(cor, split(Temp, Month), split(Humidity, Month)) )
```

and you will obtain a vector.

Have a read around `?split` and `?mapply`; they are very useful for "by group" operations, although they are not the only option. Also read around `?cor`, and compare the difference between

```
a <- rnorm(10)
b <- rnorm(10)
cor(a, b)
cor(cbind(a, b))
```

The answer you linked in your question is doing something similar to `cor(cbind(a, b))`.
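To make that comparison concrete, here is a small sketch (the seed is arbitrary and only there so the numbers are repeatable). It suggests that the two-vector form returns one number, while the matrix form returns a `2 x 2` matrix whose off-diagonal entry is that same number:

```r
set.seed(1)            # arbitrary seed, only for reproducibility
a <- rnorm(10)
b <- rnorm(10)

r <- cor(a, b)         # two vectors in, a single number out
m <- cor(cbind(a, b))  # one matrix in, a 2 x 2 correlation matrix out

dim(m)                     # the matrix has 2 rows and 2 columns
all.equal(m["a", "b"], r)  # the off-diagonal entry matches cor(a, b)
```

So if you already have the matrix form, you can pull the single number out with `m["a", "b"]` (or `m[1, 2]`).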

**Reproducible example**

The `airquality` dataset in R does not have a `Humidity` column, so I will use `Wind` for testing:

```
## cor(Temp, Wind | Month)
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)) )
#         5          6          7          8          9
#-0.3732760 -0.1210353 -0.3052355 -0.5076146 -0.5704701
```

We get a named vector, where `names(x)` gives the `Month` values and `unname(x)` gives the correlations.
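Since `split` and `mapply` are not the only option, here is one possible alternative, as a sketch: split the whole data frame by group and apply `cor` to each piece with `sapply`. The last line uses the built-in `iris` data purely as a stand-in for a "species" dataset; its column names are an assumption and will differ from yours.

```r
# Split the whole data frame by Month, then compute one correlation per piece
by_month <- split(airquality, airquality$Month)
y <- sapply(by_month, function(d) cor(d$Temp, d$Wind))

# Should agree with the mapply() version
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
all.equal(x, y)

# Same idea with species as the grouping variable (iris is only a stand-in)
z <- sapply(split(iris, iris$Species),
            function(d) cor(d$Sepal.Length, d$Sepal.Width))
```

The data-frame version is a little more verbose, but it may be easier to extend if you later need more than two columns inside each group.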

Thank you very much! It worked just perfectly! I was trying to figure out how to obtain a vector with the `R^2` for each correlation too, but I can't... Any ideas?

`cor(x, y)` is like fitting a standardised linear regression model:

```
coef(lm(scale(y) ~ scale(x) - 1)) ## remember to drop intercept
```
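A quick check of that claim on simulated data (the names `u` and `v`, the seed, and the 0.5 slope are all arbitrary choices for illustration): the slope of the standardised, no-intercept regression should match `cor(u, v)`.

```r
set.seed(42)               # arbitrary seed, only for reproducibility
u <- rnorm(50)
v <- 0.5 * u + rnorm(50)   # some linear relationship plus noise

# Slope of the standardised regression without intercept
slope <- coef(lm(scale(v) ~ scale(u) - 1))

all.equal(unname(slope), cor(u, v))
```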

The R-squared in this simple linear regression is just the square of the slope. Previously we had `x` storing the correlation per group, so now the R-squared is just `x ^ 2`.
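If you want to convince yourself, here is a sketch that squares the per-month correlations and compares them with the R-squared reported by an ordinary `lm` fit in each month (again using `Wind`, because `airquality` has no `Humidity` column):

```r
# Per-month correlations, then squared
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
r2 <- x ^ 2

# Cross-check: R-squared from a simple regression fitted separately per month
r2_lm <- sapply(split(airquality, airquality$Month),
                function(d) summary(lm(Wind ~ Temp, data = d))$r.squared)

all.equal(r2, r2_lm)
```

Note that this equivalence only holds for a simple regression with one predictor; with more predictors the R-squared is no longer the square of a single pairwise correlation.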