JSH - 2 months ago
R Question

R: filtering data and calculating correlation automatically

I am trying to extend the answer to the question "R: filtering data and calculating correlation".

To obtain the correlation of temperature and humidity for each month of the year (1 = January), we would have to repeat the following call for every month (12 times):

cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])

Is there any way to do each month automatically?

In my case I have more than 30 groups (species rather than months) for which I would like to test for correlations, so I wanted to know if there is a faster way than doing it one by one.

Thank you!

cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])

gives you a 2 x 2 matrix rather than a single number. If you do want a matrix for each Month, use

lst <- lapply(split(airquality[, c("Temp", "Humidity")], airquality$Month), cor) 

so that you get a list, each element of which stores a matrix.
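
If you later need a single coefficient from each of those matrices, you can index into them. A minimal sketch, assuming Wind stands in for Humidity so that the snippet runs on the built-in airquality data (the name r is only illustrative):

```r
# One 2 x 2 correlation matrix per month, stored in a list named by month.
lst <- lapply(split(airquality[, c("Temp", "Wind")], airquality$Month), cor)

# Inspect the matrix for one month (list names are the month values as strings).
lst[["5"]]

# Pull the off-diagonal entry out of every matrix to get one number per month.
r <- sapply(lst, function(m) m["Temp", "Wind"])
r
```

The diagonal of each matrix is always 1, so the off-diagonal entry is the only one that carries information here.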

But if you want a single number for each Month, use

mapply(cor, with(airquality, split(Temp, Month)),
            with(airquality, split(Humidity, Month)))

so that you get a vector.
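
The same pattern should carry over to grouping by species, and it does not matter whether there are 3 groups or 30. A rough sketch using the built-in iris data as a stand-in for your data (the columns Sepal.Length and Sepal.Width and the name r_species are assumptions for illustration only):

```r
# Split the whole data frame by the grouping column, one piece per species.
by_species <- split(iris, iris$Species)

# Compute one correlation coefficient per piece; sapply returns a named vector.
r_species <- sapply(by_species, function(d) cor(d$Sepal.Length, d$Sepal.Width))
r_species
```

Splitting the data frame once and computing inside an anonymous function keeps both columns together, which can be easier to read than two parallel split() calls.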

Reproducible example

The airquality dataset in R does not have a Humidity column, so I will use Wind for testing:

x <- mapply(cor, with(airquality, split(Temp, Month)),
            with(airquality, split(Wind, Month)))

#         5          6          7          8          9 
#-0.3732760 -0.1210353 -0.3052355 -0.5076146 -0.5704701 

We get a named vector, where names(x) gives the Month and unname(x) gives the correlation.
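
If a table is more convenient than a named vector, the two pieces can be put into a data frame. A small sketch (the names res and r are arbitrary choices, not anything required by R):

```r
# Recompute the per-month correlations so this snippet runs on its own.
x <- mapply(cor, with(airquality, split(Temp, Month)),
            with(airquality, split(Wind, Month)))

# names(x) are the month values as strings; convert them back to integers.
res <- data.frame(Month = as.integer(names(x)), r = unname(x))

# Sort from the most negative correlation upwards.
res[order(res$r), ]
```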