JSH - 6 months ago 43

# R: filtering data and calculating correlation automatically

I am trying to extend the answer to the question "R: filtering data and calculating correlation".

To obtain the correlation between temperature and humidity for each month of the year (1 = January), we would have to repeat the following call for each month (12 times):

```r
cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])
```

Is there any way to do each month automatically?

In my case I have more than 30 groups (species rather than months) for which I would like to test for correlations. I just wanted to know if there is a faster way than doing it one by one.

Thank you!

```r
cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])
```

gives you a `2 * 2` matrix rather than a number. If you do want a matrix for each `Month`, then use

```r
lst <- lapply(split(airquality[, c("Temp", "Humidity")], airquality$Month), cor)
```

so that you get a list in which each element stores the correlation matrix for one `Month`.
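If only the off-diagonal entry of each matrix is of interest, it can be pulled out of the list afterwards. The following is a minimal sketch, assuming `Wind` as a stand-in for `Humidity` (which `airquality` does not contain); the name `r_by_month` is only illustrative.

```r
# Assumption: Wind replaces Humidity, which is not a column of airquality.
lst <- lapply(split(airquality[, c("Temp", "Wind")], airquality$Month), cor)

# Each element is a symmetric 2 x 2 matrix with 1s on the diagonal;
# the ["Temp", "Wind"] entry is the correlation for that month.
r_by_month <- sapply(lst, function(m) m["Temp", "Wind"])
r_by_month
```

The result is a numeric vector named by month, one value per element of `lst`.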

But if you want a single number for each `Month`, use

```r
mapply(cor, with(airquality, split(Temp, Month)),
       with(airquality, split(Humidity, Month)))
```

so that you get a vector.
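The same idea should carry over to a grouping column such as species. As a rough sketch, assuming the built-in `iris` data as a stand-in for the asker's data (the columns `Sepal.Length` and `Sepal.Width` are placeholders for the two variables of interest), one can split the whole data frame by group and compute one correlation per piece:

```r
# Assumption: iris stands in for the real data; Species is the grouping column.
# split() cuts the data frame into one piece per species, and the anonymous
# function returns a single correlation coefficient for each piece.
r_by_species <- sapply(split(iris, iris$Species),
                       function(d) cor(d$Sepal.Length, d$Sepal.Width))
r_by_species
```

This returns a named vector with one entry per species, however many groups there are.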

Reproducible example

The `airquality` dataset in R does not have a `Humidity` column, so I will use `Wind` for testing:

```r
x <- mapply(cor, with(airquality, split(Temp, Month)),
            with(airquality, split(Wind, Month)))
x
#         5          6          7          8          9
#-0.3732760 -0.1210353 -0.3052355 -0.5076146 -0.5704701
```

We get a named vector, where `names(x)` gives the `Month` values and `unname(x)` gives the correlations.
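If a table is more convenient than a named vector, the two pieces can be combined into a data frame. This is a small sketch under the same `Wind`-for-`Humidity` assumption; the column name `correlation` and the object name `res` are arbitrary choices.

```r
# Recompute the per-month correlations (Wind used in place of Humidity).
x <- mapply(cor, with(airquality, split(Temp, Month)),
            with(airquality, split(Wind, Month)))

# names(x) holds the months as character strings, so convert them to integers.
res <- data.frame(Month = as.integer(names(x)),
                  correlation = unname(x))
res
```

With many groups, a data frame like this is easier to sort or filter, for example with `res[order(res$correlation), ]`.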