wowdavers - 8 months ago
R Question

Selecting specific elements in R for cor()

I'm in a situation where I need to find the correlation between two variables, x and y, which are columns of a dataframe. One of the columns in my dataframe is an indicator function (0's and 1's).

I'm wondering how I can compare values of x and their corresponding values of y for two separate groups (0's and 1's). I'm new to R, so I guess I'm wondering if there's built-in functionality in the cor() function, or if I have to reconstruct a dataframe/array to find the correlations for separate groups.

I guess that also leads to another question (which I've googled, but it's not very clear-cut to me yet): what's the difference between using a vector, array and dataframe in R under these functions?


You can compute the correlation on the subset of rows selected by the indicator column. To select a subset of rows, use dataframe[logical_index, ], where logical_index is a logical vector (R's name for booleans). Because the indicator column holds 0's and 1's, convert it to logical values first:

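# 1 becomes TRUE, 0 becomes FALSE
logical_index <- as.logical(dataframe$indicator)
# correlation for the rows where indicator is 1
cor(dataframe[logical_index, ]$x, dataframe[logical_index, ]$y)
# correlation for the rows where indicator is 0
cor(dataframe[!logical_index, ]$x, dataframe[!logical_index, ]$y)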
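
To try the idea end to end, here is a minimal self-contained sketch. The data frame `df` and its values are made up for illustration and are not from the question; only the column roles (x, y, indicator) follow it. It also shows `split()` with `sapply()` as one possible way to get the correlation for every group in a single call.

```r
# Hypothetical toy data; the name df and all values are invented for illustration.
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2, 4, 5, 9, 8, 6, 4, 1),
  indicator = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# Convert the 0/1 indicator to a logical vector (1 -> TRUE, 0 -> FALSE).
in_group <- as.logical(df$indicator)

# Correlation within each group, subsetting the column vectors directly.
cor_group1 <- cor(df$x[in_group], df$y[in_group])
cor_group0 <- cor(df$x[!in_group], df$y[!in_group])

# Alternative: split the rows by indicator value and apply cor() to each piece.
# The result is a numeric vector named by group ("0" and "1").
cor_by_group <- sapply(split(df, df$indicator), function(g) cor(g$x, g$y))
```

The `split()` version scales to more than two groups without extra code, while the logical-index version is probably easier to read when there are only two.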

Vectors, matrices, arrays, lists and data frames are all different data structures in R. A clear and relatively easy introduction to the differences is given by Hadley Wickham in Advanced R.
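
As far as cor() is concerned, the practical difference can be sketched as follows: two vectors give a single number, while a matrix or an all-numeric data frame gives a matrix of correlations between its columns. The values and variable names below are hypothetical and only meant to illustrate that behaviour.

```r
# Hypothetical values, for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Two atomic vectors: cor() returns a single number.
r_vec <- cor(x, y)

# A matrix (two-dimensional, all elements of one type):
# cor() returns the correlation matrix of its columns.
m <- cbind(x = x, y = y)
r_mat <- cor(m)

# A data frame (a list of equal-length columns, possibly of different types):
# cor() accepts it as long as every column is numeric.
d <- data.frame(x = x, y = y)
r_df <- cor(d)

# A matrix is just a two-dimensional array, so this is TRUE.
m_is_array <- is.array(m)
```

The off-diagonal entry `r_mat["x", "y"]` is the same quantity as `r_vec`, so the choice of container mostly affects the shape of the result rather than the numbers.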