wowdavers - 2 months ago
R Question

Selecting specific elements in R for cor()

I'm in a situation where I need to find the correlation between two variables with cor(dataframe$x, dataframe$y), where x and y are column names and dataframe is a data frame. One of the columns in my data frame is an indicator variable (0's and 1's).

I'm wondering how I can compare values of x and their corresponding values of y separately for the two groups (0's and 1's). I'm new to R, so I'm wondering whether there is built-in functionality for this in the cor() function, or whether I have to construct a new data frame/array of x's and y's to find the correlations for the separate groups.

That also leads to another question (which I've googled, but it's not very clear-cut to me yet): what's the difference between passing a vector, an array and a data frame to functions such as cor(), t.test(), etc.?

Answer

You can compute the correlation on the subset of rows selected by the indicator column. To select a subset, use dataframe[logical_index, ], where logical_index is a vector of booleans (called logical in R). To build that index, convert the indicator column to logical values, so that 1 becomes TRUE and 0 becomes FALSE:

logical_index <- as.logical(dataframe$indicator)
cor(dataframe[logical_index,]$x, dataframe[logical_index,]$y)
cor(dataframe[!logical_index,]$x, dataframe[!logical_index,]$y)
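
If you would rather not write one line per group, the same idea can be sketched with split() and sapply(). This is only an illustration on made-up data: the toy values and the column names x, y and indicator are assumptions, so adapt them to your own data frame.

```r
# Toy data; the values and the column names x, y, indicator are assumptions.
set.seed(1)
dataframe <- data.frame(
  x = rnorm(20),
  indicator = rep(c(0, 1), each = 10)
)
dataframe$y <- 2 * dataframe$x + rnorm(20)

# Split the rows by indicator value, then compute cor(x, y) within each group.
groups <- split(dataframe, dataframe$indicator)
cor_by_group <- sapply(groups, function(g) cor(g$x, g$y))

# The manual subsetting approach, for comparison.
logical_index <- as.logical(dataframe$indicator)
cor_manual_1 <- cor(dataframe$x[logical_index], dataframe$y[logical_index])
cor_manual_0 <- cor(dataframe$x[!logical_index], dataframe$y[!logical_index])

# Named vector with one correlation per group ("0" and "1").
print(cor_by_group)
```

Both approaches should give the same numbers; split() just scales more easily if the grouping column ever has more than two levels.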

Vectors, matrices, arrays, lists and data frames are all different data structures in R. A clear and relatively easy introduction to the differences is given by Hadley Wickham in Advanced R: http://adv-r.had.co.nz/Data-structures.html
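
As a rough illustration of how this matters for cor() and t.test(), here is a small sketch on made-up numbers (the vectors x and y and the data frame df are hypothetical). Given two numeric vectors, cor() returns a single number; given a data frame or matrix of numeric columns, it returns a correlation matrix for all pairs of columns.

```r
# Made-up example data (hypothetical values).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Two numeric vectors: cor() returns a single correlation.
r_vec <- cor(x, y)

# A data frame or matrix of numeric columns: cor() returns a
# correlation matrix with one row and one column per variable.
df <- data.frame(x = x, y = y)
r_df <- cor(df)
r_mat <- cor(as.matrix(df))

# A column extracted with $ is an ordinary vector, so it can be
# passed to functions that expect vectors, such as t.test().
tt <- t.test(df$x, df$y)

print(r_vec)
print(r_df)
```

So dataframe$x is already a plain vector, which is why cor(dataframe$x, dataframe$y) works without any conversion.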