wowdavers - 1 year ago 66
R Question

# Selecting specific elements in R for cor()

I'm in a situation where I need to find the correlation between two variables, `cor(dataframe$x, dataframe$y)`, where `x` and `y` are column names and `dataframe` is a data frame. One of the columns in my data frame is an indicator variable (0's and 1's).

I'm wondering how I can compare values of `x` and their corresponding values of `y` for two separate groups (0's and 1's). I'm new to R, so I guess I'm wondering whether there's built-in functionality in the `cor()` function for this, or whether I have to reconstruct a data frame/array of `x`'s and `y`'s to find the correlations for the separate groups.

I guess that also leads to another question (which I've googled, but it's not very clear-cut to me yet): what's the difference between passing a vector, an array, and a data frame to functions such as `cor()`, `t.test()`, etc.?

Answer Source

You could compute the correlation on the subset of rows specified by the indicator column. To select a subset of rows, use `dataframe[logical_index, ]`, where `logical_index` is a vector of booleans (called logical in R). Since the indicator column holds 0's and 1's, first convert it to a logical vector with `as.logical()`.

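```
# TRUE where indicator is 1, FALSE where it is 0
logical_index <- as.logical(dataframe$indicator)

# Correlation within the indicator == 1 group
cor(dataframe[logical_index, ]$x, dataframe[logical_index, ]$y)

# Correlation within the indicator == 0 group
cor(dataframe[!logical_index, ]$x, dataframe[!logical_index, ]$y)
```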
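If you would rather not repeat the call once per group, the same subsetting idea can be sketched with `split()` and `sapply()`. This is only an illustration: the data below are made up, and the column names `x`, `y` and `indicator` are assumed to match the ones in the question.

```r
# Hypothetical example data; column names follow the question.
dataframe <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2, 4, 5, 4, 5, 9, 8, 7),
  indicator = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Split the rows into one data frame per indicator value,
# then compute cor(x, y) within each piece.
groups <- split(dataframe, dataframe$indicator)
cor_by_group <- sapply(groups, function(g) cor(g$x, g$y))

# Named numeric vector with one entry per group ("0" and "1").
print(cor_by_group)
```

This should give the same numbers as the two explicit `cor()` calls above, and it carries over unchanged if the grouping column ever has more than two levels.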

Vectors, matrices, arrays, lists and data frames are all different data structures in R. A clear and relatively easy introduction to the differences is given by Hadley Wickham in Advanced R: http://adv-r.had.co.nz/Data-structures.html
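As a rough illustration of how the input type changes what these functions do, here is a small sketch with made-up data (the names `x`, `y`, `df` and `long` are hypothetical). `cor()` returns a single number for two numeric vectors and a correlation matrix for a data frame or matrix, while `t.test()` takes either two vectors or a formula together with a data frame.

```r
# Hypothetical example data.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 7)
df <- data.frame(x = x, y = y)

# Two numeric vectors: a single correlation coefficient.
r_vec <- cor(x, y)

# A data frame (or matrix) of numeric columns: a correlation matrix
# with one row and one column per variable.
r_mat <- cor(df)

# The off-diagonal entry of the matrix matches the vector result.
print(all.equal(r_vec, r_mat["x", "y"]))

# t.test() accepts two numeric vectors...
t_vec <- t.test(x, y)

# ...or a formula plus a data frame that holds the values in one column
# and a two-level grouping variable in another.
long <- data.frame(value = c(x, y), group = rep(c("x", "y"), each = 6))
t_formula <- t.test(value ~ group, data = long)
```

Both `t.test()` calls describe the same two-sample comparison; only the way the data are handed over differs.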
