Caragh - 8 months ago 29

R Question

I have two dataframes, one is a list of pairs of individuals, similar to below (but with about 150 pairs):

`ID_1 ID_2`

X14567 X26789

X12637 X34560

X67495 X59023

The other dataframe consists of once column per individual with numerical values relating to that individuals underneath. All told about 300 columns and 300 rows. For example:

`X14567 X12637 X26789 X67495 X34560 X59023`

0.41 0.29 0.70 0.83 0.41 0.30

0.59 0.44 0.20 0.94 0.03 0.97

0.48 0.91 0.78 0.92 0.40 0.09

0.07 0.21 0.42 0.14 0.96 0.96

0.33 0.13 0.53 0.04 0.52 0.49

0.94 0.28 0.37 0.26 0.11 0.09

I want to find the correlation of these values between each pair of individuals. to end up with something like:

`ID_1 ID_2 Correlation`

X14567 X26789 -0.25

X12637 X34560 -0.25

X67495 X59023 -0.11

Is there a way that I can pull the values from the first dataframe to specify the name of the two columns that I need to find correlations between in such a way that can be easily repeated for each row of the first dataframe?

Many thanks for your help

Answer

If x and y are your two data.frames and the column names are set appropriately, you can use `apply`

.

```
apply(x, 1, function(row) cor(y[row[1]], y[row[2]]))
```

From there just add the values to your x data.frame:

```
x$cor <- apply(x, 1, function(row) cor(y[row[1]], y[row[2]]))
V1 V2 cor
2 X14567 X26789 -0.2515737
3 X12637 X34560 -0.2563294
4 X67495 X59023 -0.1092830
```