Egil137 - 8 months ago

R Question

I am wondering how to find the pair of variables in a table that has the highest correlation.

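For instance, I have a data set "mydata" with 5 numeric columns. If I run

`cor(mydata)`

and then

`sort(cor(mydata))`

I get the correlation values in increasing order, but I can no longer tell which pair of variables each value comes from.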

PS: I'm not sure how to insert an example; I tried posting pictures but don't have the necessary reputation points ¬¬

Say I have a table with 2 variables, A and B. The output of sorting would be:

`[1] 0.5 0.5 1.0 1.0`

In this case it's easy to know that 0.5 comes from the pair A and B, but how could I know this when more than 2 variables are involved?
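A small sketch reproducing the two-variable case, using hypothetical toy data chosen so that `cor(A, B)` is exactly 0.5. It shows why the values come in pairs: `cor()` returns a symmetric matrix, so each off-diagonal value appears twice (A-B and B-A), the diagonal holds the 1s, and `sort()` flattens the matrix into a plain vector that no longer carries the row/column names.

```r
# Hypothetical toy data, chosen so that cor(A, B) is exactly 0.5
d <- data.frame(A = c(1, 2, 3), B = c(1, 3, 2))

r <- cor(d)
r               # 2 x 2 symmetric matrix with row/column names A and B

sorted <- sort(r)
sorted          # [1] 0.5 0.5 1.0 1.0
names(sorted)   # NULL: the pair labels are lost after sorting
```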

Answer

I think `which(..., arr.ind = TRUE)` will help.

`which` can take a logical vector, matrix, or array as an argument. By default (`arr.ind = FALSE`), it returns positions as indices into the flattened vector, but if you instead set `arr.ind = TRUE` (and the data has a `dim` attribute, i.e., matrix, data.frame, or array), it will honor the dimensionality of the source data and tell you the row and column (more generally, the index along each dimension) of every matching element.
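To make the two modes concrete, here is a small deterministic sketch; the matrix `m` is just illustrative data, not something from the question:

```r
# Illustrative 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# Default: positions in the flattened (column-major) vector
lin <- which(m > 4)
lin   # 5 6

# arr.ind = TRUE: one row per match, with "row" and "col" columns
pos <- which(m > 4, arr.ind = TRUE)
pos   # matches at row 1, col 3 and row 2, col 3
```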

```
set.seed(42)
dat <- matrix(rbinom(25, 5, 0.5), ncol = 5)
which(dat > 3, arr.ind = TRUE)
## row col
## [1,] 1 1
## [2,] 2 1
## [3,] 4 1
## [4,] 3 3
## [5,] 1 4
## [6,] 2 4
## [7,] 1 5
## [8,] 3 5
## [9,] 4 5
```
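Applied to the original question, a minimal sketch might look like the following. The data frame here is a hypothetical stand-in for "mydata" (5 numeric columns, with X1 and X5 deliberately made highly correlated so the expected answer is known). The idea is to blank out the diagonal and the mirrored lower triangle of the correlation matrix, then let `which(..., arr.ind = TRUE)` report where the maximum sits.

```r
# Hypothetical stand-in for "mydata": 5 numeric columns named X1..X5
set.seed(1)
mydata <- data.frame(matrix(rnorm(100), ncol = 5))
# Make X1 and X5 strongly correlated so the expected answer is known
mydata$X5 <- mydata$X1 + rnorm(20, sd = 0.1)

r <- cor(mydata)

# Keep only the upper triangle: this drops the diagonal (each variable
# with itself, always 1) and the lower triangle (a mirror of the upper one)
r[!upper.tri(r)] <- NA

# Row/column position of the largest remaining correlation
# (NA comparisons are ignored by which())
idx <- which(r == max(r, na.rm = TRUE), arr.ind = TRUE)

# Translate the (row, col) indices back into variable names
pair <- c(rownames(r)[idx[1, "row"]], colnames(r)[idx[1, "col"]])
pair
```

If strongly negative correlations should count as "highest" too, the same steps can be run on `abs(r)`. If several pairs tie for the maximum, `idx` has more than one row and `idx[1, ]` simply takes the first.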

Source (Stackoverflow)