Egil137 - 1 month ago

Finding top values in a table in R

I am wondering how to find the pairs of variables in a table that give the highest values (here, the highest correlations).

For instance, I have a data set, "mydata", with 5 numeric columns. If I run

cor(mydata)
it shows me all the pairwise correlations. I want to know which pairs are highly correlated. I tried using
sort(cor(mydata))
but, understandably, this gives me a plain vector of the ordered values. How can I then tell which pair is responsible for a given value?

PS: I'm not sure how to insert an example; I tried posting pictures but don't have the necessary reputation points ¬¬

Say I have a table with 2 variables, A and B. Sorting its correlation matrix would give:

[1] 0.5 0.5 1.0 1.0

In this case it's easy to tell that 0.5 comes from the pair A and B, but how can I tell which pair a value comes from when more than 2 variables are involved?

Answer

I think which(..., arr.ind = TRUE) will help.

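which takes a logical vector, matrix, or array (typically the result of a comparison such as dat > 3) and returns the positions of the TRUE elements. By default (arr.ind = FALSE), it returns those positions as linear indices, as if the data were one long vector counted down the columns. If you instead set arr.ind = TRUE and the logical input has a dim attribute (a matrix or array, which is also what comparing a data.frame with a value produces), it honors the dimensionality of the source data and returns one row per match with one column per dimension (row, col, ...), telling you exactly where to find each element.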
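A small deterministic illustration of the difference, using a made-up 2 x 3 matrix m (not from the question). The last step uses base R's arrayInd(), which performs the same linear-to-(row, col) conversion by hand; that can be handy when you already hold linear indices, for example from order(), the position-keeping counterpart of sort().

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    5    2
# [2,]    6    3    4
m <- matrix(c(1, 6, 5, 3, 2, 4), nrow = 2)

# Default (arr.ind = FALSE): linear indices, counted down the columns
lin <- which(m > 3)
print(lin)  # [1] 2 3 6

# arr.ind = TRUE: the same three matches as (row, col) pairs
rc <- which(m > 3, arr.ind = TRUE)
print(rc)

# arrayInd() converts linear indices (e.g. from order()) into (row, col) pairs
rc2 <- arrayInd(lin, dim(m))
print(rc2)
```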

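set.seed(42)                                  # make the random draws reproducible
dat <- matrix(rbinom(25, 5, 0.5), ncol = 5)   # 5 x 5 matrix of counts from 0 to 5
which(dat > 3, arr.ind = TRUE)                # row/column of every element greater than 3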
##       row col
##  [1,]   1   1
##  [2,]   2   1
##  [3,]   4   1
##  [4,]   3   3
##  [5,]   1   4
##  [6,]   2   4
##  [7,]   1   5
##  [8,]   3   5
##  [9,]   4   5
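Applied to the original question, a minimal sketch might look like the following. It assumes every column of mydata is numeric (so cor() works); top_cor_pairs() and its threshold argument are hypothetical names chosen for illustration, and the toy mydata is made up so the expected pairs are obvious. A correlation matrix is symmetric with 1s on the diagonal, which is why the sorted output in the question shows 0.5 twice plus the 1.0s, so the sketch masks everything except the upper triangle before calling which() and reports each pair once.

```r
# Hypothetical helper: variable pairs whose absolute correlation is at
# least `threshold`, strongest first. Assumes all columns are numeric.
top_cor_pairs <- function(data, threshold = 0.8) {
  cm <- cor(data)
  # Keep each pair once: mask the diagonal (all 1s) and the lower triangle
  cm[!upper.tri(cm)] <- NA
  # which() treats NA as FALSE, so only upper-triangle matches are returned
  idx <- which(abs(cm) >= threshold, arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    cor  = cm[idx],               # matrix indexing by (row, col) pairs
    stringsAsFactors = FALSE
  )
  # Strongest first, whether positive or negative
  pairs <- pairs[order(-abs(pairs$cor)), , drop = FALSE]
  rownames(pairs) <- NULL
  pairs
}

# Made-up example data
mydata <- data.frame(
  a = 1:8,
  b = 2 * (1:8) + 1,            # perfectly correlated with a
  c = -(1:8),                   # perfectly anti-correlated with a
  d = rep(c(1, -1), 4),
  e = rep(c(1, 1, -1, -1), 2)
)
print(top_cor_pairs(mydata, threshold = 0.8))
```

With this toy data the pairs a/b, a/c, and b/c come back with correlations of (numerically) 1 or -1, while the weaker d and e columns fall below the threshold. Using abs() means strong negative correlations also count as "high"; drop it if only positive ones matter.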