Jason - 1 year ago 68
R Question

# Return values from a Correlation Matrix in R

I have a correlation matrix (called `correl`) that is `390 x 390`, so I would like to scan it for values between `0.80` and `0.99`. I have written the following loop:

```
cc1 <- NA # creates an NA vector to store values between 0.80 & 0.99
cc2 <- NA # creates an NA vector to store desired values
p <- dim(correl)[2] # dim returns the size of the correlation matrix
i <- 1

while (i <= p) {
  cc1 <- correl[, correl[, i] >= 0.80 & correl[, i] < 1.00]
  cc2 <- cbind(cc2, cc1)
  i <- i + 1
}
```

The problem I am having is that I also get undesired correlations (those below 0.80) in `cc2`.

```
# Sample of what I mean:

1   SPY.Adjusted    1.0000000   0.83491778  0.6382930   0.8568000
2   AAPL.Adjusted   0.8349178   1.00000000  0.1945304   0.1194307
3   CHL.Adjusted    0.6382930   0.19453044  1.0000000   0.2991739
4   CVX.Adjusted    0.8568000   0.11943067  0.2991739   1.0000000
5   GE.Adjusted     0.6789054   0.13729877  0.3356743   0.5219169
6   GOOGL.Adjusted  0.5567947   0.10986655  0.2552149   0.2128337
```

I only want to return the correlations within the desired range (0.80 to 0.99) without losing the `row.names` or `col.names`, as otherwise I would not know which is which.

Let's create a simple reproducible example:

```
m = matrix(runif(100), ncol=10)   # 10 x 10 matrix of random values in [0, 1]
rownames(m) = LETTERS[1:10]
colnames(m) = rownames(m)
```

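The tricky part is getting a nice return structure that contains the variable names. So I would collapse the matrix into a standard data frame, with one row per matrix cell:

```
dd = data.frame(cor = as.vector(m),                   # values, read column by column
                id1 = rownames(m),                    # row name, recycled down each column
                id2 = rep(colnames(m), each=nrow(m))) # column name, repeated once per row
```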
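To see why the two id columns line up with the values, here is a small sketch on a made-up 3 x 3 matrix (`small`, `flat` and the labels are hypothetical names used only for illustration). `as.vector()` reads a matrix column by column, so the row names cycle fastest while each column name repeats `nrow` times.

```r
# Hypothetical 3 x 3 matrix, used only to illustrate the layout
small <- matrix(1:9 / 10, ncol = 3,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

flat <- data.frame(cor = as.vector(small),                          # column-major order
                   id1 = rep(rownames(small), times = ncol(small)), # row label of each value
                   id2 = rep(colnames(small), each = nrow(small)),  # column label of each value
                   stringsAsFactors = FALSE)

# Every row of 'flat' should point back at the matching matrix cell
all(flat$cor == small[cbind(flat$id1, flat$id2)])
```

Writing the `rep()` calls out explicitly does the same thing as relying on `data.frame()` to recycle the shorter vector; it just makes the pairing easier to check.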

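Remove duplicate entries. A correlation matrix is symmetric, so each pair appears twice and keeping the upper triangle is enough:

```
dd = dd[as.vector(upper.tri(m, TRUE)),]
```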
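As a quick illustration of what the mask does, the sketch below applies `upper.tri()` to a made-up 3 x 3 symmetric matrix (`s` and `mask` are hypothetical names). With `diag = TRUE` the diagonal is kept, so 6 of the 9 cells survive; for a real correlation matrix the diagonal 1s fall outside the 0.80 to 0.99 range and are dropped by the final filter.

```r
# Hypothetical symmetric matrix, for illustration only
s <- matrix(c(1.0, 0.9, 0.2,
              0.9, 1.0, 0.5,
              0.2, 0.5, 1.0), ncol = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

mask <- upper.tri(s, diag = TRUE)  # TRUE on and above the diagonal
mask
sum(mask)                          # number of cells kept

# The flattened mask follows the same column-major order as as.vector(s),
# so it can index the rows of the long data frame directly
as.vector(s)[as.vector(mask)]
```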

Then select as usual:

```
dd[dd$cor > 0.8 & dd$cor < 0.99,]
```
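Putting the steps together, a minimal sketch might look like the following. The helper name `pairs_in_range` and its arguments are hypothetical, not part of the original answer, and the built-in `mtcars` data set stands in for the real `correl` matrix. As an assumption of this sketch, it uses `diag = FALSE` so the self-correlations are excluded up front, and it takes the bounds from the question's loop (`>= 0.80` and `< 1.00`).

```r
# Hypothetical helper: list the variable pairs whose correlation lies in [lower, upper)
pairs_in_range <- function(cm, lower = 0.80, upper = 1.00) {
  long <- data.frame(cor = as.vector(cm),
                     id1 = rep(rownames(cm), times = ncol(cm)),
                     id2 = rep(colnames(cm), each = nrow(cm)),
                     stringsAsFactors = FALSE)
  # Keep one copy of each pair and drop the diagonal
  long <- long[as.vector(upper.tri(cm, diag = FALSE)), ]
  # which() silently skips NA correlations instead of returning NA rows
  keep <- which(long$cor >= lower & long$cor < upper)
  long <- long[keep, ]
  long[order(-long$cor), ]  # strongest pairs first
}

# mtcars ships with R; it is only a stand-in for the real data
correl_demo <- cor(mtcars)
res <- pairs_in_range(correl_demo)
res
```

Because the result keeps `id1` and `id2` next to each value, every correlation can still be traced back to its row and column. Note that the filter only looks at positive correlations, as in the question; strong negative ones would need a separate range or `abs()`.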