Jason - 1 month ago
R Question

Return values from a Correlation Matrix in R

I have a correlation matrix (called correl) that is 390 x 390, and I would like to scan it for values between 0.80 and 0.99. I have written the following loop:

cc1 <- NA  # creates an NA vector to store values between 0.80 & 0.99
cc2 <- NA  # creates an NA vector to store desired values
p <- dim(correl)[2]  # dim returns the size of the correlation matrix
i <- 1

while (i <= p) {
  cc1 <- correl[, correl[, i] >= 0.80 & correl[, i] < 1.00]
  cc2 <- cbind(cc2, cc1)
  i <- i + 1
}


The problem I am having is that I also get undesired correlations (those below 0.80) into cc2.

#Sample of what I mean:

SPY.Adjusted AAPL.Adjusted CHL.Adjusted CVX.Adjusted
1 SPY.Adjusted 1.0000000 0.83491778 0.6382930 0.8568000
2 AAPL.Adjusted 0.8349178 1.00000000 0.1945304 0.1194307
3 CHL.Adjusted 0.6382930 0.19453044 1.0000000 0.2991739
4 CVX.Adjusted 0.8568000 0.11943067 0.2991739 1.0000000
5 GE.Adjusted 0.6789054 0.13729877 0.3356743 0.5219169
6 GOOGL.Adjusted 0.5567947 0.10986655 0.2552149 0.2128337


I only want to return the correlations within the desired range (0.80 to 0.99) without losing the row.names or col.names, as otherwise I would not know which pair of variables each value belongs to.
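A likely cause of the extra values, judging from the loop: correl[, cond] uses the logical vector built from column i as a column index, so every column it selects comes back whole, including entries below 0.80 in other rows. A minimal sketch on a made-up 3 x 3 matrix (x and keep are illustrative names, not from the question):

```r
# Made-up symmetric matrix standing in for correl
x <- matrix(c(1.00, 0.85, 0.30,
              0.85, 1.00, 0.10,
              0.30, 0.10, 1.00),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

i <- 1
# Logical vector over the rows of column i: TRUE only for row "B" (0.85)
keep <- x[, i] >= 0.80 & x[, i] < 1.00
print(keep)

# Used as a column index, it returns all of column "B",
# including the 0.10 that lies outside the range
print(x[, keep])

# Using it as a row index within column i keeps only the in-range cell
print(x[keep, i])
```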

Answer

Let's create a simple reproducible example:

m = matrix(runif(100), ncol=10)
rownames(m) = LETTERS[1:10]
colnames(m) = rownames(m)
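Note that a matrix filled with runif() is not symmetric, so it only mimics the shape of a correlation matrix; the "duplicate entries" step below really matters for a genuine one. A sketch that builds a real 10 x 10 correlation matrix with cor() on simulated data (the name raw and the seed are illustrative assumptions):

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
# 50 simulated observations of 10 variables named A..J
raw <- matrix(rnorm(500), ncol = 10, dimnames = list(NULL, LETTERS[1:10]))
m <- cor(raw)

dim(m)                                  # 10 x 10, names A..J on both margins
isSymmetric(m)                          # m[i, j] equals m[j, i]
all.equal(unname(diag(m)), rep(1, 10))  # each variable correlates 1 with itself
```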

The tricky part is getting a nice return structure that contains the variable names, so I would collapse the matrix into a standard data frame with one row per cell:

dd = data.frame(cor = as.vector(m),
                id1 = rownames(m),
                id2 = rep(colnames(m), each = nrow(m)))
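Why the names line up: as.vector() reads a matrix column by column, so the row name cycles fastest (data.frame() recycles the 10 row names to length 100) and each column name repeats nrow(m) times. A quick sketch that checks every row of the long data frame against its cell in m (the seed and the check itself are additions for illustration):

```r
set.seed(1)  # arbitrary seed for repeatability
m <- matrix(runif(100), ncol = 10)
rownames(m) <- LETTERS[1:10]
colnames(m) <- rownames(m)

dd <- data.frame(cor = as.vector(m),
                 id1 = rownames(m),                      # recycled: A..J, A..J, ...
                 id2 = rep(colnames(m), each = nrow(m)), # A ten times, then B, ...
                 stringsAsFactors = FALSE)

# Look each (id1, id2) pair up in m and compare with the stored value
matches <- mapply(function(v, r, cl) v == m[r, cl], dd$cor, dd$id1, dd$id2)
all(matches)
```

Base R's as.data.frame(as.table(m)) builds the same long layout (columns Var1, Var2, Freq) if you would rather not assemble it by hand.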

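Remove duplicate entries (in a correlation matrix, cell [i, j] equals cell [j, i], so keep only the upper triangle, including the diagonal):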

dd = dd[as.vector(upper.tri(m, diag = TRUE)), ]
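How the mask works: upper.tri() returns a logical matrix of the same shape, TRUE on and above the diagonal when diag = TRUE. Flattened with as.vector(), it follows the same column-major order as the cor column, so it can index the rows of dd directly. A small sketch on a 4 x 4 matrix (n is an illustrative size):

```r
n <- 4
mask <- upper.tri(matrix(0, n, n), diag = TRUE)
print(mask)

# Cells kept: the diagonal plus everything above it, n * (n + 1) / 2 of them
sum(mask)                   # 10 for n = 4

# Nothing below the diagonal survives, so each symmetric pair is kept once
any(mask[lower.tri(mask)])  # FALSE
```

For a real correlation matrix the diagonal is all 1s, which the later < 0.99 filter drops anyway; upper.tri(m) without diag = TRUE would skip those cells up front.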

Then select the rows in the desired range as usual:

dd[dd$cor > 0.8 & dd$cor < 0.99,]
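The three steps can be wrapped into one function and run on a real correlation matrix. This is a sketch, not part of the original answer: the name pairs_in_range, its lo/hi arguments, the inclusive bounds, and the use of the built-in mtcars data are all illustrative choices.

```r
# Hypothetical helper combining the steps above: flatten, de-duplicate, filter
pairs_in_range <- function(cm, lo = 0.80, hi = 0.99) {
  # One row per cell; as.vector() is column-major, matching the id columns
  dd <- data.frame(cor = as.vector(cm),
                   id1 = rownames(cm),
                   id2 = rep(colnames(cm), each = nrow(cm)),
                   stringsAsFactors = FALSE)
  # Strict upper triangle: one copy of each symmetric pair, no self-correlations
  dd <- dd[as.vector(upper.tri(cm)), ]
  # Inclusive bounds on both ends
  dd[dd$cor >= lo & dd$cor <= hi, ]
}

correl <- cor(mtcars)  # 11 x 11 correlation matrix from a base R dataset
res <- pairs_in_range(correl)
print(res)
```

This filter only matches positive correlations; wrapping dd$cor in abs() in the last line of the function would let strong negative pairs through as well, if that is wanted.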