Jason - 2 months ago

R Question

I have a correlation matrix (called `correl`, `390 x 390`) and I want to extract the correlations between `0.80` and `0.99`:

```
cc1 <- NA # creates a NA vector to store values between 0.80 & 0.99
cc2 <- NA # creates a NA vector to store desired values
p <- dim(correl)[2] # dim returns the size of the correlation matrix
i <- 1
while (i <= p) {
  cc1 <- correl[, correl[, i] >= 0.80 & correl[, i] < 1.00]
  cc2 <- cbind(cc2, cc1)
  i <- i + 1
}
```

The problem I am having is that undesired correlations (those below 0.80) also end up in `cc2`:

```
#Sample of what I mean:
                 SPY.Adjusted AAPL.Adjusted CHL.Adjusted CVX.Adjusted
1 SPY.Adjusted      1.0000000    0.83491778    0.6382930    0.8568000
2 AAPL.Adjusted     0.8349178    1.00000000    0.1945304    0.1194307
3 CHL.Adjusted      0.6382930    0.19453044    1.0000000    0.2991739
4 CVX.Adjusted      0.8568000    0.11943067    0.2991739    1.0000000
5 GE.Adjusted       0.6789054    0.13729877    0.3356743    0.5219169
6 GOOGL.Adjusted    0.5567947    0.10986655    0.2552149    0.2128337
```

I only want to return the correlations within the desired range (0.80 to 0.99) without losing the `row.names` and `col.names`.

Answer

Let's create a simple reproducible example

```
m = matrix(runif(100), ncol=10)
rownames(m) = LETTERS[1:10]
colnames(m) = rownames(m)
```

The tricky part is getting a nice return structure that contains the variable names. So I would collapse the matrix into a standard data frame

```
dd = data.frame(cor = as.vector(m),                    # values, read column by column
                id1 = rownames(m),                     # row label (recycled)
                id2 = rep(rownames(m), each=nrow(m)))  # column label
```
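To see why the labels line up: `as.vector()` reads a matrix column by column, so the recycled `id1` cycles through the row names while `id2` holds each column name for a whole column. A minimal check of that ordering, using a hypothetical 2 x 2 matrix `small` that is not part of the question:

```r
# Hypothetical, deliberately non-symmetric matrix so the ordering is visible
small <- matrix(c(0.1, 0.2, 0.3, 0.4), ncol = 2,
                dimnames = list(c("A", "B"), c("A", "B")))

long <- data.frame(cor = as.vector(small),                          # column-major values
                   id1 = rownames(small),                           # recycled row labels
                   id2 = rep(colnames(small), each = nrow(small)),  # column labels
                   stringsAsFactors = FALSE)

# Look up each (id1, id2) pair in the matrix and compare with the stored value
check <- mapply(function(r, c) small[r, c], long$id1, long$id2)
all(check == long$cor)
```

If the last line were `FALSE`, the row and column labels would be attached to the wrong values.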

Remove duplicate entries (a correlation matrix is symmetric, so every pair appears twice; keep only the upper triangle)

```
dd = dd[as.vector(upper.tri(m, TRUE)),]
```
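The second argument of `upper.tri()` is `diag`, which controls whether the diagonal is kept. A quick illustration on a placeholder 3 x 3 matrix (the name `x` is just for this sketch):

```r
x <- matrix(0, nrow = 3, ncol = 3)

with_diag    <- upper.tri(x, diag = TRUE)   # upper triangle plus the diagonal
without_diag <- upper.tri(x, diag = FALSE)  # strict upper triangle (the default)

sum(with_diag)     # number of entries kept, diagonal included
sum(without_diag)  # number of entries kept, diagonal excluded
```

For a real correlation matrix the diagonal is exactly 1, so the range filter in the next step removes those self-correlations either way.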

Then select as usual

```
dd[dd$cor > 0.8 & dd$cor < 0.99,]
```
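For an actual correlation matrix such as `correl`, the same idea can be wrapped in a small function. This is only a sketch under some assumptions: the matrix is symmetric and has row and column names, the function name `cor_pairs_in_range` is made up for illustration, and the bounds follow the `>= 0.80` and `< 1.00` from the question's loop. It also takes a slightly different route, using `which(..., arr.ind = TRUE)` instead of the `as.vector()` collapse. The demo data are the first four rows of the sample in the question.

```r
# Hypothetical helper: name pairs whose correlation lies in [lower, upper)
cor_pairs_in_range <- function(cm, lower = 0.80, upper = 1.00) {
  # Strict upper triangle: each pair once, no self-correlations
  keep <- upper.tri(cm) & cm >= lower & cm < upper
  idx <- which(keep, arr.ind = TRUE)  # two-column matrix of row/col positions
  data.frame(id1 = rownames(cm)[idx[, "row"]],
             id2 = colnames(cm)[idx[, "col"]],
             cor = cm[idx],
             stringsAsFactors = FALSE)
}

# First four rows/columns of the sample shown in the question
tickers <- c("SPY.Adjusted", "AAPL.Adjusted", "CHL.Adjusted", "CVX.Adjusted")
correl_demo <- matrix(c(1.0000000, 0.8349178, 0.6382930, 0.8568000,
                        0.8349178, 1.0000000, 0.1945304, 0.1194307,
                        0.6382930, 0.1945304, 1.0000000, 0.2991739,
                        0.8568000, 0.1194307, 0.2991739, 1.0000000),
                      ncol = 4, byrow = TRUE,
                      dimnames = list(tickers, tickers))

res <- cor_pairs_in_range(correl_demo)
res
```

Each row of `res` names one pair and its correlation, so the labels are kept without carrying along the rest of the column.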