Misha V - 1 year ago 62

R Question

I have a matrix with a lot of missing values and I am trying to compute correlations between the columns.

To deal with the missing values, I use

`cor(matrix,use="complete")`

This gives a matrix with no NA values as desired. However, if I do a pairwise correlation between two of the columns A and B

`cor(matrix[,A],matrix[,B],use="complete")`

I get a different result than the one in the [A,B] entry in the matrix.

Looking at a plot of the two variables, the second result seems more reasonable.

Where could this discrepancy come from?


Answer

You are asking about the difference between `"complete.obs"` and `"pairwise.complete.obs"`.

```
## example matrix
set.seed(0); X <- matrix(rnorm(10 * 3), ncol = 3)
X[1:2, 1] <- NA
X[3:4, 2] <- NA
X[5:6, 3] <- NA
#              [,1]       [,2]        [,3]
# [1,]           NA  0.7635935 -0.22426789
# [2,]           NA -0.7990092  0.37739565
# [3,]  1.329799263         NA  0.13333636
# [4,]  1.272429321         NA  0.80418951
# [5,]  0.414641434 -0.2992151          NA
# [6,] -1.539950042 -0.4115108          NA
# [7,] -0.928567035  0.2522234  1.08576936
# [8,] -0.294720447 -0.8919211 -0.69095384
# [9,] -0.005767173  0.4356833 -1.28459935
#[10,]  2.404653389 -1.2375384  0.04672617

## complete
cor(X, use = "complete.obs")
#            [,1]        [,2]        [,3]
#[1,]  1.00000000 -0.69629279 -0.09773585
#[2,] -0.69629279  1.00000000 -0.01228196
#[3,] -0.09773585 -0.01228196  1.00000000

## pairwise
cor(X, use = "pairwise.complete.obs")
#            [,1]       [,2]        [,3]
#[1,]  1.00000000 -0.5531396  0.08229729
#[2,] -0.55313958  1.0000000 -0.10786401
#[3,]  0.08229729 -0.1078640  1.00000000
```

For `use = "complete.obs"`, any row with at least one `NA` is dropped before the correlations are computed. So it essentially does

```
X1 <- X[7:10, ]  ## only the last 4 rows have no `NA`
cor(X1)
#            [,1]        [,2]        [,3]
#[1,]  1.00000000 -0.69629279 -0.09773585
#[2,] -0.69629279  1.00000000 -0.01228196
#[3,] -0.09773585 -0.01228196  1.00000000
```
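If you would rather not pick the complete rows by eye, a minimal sketch using `complete.cases` from base R finds them for you. The name `keep` is just an illustrative variable, and the matrix is rebuilt so the snippet runs on its own:

```
## rebuild the example matrix
set.seed(0); X <- matrix(rnorm(10 * 3), ncol = 3)
X[1:2, 1] <- NA
X[3:4, 2] <- NA
X[5:6, 3] <- NA

## TRUE for rows that contain no NA in any column
keep <- complete.cases(X)
which(keep)
# [1]  7  8  9 10

## correlating only those rows reproduces the "complete.obs" result
all.equal(cor(X[keep, ]), cor(X, use = "complete.obs"))
# [1] TRUE
```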

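In the `"complete.obs"` result, the `(1,2)` (equivalently `(2,1)`) entry `-0.69629279` is computed from only 4 observations, because rows that are missing a value in column 3 are thrown away too. Your two-column call behaves differently: with only A and B passed in, `"complete.obs"` drops just the rows where A or B is missing, which is exactly what pairwise deletion does for that pair. For columns 1 and 2 that leaves 6 observations (rows 5 to 10):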

```
cor(X[5:10, 1], X[5:10, 2])
# [1] -0.5531396
```
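To tie this back to the question, here is a short sketch that counts how many rows each pair of columns shares and checks that the two-column call agrees with the pairwise matrix. The helper names `observed`, `n_pair`, `pairwise` and `two_col` are made up for illustration:

```
## rebuild the example matrix
set.seed(0); X <- matrix(rnorm(10 * 3), ncol = 3)
X[1:2, 1] <- NA
X[3:4, 2] <- NA
X[5:6, 3] <- NA

## 1 where a value is observed, 0 where it is NA
observed <- ifelse(is.na(X), 0, 1)

## entry (i, j) counts the rows where columns i and j are both observed
n_pair <- crossprod(observed)
n_pair[1, 2]
# [1] 6

## a two-column call with "complete.obs" matches the pairwise entry
pairwise <- cor(X, use = "pairwise.complete.obs")
two_col  <- cor(X[, 1], X[, 2], use = "complete.obs")
all.equal(two_col, pairwise[1, 2])
# [1] TRUE
```

So if you want the matrix entries to agree with your column-by-column results, `use = "pairwise.complete.obs"` is the option to pass to the matrix call. Keep in mind that each entry is then based on a different subset of rows.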
