user2117258 - 4 months ago 42

R Question

I have x, y, and z coordinates from a Principal Component Analysis, and I would like to compute a Euclidean distance matrix from them.

Test data:

| | X | Y | Z |
|---|---|---|---|
| samp_A | -0.003467119 | -0.01422762 | -0.0101960126 |
| samp_B | -0.007279433 | 0.01651597 | 0.0045558849 |
| samp_C | -0.005392258 | 0.02149997 | 0.0177409387 |
| samp_D | -0.017898802 | 0.02790659 | 0.0006487222 |
| samp_E | -0.013564214 | 0.01835688 | 0.0008102952 |
| samp_F | -0.013375397 | 0.02210725 | -0.0286032185 |

I would ultimately like to return a table in the following format:

| | A | B | ... |
|---|---|---|---|
| A | 0 | 0.2 | ... |
| B | 0.2 | 0 | ... |
| ... | ... | ... | ... |
| ... | ... | ... | ... |

Obviously the distance data above is fake. The X, Y and Z data is simply the head of the full dataset, which consists of about 4000 entries, so I assume this would need to be done in an efficient manner. If it's easier, computing only the nearest distances for, say, 10 points could suffice (the remaining points would be NA or 0).

Any help would be much appreciated!

EDIT: A suggestion arose to use `dist`:

`> pca_coords_dist <- dist(pca_coords)`

`> head(pca_coords_dist)`

`[1] 0.03431210 0.04539427 0.04583855 0.03584466 0.04191922 0.04291657`

I believe one way to go about this is to create a function that computes the distance and apply it to each row in a pairwise manner:

`euc.dist.3 <- function(x1, x2, y1, y2, z1, z2 ) sqrt( (x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2 )`

If I apply this to samp_A and samp_B, the result is 1.56643.

Now, is there a way to apply this function to every pair of rows and format the output as a distance matrix?

Answer

Try `?dist` in R:

```
distance.matrix <- dist(yourData, method = "euclidean", diag = TRUE)
```

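In the code above, *yourData* is your **data.frame** or **matrix**, with one row per sample and only numeric columns. The result is a `dist` object; `as.matrix(distance.matrix)` converts it into a full square table.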
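As a quick check, the `dist()` approach can be tried on the six test rows from the question. This is only a sketch: the object names `pca_coords`, `dist_mat` and `manual_ab` are made up for illustration, and the data frame is rebuilt by hand from the posted values. It also cross-checks one entry against the asker's `euc.dist.3` function.

```r
# Test data from the question (first six rows of the PCA coordinates)
pca_coords <- data.frame(
  X = c(-0.003467119, -0.007279433, -0.005392258,
        -0.017898802, -0.013564214, -0.013375397),
  Y = c(-0.01422762, 0.01651597, 0.02149997,
        0.02790659, 0.01835688, 0.02210725),
  Z = c(-0.0101960126, 0.0045558849, 0.0177409387,
        0.0006487222, 0.0008102952, -0.0286032185),
  row.names = paste0("samp_", LETTERS[1:6])
)

# dist() returns a compact "dist" object holding each pair only once
d <- dist(pca_coords, method = "euclidean")

# as.matrix() expands it into a full symmetric matrix labelled by sample name
dist_mat <- as.matrix(d)
print(round(dist_mat, 4))

# Cross-check one entry against the hand-written function from the question
euc.dist.3 <- function(x1, x2, y1, y2, z1, z2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)
}
a <- pca_coords["samp_A", ]
b <- pca_coords["samp_B", ]
manual_ab <- euc.dist.3(a$X, b$X, a$Y, b$Y, a$Z, b$Z)

# Both routes should give roughly 0.0343 for samp_A vs samp_B
print(manual_ab)
print(all.equal(manual_ab, dist_mat["samp_A", "samp_B"]))
```

For samp_A and samp_B both routes give about 0.0343, which agrees with the first value in the `head(pca_coords_dist)` output quoted in the question. As for size, about 4000 rows means roughly 8 million pairwise distances (around 64 MB as doubles in the `dist` object, about 128 MB once expanded with `as.matrix()`), so the full matrix should be workable on most machines.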