user2117258 - 2 months ago
R Question

Compute a Euclidean distance matrix from x, y, z coordinates

I have x, y, and z coordinates from a principal component analysis (PCA), and I would like to compute a Euclidean distance matrix from them.

Test data:

X Y Z
samp_A -0.003467119 -0.01422762 -0.0101960126
samp_B -0.007279433 0.01651597 0.0045558849
samp_C -0.005392258 0.02149997 0.0177409387
samp_D -0.017898802 0.02790659 0.0006487222
samp_E -0.013564214 0.01835688 0.0008102952
samp_F -0.013375397 0.02210725 -0.0286032185


I would ultimately like to return a table in the following format:

A B ...
A 0 0.2 ...
B 0.2 0 ...
... ... ... ...
... ... ... ...


Obviously, the distance values above are made up. The X, Y, and Z data are just the head of the full dataset, which has about 4,000 entries, so I assume this needs to be done efficiently. If it's easier, computing only the distances to each point's nearest neighbours (say, 10 of them) would suffice, with the remaining entries set to NA or 0.
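If only the nearest neighbours are needed, one approach is to compute the full matrix with dist and then keep just the k smallest off-diagonal distances in each row, setting the rest to NA. This is a sketch that assumes "nearest" means nearest in Euclidean distance; knn_only is a hypothetical helper, and k = 2 is used only because the sample has six rows (the question mentions 10). For about 4,000 points the full 4,000 x 4,000 matrix holds 16 million doubles (roughly 128 MB), which base R can usually keep in memory.

```r
# Sample coordinates from the question (first six rows only)
pca_coords <- matrix(
  c(-0.003467119, -0.01422762, -0.0101960126,
    -0.007279433,  0.01651597,  0.0045558849,
    -0.005392258,  0.02149997,  0.0177409387,
    -0.017898802,  0.02790659,  0.0006487222,
    -0.013564214,  0.01835688,  0.0008102952,
    -0.013375397,  0.02210725, -0.0286032185),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("samp_", LETTERS[1:6]), c("X", "Y", "Z"))
)

# Hypothetical helper: keep only the k smallest distances in each row, NA elsewhere
knn_only <- function(coords, k) {
  d <- as.matrix(dist(coords))   # full symmetric n x n Euclidean distance matrix
  diag(d) <- NA                  # a point is not its own neighbour
  out <- matrix(NA_real_, nrow(d), ncol(d), dimnames = dimnames(d))
  for (i in seq_len(nrow(d))) {
    nn <- order(d[i, ])[seq_len(k)]  # order() puts NA last, so the diagonal is skipped
    out[i, nn] <- d[i, nn]
  }
  out
}

nearest <- knn_only(pca_coords, k = 2)
nearest
```

The loop runs once per row, so for 4,000 rows it does 4,000 calls to order(); the expensive part is still the single dist() call.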

Any help would be much appreciated!

EDIT: Someone suggested using dist, but I do not believe it allows for three coordinates. If I use dist, the results seem to be nonsense(?):

> pca_coords_dist <- dist(pca_coords)
> head(pca_coords_dist)
[1] 0.03431210 0.04539427 0.04583855 0.03584466 0.04191922 0.04291657
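The values dist returns can be checked directly. dist works on any number of columns, and a dist object stores the lower triangle of the distance matrix column by column, so the six printed values are d(A,B), d(A,C), d(A,D), d(A,E), d(A,F), and d(B,C). The sketch below rebuilds the test data as pca_coords (the name used in the question), labels each value with its pair, and compares the first one with the three-dimensional formula; d_vec and manual_AB are illustrative names.

```r
pca_coords <- matrix(
  c(-0.003467119, -0.01422762, -0.0101960126,
    -0.007279433,  0.01651597,  0.0045558849,
    -0.005392258,  0.02149997,  0.0177409387,
    -0.017898802,  0.02790659,  0.0006487222,
    -0.013564214,  0.01835688,  0.0008102952,
    -0.013375397,  0.02210725, -0.0286032185),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("samp_", LETTERS[1:6]), c("X", "Y", "Z"))
)

d <- dist(pca_coords)                     # Euclidean by default, uses all three columns
pairs <- combn(rownames(pca_coords), 2)   # (A,B), (A,C), ... in the same order dist stores them
d_vec <- setNames(as.vector(d), paste(pairs[1, ], pairs[2, ], sep = "-"))
head(d_vec)

# Explicit 3-D Euclidean distance between samp_A and samp_B
manual_AB <- sqrt(sum((pca_coords["samp_A", ] - pca_coords["samp_B", ])^2))
manual_AB   # about 0.0343121, the same as the first value dist printed
```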


I believe one way to go about this is to write a function that computes the distance and apply it to every pair of rows. I think this is a correct function for the distance in three dimensions:

# Euclidean distance between the points (x1, y1, z1) and (x2, y2, z2)
euc.dist.3 <- function(x1, x2, y1, y2, z1, z2) sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)


If I apply this to samp_A and samp_B, the result is 1.56643.
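With the samp_A and samp_B rows from the test data above, euc.dist.3 gives about 0.0343, the same as the first value dist printed; if a call returns something else, it is worth checking which values were passed in, since the argument order interleaves the two points (x1, x2, y1, y2, z1, z2). A quick check, where A, B, and d_AB are illustrative names:

```r
# Same function as in the question
euc.dist.3 <- function(x1, x2, y1, y2, z1, z2) sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)

# samp_A and samp_B coordinates copied from the test data
A <- c(X = -0.003467119, Y = -0.01422762, Z = -0.0101960126)
B <- c(X = -0.007279433, Y =  0.01651597, Z =  0.0045558849)

# Note the argument order: x of both points, then y of both, then z of both
d_AB <- euc.dist.3(A["X"], B["X"], A["Y"], B["Y"], A["Z"], B["Z"])
unname(d_AB)   # about 0.0343121
```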

Is there a way to apply this function to every pair of rows and format the output as a distance matrix?
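One way to apply euc.dist.3 to every pair of rows is outer() over the row indices: because the function is built from vectorised arithmetic, outer can call it once on all index pairs and reshape the result into a square matrix. This is an illustrative sketch (pca_coords is rebuilt from the test data, idx and D are made-up names). For 4,000 rows it creates several temporary vectors of 16 million elements each, so dist() is the more memory-friendly choice in practice.

```r
euc.dist.3 <- function(x1, x2, y1, y2, z1, z2) sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)

pca_coords <- matrix(
  c(-0.003467119, -0.01422762, -0.0101960126,
    -0.007279433,  0.01651597,  0.0045558849,
    -0.005392258,  0.02149997,  0.0177409387,
    -0.017898802,  0.02790659,  0.0006487222,
    -0.013564214,  0.01835688,  0.0008102952,
    -0.013375397,  0.02210725, -0.0286032185),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("samp_", LETTERS[1:6]), c("X", "Y", "Z"))
)

idx <- seq_len(nrow(pca_coords))
# outer() expands idx into all (i, j) combinations and calls the function once on those vectors
D <- outer(idx, idx, function(i, j)
  euc.dist.3(pca_coords[i, "X"], pca_coords[j, "X"],
             pca_coords[i, "Y"], pca_coords[j, "Y"],
             pca_coords[i, "Z"], pca_coords[j, "Z"]))
dimnames(D) <- list(rownames(pca_coords), rownames(pca_coords))

round(D, 4)   # symmetric, with zeros on the diagonal
```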

Answer

Try dist in R (type ?dist for its documentation):

distance.matrix <- dist(yourData, method = "euclidean", diag = TRUE)  # diag = TRUE only affects how the result is printed

In the code above, yourData is your data frame or matrix of coordinates, with one row per sample and one column per coordinate.
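To get the square, labelled table from the question rather than the compact dist object, the result can be converted with as.matrix(). A sketch on the six test rows, assuming the row names should be shortened from samp_A to A to match the requested layout (full and short are illustrative names). For about 4,000 rows, the dist object holds n(n - 1)/2, about 8 million distances (roughly 64 MB), and the full matrix twice that.

```r
pca_coords <- matrix(
  c(-0.003467119, -0.01422762, -0.0101960126,
    -0.007279433,  0.01651597,  0.0045558849,
    -0.005392258,  0.02149997,  0.0177409387,
    -0.017898802,  0.02790659,  0.0006487222,
    -0.013564214,  0.01835688,  0.0008102952,
    -0.013375397,  0.02210725, -0.0286032185),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("samp_", LETTERS[1:6]), c("X", "Y", "Z"))
)

distance.matrix <- dist(pca_coords, method = "euclidean")  # stores the lower triangle only
full <- as.matrix(distance.matrix)                          # square and symmetric, named by sample

# Shorten "samp_A" to "A" so the layout matches the requested table
short <- sub("^samp_", "", rownames(full))
dimnames(full) <- list(short, short)

round(full, 4)
```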