I am working on a homework assignment and am not sure I understand the question. We are using the built-in iris dataset:
data <- iris[1:4]
scaled <- scale(data)
#              Sepal.Length Sepal.Width Petal.Length Petal.Width
# Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
# Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
# Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
# Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
This prints out a massive output that I am not entirely sure what to do with.
I am just hoping someone can help explain the question, and point me in the correct direction.
dist computes distances between rows, so to get distances between columns you need to transpose the matrix first. Consider your scaled dataset:
x <- scale(data.matrix(iris[1:4]))
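To make the row-versus-column point concrete, here is a tiny sketch on a made-up 2 x 2 matrix (the matrix `m` and its values are purely illustrative, not part of the assignment):

```r
# Toy matrix with two rows, a = (0, 0) and b = (3, 4)
m <- rbind(a = c(0, 0), b = c(3, 4))

# dist() works on rows: distance between (0, 0) and (3, 4)
row_d <- dist(m)
c(row_d)  # 5

# Transposing first gives the distance between the columns (0, 3) and (0, 4)
col_d <- dist(t(m))
c(col_d)  # 1
```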
The squared Euclidean distance matrix between columns is
## I have used `c()` outside to coerce it into a plain vector
d <- c(dist(t(x)) ^ 2)
# 333.03580 38.21737 54.25354 425.67515 407.10553 11.06610
The lower triangle of the correlation matrix is (we want the lower triangle because the distance matrix stores only its lower triangular part):
cx <- cor(x)[lower.tri(diag(4))]
# -0.1175698 0.8717538 0.8179411 -0.4284401 -0.3661259 0.9628654
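If you want to convince yourself that the two vectors line up element by element, a quick check along these lines should work. It expands the dist object into a full matrix and pulls out its lower triangle in the same way as the correlation matrix (the names `dm` and `same_order` are just illustrative):

```r
x <- scale(data.matrix(iris[1:4]))

# Distances between columns, kept as a "dist" object
dm <- dist(t(x))

# Expand to a full 4 x 4 matrix and extract its lower triangle,
# column by column, exactly as done for the correlation matrix
from_matrix <- as.matrix(dm)[lower.tri(diag(4))]

# The plain vector from c() should be in the same order
same_order <- isTRUE(all.equal(c(dm), unname(from_matrix)))
same_order
```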
We then just do what your question asks to compare:
d / (1 - cx)
# 298 298 298 298 298 298
The iris dataset has 150 rows, so you should recognize that
298 = 2 * (150 - 1).
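The constant does not appear to be specific to iris. As a rough sketch, the same steps can be wrapped in a small helper and tried on another built-in dataset; `check_ratio` is a hypothetical name I made up here, and it assumes all columns are numeric (or coercible by data.matrix) and that no two columns are perfectly correlated:

```r
# Hypothetical helper: repeats the steps above for any numeric data frame
# and tests whether d / (1 - cx) equals 2 * (n - 1) for every column pair
check_ratio <- function(df) {
  x <- scale(data.matrix(df))          # centre and scale each column
  n <- nrow(x)
  d <- c(dist(t(x)) ^ 2)               # squared distances between columns
  r <- cor(x)
  cx <- r[lower.tri(r)]                # matching lower-triangle correlations
  isTRUE(all.equal(d / (1 - cx), rep(2 * (n - 1), length(d))))
}

iris_ok <- check_ratio(iris[1:4])      # n = 150, so the constant is 298
mtcars_ok <- check_ratio(mtcars)       # n = 32, so the constant is 62
c(iris_ok, mtcars_ok)
```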
I had no intention to post theoretical justification here. But the down vote irritates me and I am going to do it now.