Jeremy - 1 month ago
Python Question

Creating a Distance Matrix?

I am currently reading data into a dataframe that looks like this:

City      XCord  YCord
Boston    5      2
Phoenix   7      3
New York  8      1
.....     .      .


I want to create a Euclidean distance matrix from this data, showing the distance between every pair of cities, so that I get a result like:

          Boston  Phoenix  New York
Boston    0       2.236    3.162
Phoenix   2.236   0        2.236
New York  3.162   2.236    0


My actual dataframe has many more cities and coordinates, so I need some way to iterate over all of the city pairs and build a distance matrix like the one shown above. I am not sure how to pair up the cities and apply the Euclidean distance formula. Any help would be appreciated.

Answer Source

I think you are interested in scipy.spatial.distance_matrix.

For example:

Create data:

import pandas as pd
from scipy.spatial import distance_matrix

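# x/y coordinates for each city, in the same order as the city names
data = [[5, 7], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
# Index the rows by city so the labels can be reused for the matrix
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)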

Output:

          xcord  ycord
Boston        5      7
Phoenix       7      3
New York      8      1

Using the distance matrix function:

# Pairwise Euclidean distances between all rows, labelled by city on both axes
pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index)

Results:

            Boston   Phoenix  New York
Boston    0.000000  4.472136  6.708204
Phoenix   4.472136  0.000000  2.236068
New York  6.708204  2.236068  0.000000
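
Note that the sample data above uses (5, 7) for Boston, whereas the question lists (5, 2), which is why these numbers differ from the matrix the question expects. As a rough sketch of what the pairwise computation involves, the same kind of matrix can also be built with plain NumPy broadcasting. The column names XCord and YCord are taken from the question's table; the variable names coords, diff, dist and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Coordinates copied from the question's sample table
df = pd.DataFrame(
    {"XCord": [5, 7, 8], "YCord": [2, 3, 1]},
    index=["Boston", "Phoenix", "New York"],
)

coords = df[["XCord", "YCord"]].to_numpy(dtype=float)

# Broadcasting pairs every row with every other row: shape (n, n, 2)
diff = coords[:, None, :] - coords[None, :, :]

# Euclidean distance: square root of the summed squared differences
dist = np.sqrt((diff ** 2).sum(axis=-1))

result = pd.DataFrame(dist, index=df.index, columns=df.index)
print(result.round(3))
```

With the question's coordinates this should give 2.236 for Boston to Phoenix and 3.162 for Boston to New York, matching the matrix asked for. Broadcasting builds an intermediate n x n x 2 array, so for a large number of cities scipy.spatial.distance.pdist combined with squareform may be worth considering instead, since pdist computes each pair only once.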