dustin dustin - 7 months ago 33
Python Question

python: recursively find the distance between points in a group

I can apply

vincenty
in
geopy
to my
dataframe
in
pandas
and determine the distance between the two consecutive machines. However, I want to find the distance between all the machines in the group without repeating.

For example, if I group by company name and there are 3 machines associated with this company, I would want to find the distance between machine 1 and 2, 1 and 3, and (2 and 3) but not calculate the distance between (2 and 1) and (3 and 1) since they are symmetric (identical results).

import pandas as pd
from geopy.distance import vincenty

df = pd.DataFrame({'ser_no': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
'co_nm': ['aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'bb', 'cc', 'cc', 'cc'],
'lat': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'lon': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]})

coord_col = ['lat', 'lon']
matching_cust = df['co_nm'] == df['co_nm'].shift(1)
shift_coords = df.shift(1).loc[matching_cust, coord_col]
# join in shifted coords and compute distance
df_shift = df.join(shift_coords, how = 'inner', rsuffix = '_2')
# return distance in miles
df['dist'] = df_shift.apply(lambda x: vincenty((x[1], x[2]),
(x[4], x[5])).mi, axis = 1)


This only finds the distance of consecutive machines in the group how can I expand on this to find the distance of all machines in the group?

This code returns:

co_nm lat lon ser_no dist
0 aa 1 21 1 NaN
1 aa 2 22 2 97.47832
2 aa 3 23 3 97.44923
3 bb 4 24 4 NaN
4 bb 5 25 5 97.34752
5 bb 6 26 6 97.27497
6 bb 7 27 7 97.18804
7 cc 8 28 8 NaN
8 cc 9 29 9 96.97129
9 cc 10 30 0 96.84163






Edit:


The desired output would find the unique distance combinations for machines related by company; that is, for
co_nm aa
we would have the distance between ser_no (1,2), (1,3), (2,3), (1,3) and the distance for the machines in
co_nm bb
and
cc
as well, but we wouldn't determine the distance of machines in different
co_nm
groups.

Does this make sense?

Answer

UPDATE2: using function:

def calc_dist(df):
    return pd.DataFrame(
               [ [grp,
                  df.loc[c[0]].ser_no,
                  df.loc[c[1]].ser_no,
                  vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
                 ]
                 for grp,lst in df.groupby('co_nm').groups.items()
                 for c in combinations(lst, 2)
               ],
               columns=['co_nm','machineA','machineB','distance'])

In [27]: calc_dist(df)
Out[27]:
   co_nm  machineA  machineB               distance
0     aa         1         2  156.87614939082016 km
1     aa         1         3   313.7054454472326 km
2     aa         2         3    156.829329105069 km
3     cc         8         9  156.06016539095216 km
4     cc         8         0   311.9109981692541 km
5     cc         9         0  155.85149813446617 km
6     bb         4         5  156.66564183673603 km
7     bb         4         6   313.2143330250297 km
8     bb         4         7   469.6225353388079 km
9     bb         5         6  156.54889741438788 km
10    bb         5         7  312.95759746593706 km
11    bb         6         7   156.4089967703544 km

UPDATE:

In [9]: dist = pd.DataFrame(
   ...:   [ [grp,
   ...:      df.loc[c[0]].ser_no,
   ...:      df.loc[c[1]].ser_no,
   ...:      vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
   ...:     ]
   ...:     for grp,lst in df.groupby('co_nm').groups.items()
   ...:     for c in combinations(lst, 2)
   ...:   ],
   ...:   columns=['co_nm','machineA','machineB','distance'])

In [10]: dist
Out[10]:
   co_nm  machineA  machineB               distance
0     aa         1         2  156.87614939082016 km
1     aa         1         3   313.7054454472326 km
2     aa         2         3    156.829329105069 km
3     cc         8         9  156.06016539095216 km
4     cc         8         0   311.9109981692541 km
5     cc         9         0  155.85149813446617 km
6     bb         4         5  156.66564183673603 km
7     bb         4         6   313.2143330250297 km
8     bb         4         7   469.6225353388079 km
9     bb         5         6  156.54889741438788 km
10    bb         5         7  312.95759746593706 km
11    bb         6         7   156.4089967703544 km

Explanation: combination part

In [11]: [c
   ....:  for grp,lst in df.groupby('co_nm').groups.items()
   ....:  for c in combinations(lst, 2)]
Out[11]:
[(0, 1),
 (0, 2),
 (1, 2),
 (7, 8),
 (7, 9),
 (8, 9),
 (3, 4),
 (3, 5),
 (3, 6),
 (4, 5),
 (4, 6),
 (5, 6)]

OLD answer:

In [3]: from itertools import combinations

In [4]: import pandas as pd

In [5]: from geopy.distance import vincenty

In [6]: df = pd.DataFrame({'machine': [1,2,3], 'lat': [11, 12, 13], 'lon': [21,22,23]})

In [7]: df
Out[7]:
   lat  lon  machine
0   11   21        1
1   12   22        2
2   13   23        3

In [8]: dist = pd.DataFrame(
   ...:   [ [df.loc[c[0]].machine,
   ...:      df.loc[c[1]].machine,
   ...:      vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
   ...:     ]
   ...:     for c in combinations(df.index, 2)
   ...:   ],
   ...:   columns=['machineA','machineB','distance'])

In [9]: dist
Out[9]:
   machineA  machineB               distance
0         1         2   155.3664523771998 km
1         1         3   310.4557192973811 km
2         2         3  155.09044419651156 km
Comments