W R - 1 year ago 87

Python Question

I have the following in a Pandas DataFrame in Python 2.7:

```
Ser_Numb        LAT       LONG
       1  74.166061  30.512811
       2  72.249672  33.427724
       3  67.499828  37.937264
       4  84.253715  69.328767
       5  72.104828  33.823462
       6  63.989462  51.918173
       7  80.209112  33.530778
       8  68.954132  35.981256
       9  83.378214  40.619652
      10  68.778571   6.607066
```

I am looking to calculate the distance between successive rows in the dataframe. The output should look something like this:

```
Ser_Numb        LAT       LONG  Distance
       1  74.166061  30.512811  0
       2  72.249672  33.427724  d_between_Ser_Numb2 and Ser_Numb1
       3  67.499828  37.937264  d_between_Ser_Numb3 and Ser_Numb2
       4  84.253715  69.328767  d_between_Ser_Numb4 and Ser_Numb3
       5  72.104828  33.823462  d_between_Ser_Numb5 and Ser_Numb4
       6  63.989462  51.918173  d_between_Ser_Numb6 and Ser_Numb5
       7  80.209112  33.530778  .
       8  68.954132  35.981256  .
       9  83.378214  40.619652  .
      10  68.778571   6.607066  .
```

This post looks somewhat similar but it is calculating the distance between fixed points. I need the distance between successive points.

I tried to adapt this as follows:

```
df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LONG'])
df['dLON'] = df['LON_rad'] - np.radians(df['LON_rad'].shift(1))
df['dLAT'] = df['LAT_rad'] - np.radians(df['LAT_rad'].shift(1))
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
```

However, I get the following error:

```
Traceback (most recent call last):
  File "C:\Python27\test.py", line 115, in <module>
    df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 78, in wrapper
    "{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
[Finished in 2.3s with exit code 1]
```

MaxU's comment fixed this error. With the fix, the output of the calculation still does not make sense: the distances come out at nearly 8000 km:

```
   Ser_Numb        LAT       LONG   LAT_rad   LON_rad      dLON      dLAT     distance
0         1  74.166061  30.512811  1.294442  0.532549       NaN       NaN          NaN
1         2  72.249672  33.427724  1.260995  0.583424  0.574129  1.238402  8010.487211
2         3  67.499828  37.937264  1.178094  0.662130  0.651947  1.156086  7415.364469
3         4  84.253715  69.328767  1.470505  1.210015  1.198459  1.449943  9357.184623
4         5  72.104828  33.823462  1.258467  0.590331  0.569212  1.232802  7992.087820
5         6  63.989462  51.918173  1.116827  0.906143  0.895840  1.094862  7169.812123
6         7  80.209112  33.530778  1.399913  0.585222  0.569407  1.380421  8851.558260
7         8  68.954132  35.981256  1.203477  0.627991  0.617777  1.179044  7559.609520
8         9  83.378214  40.619652  1.455224  0.708947  0.697986  1.434220  9194.371978
9        10  68.778571   6.607066  1.200413  0.115315  0.102942  1.175014          NaN
```

According to:

- this online calculator: if I use Latitude1 = 74.166061, Longitude1 = 30.512811, Latitude2 = 72.249672, Longitude2 = 33.427724, then I get 233 km
- the haversine function found here: with `print haversine(30.512811, 74.166061, 33.427724, 72.249672)` I get 232.55 km

The answer should be 233 km, but my approach is giving ~8000 km. I think there is something wrong with how I am trying to iterate between successive rows.

Is there a way to do this in Pandas? Or do I need to loop through the dataframe one row at a time?

To create the above DF, select it and copy to clipboard. Then:

```
import pandas as pd
df = pd.read_clipboard()
print df
```
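If no clipboard is available (for example on a headless machine), the same frame can be built from an inline string instead. This is only a minimal sketch, assuming Python 3 and a recent pandas; the variable name `raw` is illustrative and not part of the original code:

```python
import io

import pandas as pd

# The sample data from above, pasted as whitespace-separated text
raw = """Ser_Numb LAT LONG
1 74.166061 30.512811
2 72.249672 33.427724
3 67.499828 37.937264
4 84.253715 69.328767
5 72.104828 33.823462
6 63.989462 51.918173
7 80.209112 33.530778
8 68.954132 35.981256
9 83.378214 40.619652
10 68.778571 6.607066
"""

# Split columns on any run of whitespace
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```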


Answer Source

You can use this great solution by @ballsatballsdotballs (don't forget to upvote it ;-):

```
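import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length.
    """
    # Convert degrees to radians once, up front
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km is the Earth radius used here
    return km

# shift() pairs each row with the previous one; the first row has no
# predecessor, so its distance is NaN.
# .loc replaces the original .ix, which was removed in pandas 1.0.
df['dist'] = \
    haversine_np(df.LONG.shift(), df.LAT.shift(),
                 df.loc[1:, 'LONG'], df.loc[1:, 'LAT'])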
```

Result:

```
In [566]: df
Out[566]:
Ser_Numb LAT LONG dist
0 1 74.166061 30.512811 NaN
1 2 72.249672 33.427724 232.549785
2 3 67.499828 37.937264 554.905446
3 4 84.253715 69.328767 1981.896491
4 5 72.104828 33.823462 1513.397997
5 6 63.989462 51.918173 1164.481327
6 7 80.209112 33.530778 1887.256899
7 8 68.954132 35.981256 1252.531365
8 9 83.378214 40.619652 1606.340727
9 10 68.778571 6.607066 1793.921854
```
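The question asks for `0` rather than `NaN` in the first row. One way to get that is to fill the missing value after the calculation. The sketch below assumes Python 3, restates a compact form of the haversine function so that it runs on its own, and uses only the first three rows of the sample data:

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 6367 * 2 * np.arcsin(np.sqrt(a))

# First three rows of the sample data
df = pd.DataFrame({
    "Ser_Numb": [1, 2, 3],
    "LAT": [74.166061, 72.249672, 67.499828],
    "LONG": [30.512811, 33.427724, 37.937264],
})

# Previous row vs. current row; the first row has no predecessor,
# so its NaN is replaced with 0
df["dist"] = haversine_np(
    df["LONG"].shift(), df["LAT"].shift(), df["LONG"], df["LAT"]
).fillna(0)
print(df)
```

Note that `fillna(0)` would also turn any other missing distance into 0 (for instance, if a coordinate is missing in the data), so it may be safer to set only the first row explicitly when the data can contain gaps.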

**UPDATE:** this will help to understand the logic:

```
In [573]: pd.concat([df['LAT'].shift(), df.loc[1:, 'LAT']], axis=1, ignore_index=True)
Out[573]:
0 1
0 NaN NaN
1 74.166061 72.249672
2 72.249672 67.499828
3 67.499828 84.253715
4 84.253715 72.104828
5 72.104828 63.989462
6 63.989462 80.209112
7 80.209112 68.954132
8 68.954132 83.378214
9 83.378214 68.778571
```
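For comparison, the attempt from the question can also be repaired in place. The ~8000 km values appear to come from `np.radians` being applied a second time to the already-converted `LAT_rad` and `LON_rad` columns when `dLAT` and `dLON` are computed. The attempt also uses `shift(-1)` (the next row) in the cosine term, and `math.cos`, which does not accept a Series. The following is a minimal sketch under those assumptions, written for Python 3 and using only the first three rows; the intermediate names `lat`, `lon`, `dlat` and `dlon` are illustrative:

```python
import numpy as np
import pandas as pd

# First three rows of the sample data
df = pd.DataFrame({
    "Ser_Numb": [1, 2, 3],
    "LAT": [74.166061, 72.249672, 67.499828],
    "LONG": [30.512811, 33.427724, 37.937264],
})

# Convert to radians exactly once
lat = np.radians(df["LAT"])
lon = np.radians(df["LONG"])

# Differences to the previous row; the shifted values are already in radians
dlat = lat - lat.shift(1)
dlon = lon - lon.shift(1)

# np.cos works element-wise on a Series; shift(1) selects the previous row
a = (np.sin(dlat / 2) ** 2
     + np.cos(lat.shift(1)) * np.cos(lat) * np.sin(dlon / 2) ** 2)
df["distance"] = 6367 * 2 * np.arcsin(np.sqrt(a))
print(df)
```

With the same 6367 km radius, this should reproduce the `dist` values shown above, about 232.55 km for the second row.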
