Georges Hb - 1 month ago 5x
Python Question

# Compute the difference between all possible pairs of rows

Based on a selection `ds` of a dataframe `d` with:

`{'x': d.x, 'y': d.y, 'a': d.a, 'b': d.b, 'c': d.c, 'n': d.n}`

Having `n` rows, `x` ranges from `0` to `n-1`. The column `n` is needed since it's a selection and indices need to be kept for a later query.

How do you efficiently compute the difference between each pair of rows (e.g. `a_0, a_1, etc`) for each column (`a, b, c`) without losing the row information (e.g. a new column with the indices of the rows that were used)?

MWE

Sample selection `ds`:

``````
         x           y      a     b      c     n
554.607085  400.971878   9789  4151   6837   146
512.231450  405.469524   8796  3811   6596   225
570.427284  694.369140   1608  2019   2097   291
``````

Desired output:

`dist`: the Euclidean distance, `math.hypot(x2 - x1, y2 - y1)`

`da, db, dc`: the absolute differences, e.g. for `da`: `np.abs(a1-a2)`

`ns`: a string with both `n`s of the rows used

The result would look like:

``````
              dist    da    db    dc       ns
 42.61365102824963   993   340   241  146-225
293.82347069813255  8181  2132  4740  146-291
                ..    ..    ..    ..  225-291
``````
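As a quick sanity check of the definitions above, the first row of the desired output can be reproduced by hand from the two sample rows with `n` equal to 146 and 225. This is only a sketch in plain Python; the variable names (`x1`, `n1`, and so on) are illustrative and not part of the original question.

```python
import math

# Sample rows with n = 146 and n = 225, copied from the selection above.
x1, y1, a1, b1, c1, n1 = 554.607085, 400.971878, 9789, 4151, 6837, 146
x2, y2, a2, b2, c2, n2 = 512.231450, 405.469524, 8796, 3811, 6596, 225

# Euclidean distance between the two (x, y) points.
dist = math.hypot(x2 - x1, y2 - y1)

# Absolute differences of the value columns.
da, db, dc = abs(a1 - a2), abs(b1 - b2), abs(c1 - c2)

# Label that records which rows were used.
ns = f"{n1}-{n2}"

print(dist, da, db, dc, ns)  # dist is about 42.6137; then 993 340 241 146-225
```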

You can use `itertools.combinations()` to generate the pairs. First, load the sample data:

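``````
import pandas as pd
from io import StringIO
import numpy as np

text = """             x           y      a     b      c     n
554.607085  400.971878   9789  4151   6837   146
512.231450  405.469524   8796  3811   6596   225
570.427284  694.369140   1608  2019   2097   291"""

# Parse the whitespace-separated sample into the dataframe used below
df = pd.read_csv(StringIO(text), sep=r"\s+")
``````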

Create the index and calculate the results:

``````
from itertools import combinations

# All unordered pairs of row positions: (0, 1), (0, 2), (1, 2), ...
index = np.array(list(combinations(range(df.shape[0]), 2)))

# First and second row of every pair, re-indexed so they align element-wise
df1, df2 = [df.iloc[idx].reset_index(drop=True) for idx in index.T]

res = pd.concat([
    np.hypot(df1.x - df2.x, df1.y - df2.y),
    (df1[["a", "b", "c"]] - df2[["a", "b", "c"]]).abs(),
    df1.n.astype(str) + "-" + df2.n.astype(str)
], axis=1)

res.columns = ["dist", "da", "db", "dc", "ns"]
res
``````
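To see what the `index` array in the code above holds, it may help to build it for three rows on its own. The sketch below assumes a selection of three rows, as in the sample; `n_rows`, `first` and `second` are illustrative names.

```python
from itertools import combinations

import numpy as np

n_rows = 3  # the sample selection has three rows

# Each row of `index` is one pair of row positions (i, j) with i < j.
index = np.array(list(combinations(range(n_rows), 2)))
print(index.tolist())   # [[0, 1], [0, 2], [1, 2]]

# Transposing separates the "first" and "second" position of every pair.
first, second = index.T
print(first.tolist())   # [0, 0, 1]
print(second.tolist())  # [1, 2, 2]
```

For `n` rows this produces `n * (n - 1) / 2` pairs, so the result grows quadratically with the size of the selection.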

The output:

``````
         dist    da    db    dc       ns
0   42.613651   993   340   241  146-225
1  293.823471  8181  2132  4740  146-291
2  294.702805  7188  1792  4499  225-291
``````
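Since the question mentions a later query on the original rows, one possible variation is to keep the two `n` values as separate integer columns instead of a joined string. The sketch below is not the answer's method: it swaps `itertools.combinations()` for `np.triu_indices`, which yields the same pairs in the same order, and the helper `pairwise_diffs` and the columns `n1`, `n2` are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd


def pairwise_diffs(ds, value_cols=("a", "b", "c")):
    """Hypothetical helper: distances and absolute differences for all row pairs."""
    # Positions (i, j) with i < j, i.e. the strict upper triangle.
    i, j = np.triu_indices(len(ds), k=1)
    x, y, n = ds["x"].to_numpy(), ds["y"].to_numpy(), ds["n"].to_numpy()

    out = pd.DataFrame({"dist": np.hypot(x[i] - x[j], y[i] - y[j])})
    for col in value_cols:
        v = ds[col].to_numpy()
        out["d" + col] = np.abs(v[i] - v[j])

    # Keep the original row labels as integers so they can be queried later.
    out["n1"] = n[i]
    out["n2"] = n[j]
    return out


ds = pd.DataFrame({
    "x": [554.607085, 512.231450, 570.427284],
    "y": [400.971878, 405.469524, 694.369140],
    "a": [9789, 8796, 1608],
    "b": [4151, 3811, 2019],
    "c": [6837, 6596, 2097],
    "n": [146, 225, 291],
})

res = pairwise_diffs(ds)
print(res)
```

With integer columns, a lookup such as `res[res.n1 == 146]` selects every pair that involves row 146 without parsing the `ns` string.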
Source (Stack Overflow)