Georges Hb - 3 months ago
Python Question

Compute the difference between all possible pairs of rows

Based on a selection ds of a dataframe d with:

{'x': d.x, 'y': d.y, 'a': d.a, 'b': d.b, 'c': d.c, 'n': d.n}


Having n rows, x ranges from 0 to n-1. The column n is needed since it's a selection and indices need to be kept for a later query.

How do you efficiently compute the difference between each pair of rows (e.g. a_0, a_1, etc.) of each column (a, b, c) without losing the row information (e.g. a new column with the indices of the rows that were used)?

MWE

Sample selection ds:

         x           y     a     b     c    n
554.607085  400.971878  9789  4151  6837  146
512.231450  405.469524  8796  3811  6596  225
570.427284  694.369140  1608  2019  2097  291


Desired output:

dist: Euclidean distance, math.hypot(x2 - x1, y2 - y1)
da, db, dc: absolute differences, e.g. da = np.abs(a1 - a2)
ns: a string with the n values of both rows used

The result would look like:

dist da db dc ns
42.61365102824963 993 340 241 146-225
293.82347069813255 8181 2132 4740 146-291
.. .. .. .. 225-291
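
As a sanity check on these definitions, the first row of that table can be reproduced by hand from the sample values. This is only a minimal sketch using the standard library; the helper name pair_row and the dict-based rows are made up for illustration and are not part of the original data handling.

```python
import math

def pair_row(r1, r2):
    """Combine two rows (dicts with keys x, y, a, b, c, n) into one output row."""
    return {
        "dist": math.hypot(r2["x"] - r1["x"], r2["y"] - r1["y"]),  # Euclidean distance
        "da": abs(r1["a"] - r2["a"]),
        "db": abs(r1["b"] - r2["b"]),
        "dc": abs(r1["c"] - r2["c"]),
        "ns": f"{r1['n']}-{r2['n']}",  # keeps track of which rows were used
    }

# The first two rows of the sample selection
row_146 = {"x": 554.607085, "y": 400.971878, "a": 9789, "b": 4151, "c": 6837, "n": 146}
row_225 = {"x": 512.231450, "y": 405.469524, "a": 8796, "b": 3811, "c": 6596, "n": 225}

first = pair_row(row_146, row_225)
print(first)
```

Calling it once per pair is fine for checking values, but it would be slow for a large selection, which is why a vectorized approach is preferable.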

Answer

You can use itertools.combinations() to generate the pairs:
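
To make the pairing concrete, here is a small standalone sketch of what combinations yields for three row positions (the variable names are illustrative):

```python
from itertools import combinations

# Positions of three rows, as in the sample selection
positions = range(3)

# Every unordered pair (i, j) with i < j, each produced exactly once
pairs = list(combinations(positions, 2))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
```

For n rows this gives n * (n - 1) / 2 pairs, so the result grows quadratically with the size of the selection.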

Read data first:

import pandas as pd
from io import StringIO
import numpy as np

text = """             x           y      a     b      c     n
    554.607085  400.971878   9789  4151   6837   146
    512.231450  405.469524   8796  3811   6596   225
    570.427284  694.369140   1608  2019   2097   291"""

# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated since pandas 2.2)
df = pd.read_csv(StringIO(text), sep=r"\s+")

Create the index and calculate the results:

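from itertools import combinations

# All unordered pairs of row positions (i, j) with i < j, shape (n_pairs, 2)
index = np.array(list(combinations(range(df.shape[0]), 2)))

# First and second row of every pair, realigned to a common 0..n_pairs-1 index
df1, df2 = [df.iloc[idx].reset_index(drop=True) for idx in index.T]

res = pd.concat([
    np.hypot(df1.x - df2.x, df1.y - df2.y),               # Euclidean distance
    (df1[["a", "b", "c"]] - df2[["a", "b", "c"]]).abs(),  # absolute differences
    df1.n.astype(str) + "-" + df2.n.astype(str)           # n values of both rows
], axis=1)

res.columns = ["dist", "da", "db", "dc", "ns"]
res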

The output:

         dist    da    db    dc       ns
0   42.613651   993   340   241  146-225
1  293.823471  8181  2132  4740  146-291
2  294.702805  7188  1792  4499  225-291
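
If the selection is large, building the pairs in Python may become the bottleneck. As a possible alternative (not part of the answer above), the pair positions can be generated directly in NumPy with np.triu_indices, which returns them in the same order as combinations. The sketch below rebuilds the sample inline so that it runs on its own; the names i, j and res_np are illustrative.

```python
import numpy as np
import pandas as pd

# Sample selection from the question, rebuilt inline
df = pd.DataFrame({
    "x": [554.607085, 512.231450, 570.427284],
    "y": [400.971878, 405.469524, 694.369140],
    "a": [9789, 8796, 1608],
    "b": [4151, 3811, 2019],
    "c": [6837, 6596, 2097],
    "n": [146, 225, 291],
})

# Row positions (i, j) with i < j: the strict upper triangle of an n x n grid
i, j = np.triu_indices(len(df), k=1)

x, y = df["x"].to_numpy(), df["y"].to_numpy()
abc = df[["a", "b", "c"]].to_numpy()
n = df["n"].to_numpy()

# Absolute differences of a, b, c for every pair, shape (n_pairs, 3)
diffs = np.abs(abc[i] - abc[j])

res_np = pd.DataFrame({
    "dist": np.hypot(x[i] - x[j], y[i] - y[j]),
    "da": diffs[:, 0],
    "db": diffs[:, 1],
    "dc": diffs[:, 2],
    "ns": [f"{p}-{q}" for p, q in zip(n[i], n[j])],
})
print(res_np)
```

The number of pairs still grows quadratically, so memory use remains the limiting factor for very large selections.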