jeremy radcliff - 1 year ago 71

Python Question

I'm using sklearn's pairwise distance function, which saved my life when computing a huge matrix, but the problem I'm having is that I lose my indices.

Specifically, I initially have a huge dataframe of 17000 x 300, which I break down into 4 different dataframes based on some class condition.

The 4 separate dataframes keep the original indices, but after I run the pairwise distance function on one of those dataframes, it gives me back a 2d array with correct values but the indices have been reset from 0 up.

**How do I keep or recover the original indices**?

`distance1 = pair.pairwise_distances(df1, metric='euclidean')`

Answer Source

You can create a DataFrame with matching indices by passing the `index` parameter to the DataFrame constructor:

```
pd.DataFrame(distance1, index=df1.index)
```
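Since the distance matrix is square (one row and one column per row of `df1`), you could also label the columns with the same index, which makes lookups by original label possible on both axes. Here is a minimal sketch of that idea; the toy data and the name `dist_df` are made up for illustration, and it imports `pairwise_distances` from `sklearn.metrics` rather than through the `pair` alias used in the question:

```python
import pandas as pd
from sklearn.metrics import pairwise_distances

# Hypothetical toy data standing in for one of the class-specific DataFrames;
# the non-contiguous index mimics rows filtered out of a larger frame.
df1 = pd.DataFrame(
    {"x": [0.0, 3.0, 0.0], "y": [0.0, 4.0, 1.0]},
    index=[12, 57, 903],
)

# pairwise_distances returns a plain NumPy array, so the labels are dropped.
distance1 = pairwise_distances(df1, metric="euclidean")

# The result has one row and one column per row of df1,
# so the original index can label both axes.
dist_df = pd.DataFrame(distance1, index=df1.index, columns=df1.index)

# Look up a distance using the original labels instead of 0-based positions.
d = dist_df.loc[12, 57]
```

With both axes labelled, `dist_df.loc[a, b]` gives the distance between the rows originally indexed `a` and `b`.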

Furthermore, if you would like to concatenate it horizontally to your existing DataFrame, you can use:

```
pd.concat((df1, pd.DataFrame(distance1, index=df1.index)), axis=1)
```
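One thing to watch for with the concatenation: the distance columns are otherwise named `0, 1, 2, ...`, which can be hard to tell apart from the feature columns. A possible variation, sketched below on made-up toy data, names each distance column after the row it measures the distance to; the `dist_to_` prefix and the variable names are just assumptions for the example:

```python
import pandas as pd
from sklearn.metrics import pairwise_distances

# Hypothetical toy data with a non-contiguous index, as after filtering.
df1 = pd.DataFrame(
    {"x": [0.0, 3.0, 0.0], "y": [0.0, 4.0, 1.0]},
    index=[12, 57, 903],
)

distance1 = pairwise_distances(df1, metric="euclidean")

# Label rows and columns with the original index, then prefix the column
# names so they cannot be confused with the feature columns of df1.
dist_df = pd.DataFrame(
    distance1, index=df1.index, columns=df1.index
).add_prefix("dist_to_")

# Both frames share the same index, so concat aligns them row by row.
combined = pd.concat((df1, dist_df), axis=1)
```

Keep in mind that for a large class this adds one column per row of `df1`, so the combined frame can get very wide.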