jeremy radcliff - 10 months ago
Python Question

How to maintain or recover Dataframe indexing after running Pairwise Distance function?

I'm using sklearn's pairwise_distances function, which saved my life when computing a huge matrix, but I lose my indices in the process.

Specifically, I start with a huge dataframe of 17000 x 300, which I split into 4 separate dataframes based on a class condition.
The 4 dataframes keep the original indices, but when I run pairwise_distances on one of them, I get back a 2D array with the correct values whose rows and columns are simply numbered from 0 up.

How do I keep or recover the original indices?

from sklearn.metrics import pairwise as pair  # assumed origin of the `pair` alias
distance1 = pair.pairwise_distances(df1, metric='euclidean')

Answer Source

pairwise_distances returns a plain NumPy array, which carries no index. You can wrap it in a DataFrame with matching indices by passing the index parameter to the DataFrame constructor:

pd.DataFrame(distance1, index=df1.index)
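Since the result for a single input is a square array whose rows and columns both follow the row order of df1, it may also be useful to label the columns. A minimal sketch, where the small df1 below is a made-up stand-in for the real data and dist_df is a name chosen only for illustration:

```python
import pandas as pd
from sklearn.metrics import pairwise_distances

# Hypothetical stand-in for df1: a subset that kept non-contiguous original labels.
df1 = pd.DataFrame(
    {"x": [0.0, 3.0, 0.0], "y": [0.0, 4.0, 8.0]},
    index=[2, 7, 11],
)

# Returns a plain NumPy array of shape (n_rows, n_rows); the labels are dropped here.
distance1 = pairwise_distances(df1, metric="euclidean")

# Reattach the original labels to both the rows and the columns.
dist_df = pd.DataFrame(distance1, index=df1.index, columns=df1.index)

# Look up a distance by original labels rather than by position.
d = dist_df.loc[2, 7]
```

With both axes labelled, dist_df.loc[a, b] gives the distance between the rows originally labelled a and b.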

Furthermore, if you would like to concatenate it horizontally to your existing DataFrame, the shared index lets the rows line up, so you can use:

pd.concat((df1, pd.DataFrame(distance1, index=df1.index)), axis=1)
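One thing to watch for is that the distance columns get integer names by default, which can be hard to tell apart from the feature columns. A possible variation, again on a made-up df1, that labels the distance columns by original row label and adds a prefix (the "dist_to_" prefix is an arbitrary choice for this sketch):

```python
import pandas as pd
from sklearn.metrics import pairwise_distances

# Hypothetical stand-in for df1 with non-contiguous original labels.
df1 = pd.DataFrame(
    {"x": [0.0, 3.0, 0.0], "y": [0.0, 4.0, 8.0]},
    index=[2, 7, 11],
)
distance1 = pairwise_distances(df1, metric="euclidean")

# Label rows and columns with the original index, then prefix the column
# names so they cannot be confused with the feature columns of df1.
dist_df = pd.DataFrame(
    distance1, index=df1.index, columns=df1.index
).add_prefix("dist_to_")

# Both frames share the same index, so concat aligns them row by row.
combined = pd.concat((df1, dist_df), axis=1)
```

For the full 17000-row data this adds one column per row of the subset, so keeping the distance matrix as its own DataFrame may be more practical than concatenating.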