SharkSandwich - 5 days ago
Python Question

Creating a matrix of differences from a pandas DataFrame

I have a pandas DataFrame with 10 rows and 1 column (C):

df = data.iloc[0:10, [0]]  # .ix was removed in pandas 1.0; .iloc slices exclude the end, and [0] keeps a DataFrame


I want to create a matrix of the pairwise differences between the elements (10 rows and 10 columns), like so:

C0-C0  C1-C0  …  C9-C0
C0-C1  C1-C1  …  C9-C1
C0-C2  C1-C2  …  C9-C2
  …      …    …    …
C0-C9  C1-C9  …  C9-C9


I know this could be done (inefficiently) with two for loops. What would a better way be?
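For comparison, the two-loop approach mentioned above might look like the sketch below. The column values are made up as a stand-in for the asker's data; entry [i, j] holds C[j] - C[i], matching the layout above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's 10-row, single-column DataFrame
df = pd.DataFrame({'C': [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]})

c = df['C'].to_numpy()
n = len(c)

# Naive version: fill every cell with an explicit double loop
loop_diff = np.empty((n, n), dtype=c.dtype)
for i in range(n):          # row i subtracts C[i]
    for j in range(n):      # column j contributes C[j]
        loop_diff[i, j] = c[j] - c[i]

print(loop_diff[0])  # first row: C[j] - C[0] for every j
```

This works, but it runs n * n Python-level iterations, which is what the vectorised answer below avoids.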

Answer

You can get the values of a DataFrame or Series into a NumPy array with the .values attribute (or, in pandas 0.24 and later, the .to_numpy() method), e.g.:

df.C.values

To create your matrix, subtract a column-shaped copy of the array from the original array. NumPy broadcasting expands the (n,) array and the (n, 1) array into an (n, n) result:

import numpy as np
df.C.values - df.C.values[:, np.newaxis]  # also see @immerrr's comment

This creates a 2-D NumPy array in which entry [i, j] is C[j] - C[i], i.e. the layout asked for in the question.
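To see why the broadcasting works, it helps to print the shapes. The sketch below uses a small hypothetical column. It also shows np.subtract.outer as an alternative: outer(a, b)[i, j] is a[i] - b[j], which is the opposite sign convention, so its transpose is needed to match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [2, 7, 1, 8]})  # hypothetical example values

row = df['C'].to_numpy()   # shape (4,): acts as a single row, varies along columns (index j)
col = row[:, np.newaxis]   # shape (4, 1): a single column, varies along rows (index i)
diff = row - col           # shape (4, 4): diff[i, j] = C[j] - C[i]

print(row.shape, col.shape, diff.shape)  # (4,) (4, 1) (4, 4)

# np.subtract.outer(a, b)[i, j] == a[i] - b[j], so the transpose gives C[j] - C[i]
outer_diff = np.subtract.outer(row, row).T
assert np.array_equal(diff, outer_diff)
```

Either form avoids Python-level loops; the np.newaxis version is the one used in this answer.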

For example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'C': range(5)})
>>> df
   C
0  0
1  1
2  2
3  3
4  4

>>> df.C.values - df.C.values[:, np.newaxis]
array([[ 0,  1,  2,  3,  4],
       [-1,  0,  1,  2,  3],
       [-2, -1,  0,  1,  2],
       [-3, -2, -1,  0,  1],
       [-4, -3, -2, -1,  0]], dtype=int64)
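If you would rather keep the result as a pandas object, one option (a suggestion, not part of the original answer) is to wrap the array in a DataFrame labelled with the original index, so that diff_df.loc[i, j] reads as C[j] - C[i]:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': range(5)})
c = df['C'].to_numpy()

# Wrap the NumPy result so rows and columns carry the original index labels
diff_df = pd.DataFrame(c - c[:, np.newaxis], index=df.index, columns=df.index)
print(diff_df)
```

Using the index as both row and column labels keeps the result aligned with df even when its index is not 0..n-1.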