Thomas Matthew - 20 days ago
Python Question

Set diagonal triangle in pandas DataFrame to NaN

Given the below dataframe:

import pandas as pd
import numpy as np
a = np.arange(16).reshape(4, 4)
df = pd.DataFrame(data=a, columns=['a','b','c','d'])


I'd like to produce the following result:

df([[ NaN, 1, 2, 3],
[ NaN, NaN, 6, 7],
[ NaN, NaN, NaN, 11],
[ NaN, NaN, NaN, NaN]])


So far I've tried using np.tril_indices, but it only works with a df turned back into a numpy array, and it only works for integer assignments (not np.nan):

il1 = np.tril_indices(4)
a[il1] = 0


gives:

array([[ 0, 1, 2, 3],
[ 0, 0, 6, 7],
[ 0, 0, 0, 11],
[ 0, 0, 0, 0]])


...which is almost what I'm looking for, but barfs at assigning NaN:

ValueError: cannot convert float NaN to integer


while:

df[il1] = 0


gives:

TypeError: unhashable type: 'numpy.ndarray'


So if I want to fill the bottom triangle of a dataframe with NaN: 1) does it have to be a numpy array, or can I do this with pandas directly? And 2) is there a way to fill the bottom triangle with NaN rather than using numpy.fill_diagonal and incrementing the offset row by row down the whole DataFrame?

Another failed solution:
Filling the diagonal of the np array with zeros, then masking on zero and reassigning to np.nan. This converts zero values above the diagonal to NaN when they should be preserved as zero!

Answer

You need to cast a to float, because NaN is a float and an integer array cannot hold it:

import numpy as np
import pandas as pd
a = np.arange(16).reshape(4, 4).astype(float)
print (a)
[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]]


il1 = np.tril_indices(4)
a[il1] = np.nan
print (a)
[[ nan   1.   2.   3.]
 [ nan  nan   6.   7.]
 [ nan  nan  nan  11.]
 [ nan  nan  nan  nan]]

df = pd.DataFrame(data=a, columns=['a','b','c','d'])
print (df)
    a    b    c     d
0 NaN  1.0  2.0   3.0
1 NaN  NaN  6.0   7.0
2 NaN  NaN  NaN  11.0
3 NaN  NaN  NaN   NaN
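
On the first part of the question (whether this can be done on the DataFrame directly), one possible approach is a boolean mask instead of index arrays. The following is a minimal sketch, not part of the original answer: it assumes the same 4x4 frame as in the question, and the names lower and result are only illustrative. Because the mask is built from positions rather than values, zeros above the diagonal would be left untouched.

```python
import numpy as np
import pandas as pd

# Same frame as in the question (integer dtype).
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['a', 'b', 'c', 'd'])

# Boolean array that is True on and below the main diagonal.
lower = np.tril(np.ones(df.shape, dtype=bool))

# DataFrame.mask replaces entries where the condition is True with NaN
# (its default replacement value); the columns are upcast to float.
result = df.mask(lower)
print(result)
```

This returns a new DataFrame and leaves df unchanged, so no explicit astype(float) is needed beforehand.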