Mike T - 1 year ago 101
Python Question

# NumPy: calculate averages with NaNs removed

How can I calculate mean values along one axis of a matrix while excluding `nan` values from the calculation? (For R people, think `na.rm = TRUE`.)

Here is my [non-]working example:

``````import numpy as np
dat = np.array([[1, 2, 3],
[4, 5, np.nan],
[np.nan, 6, np.nan],
[np.nan, np.nan, np.nan]])
print(dat)
print(dat.mean(1))  # [  2.  nan  nan  nan]
``````

With NaNs removed, my expected output would be:

``````array([ 2.,  4.5,  6.,  nan])
``````

I think what you want is a masked array:

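``````import numpy as np
dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])
mdat = np.ma.masked_array(dat, np.isnan(dat))  # mask every NaN entry
mm = np.mean(mdat, axis=1)  # row means over the unmasked values only
print(mm.filled(np.nan))  # the desired answer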
``````
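As a side note, on NumPy 1.8 or later `np.nanmean` should give the same result without building a masked array first. A minimal sketch, assuming the same `dat` as in the question; the name `row_means` and the warning filter are my own additions, not part of the original answer:

```python
import warnings

import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# np.nanmean skips NaNs along the given axis. A row that is entirely NaN
# still yields nan and emits a RuntimeWarning, which is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    row_means = np.nanmean(dat, axis=1)

print(row_means)
```

The masked-array route is still useful when you want to keep the mask around for further operations; `np.nanmean` is just the shorter path when the mean is all you need.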

Edit: Combining all of the timing data

``````from timeit import Timer

setupstr = """
import numpy as np
# nanmean was dropped from later SciPy releases; np.nanmean is the NumPy counterpart
from scipy.stats.stats import nanmean
dat = np.random.normal(size=(1000,1000))
ii = np.ix_(np.random.randint(0,99,size=50),np.random.randint(0,99,size=50))
dat[ii] = np.nan
"""

method1 = """
mdat = np.ma.masked_array(dat, np.isnan(dat))
mm = np.mean(mdat,axis=1)
mm.filled(np.nan)
"""

N = 2
t1 = Timer(method1, setupstr).timeit(N)
t2 = Timer("[np.mean([l for l in d if not np.isnan(l)]) for d in dat]", setupstr).timeit(N)
t3 = Timer("np.array([r[np.isfinite(r)].mean() for r in dat])", setupstr).timeit(N)
# The t4 statement is missing from this listing; masked_invalid is an assumed stand-in
t4 = Timer("np.ma.masked_invalid(dat).mean(axis=1)", setupstr).timeit(N)
t5 = Timer("nanmean(dat,axis=1)", setupstr).timeit(N)

print('Time: %f\tRatio: %f' % (t1, t1/t1))
print('Time: %f\tRatio: %f' % (t2, t2/t1))
print('Time: %f\tRatio: %f' % (t3, t3/t1))
print('Time: %f\tRatio: %f' % (t4, t4/t1))
print('Time: %f\tRatio: %f' % (t5, t5/t1))
``````

Returns:

``````Time: 0.045454  Ratio: 1.000000
Time: 8.179479  Ratio: 179.950595
Time: 0.060988  Ratio: 1.341755
Time: 0.070955  Ratio: 1.561029
Time: 0.065152  Ratio: 1.433364
``````
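The ratios only mean something if the approaches agree on the result, so here is a rough sanity check. It is a sketch under a few assumptions of mine: a smaller 200x200 array, an arbitrary fixed seed, `np.nanmean` standing in for the SciPy `nanmean`, and the variable names `by_mask`, `by_listcomp`, `by_indexing`, `by_invalid`, `by_nanmean` and `all_agree`, none of which appear in the timing code. It also needs NumPy 1.17 or later for `default_rng`.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed, for repeatability only
dat = rng.normal(size=(200, 200))
ii = np.ix_(rng.integers(0, 99, size=50), rng.integers(0, 99, size=50))
dat[ii] = np.nan  # scatter NaNs over a sub-grid, as in the timing setup

# Masked array built from an explicit np.isnan mask
by_mask = np.mean(np.ma.masked_array(dat, np.isnan(dat)), axis=1).filled(np.nan)
# Pure-Python filtering of each row (the slow variant)
by_listcomp = np.array([np.mean([x for x in row if not np.isnan(x)]) for row in dat])
# Boolean indexing per row
by_indexing = np.array([row[np.isfinite(row)].mean() for row in dat])
# masked_invalid masks NaN and inf in one call
by_invalid = np.ma.masked_invalid(dat).mean(axis=1).filled(np.nan)
# NumPy's built-in NaN-aware mean
by_nanmean = np.nanmean(dat, axis=1)

others = (by_listcomp, by_indexing, by_invalid, by_nanmean)
all_agree = all(np.allclose(by_mask, m, equal_nan=True) for m in others)
print(all_agree)
```

No row here is entirely NaN, since at most 50 of the 200 columns are touched, so none of the variants should hit the empty-slice case.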