Is there a quick way of replacing all NaN values in a numpy array with (say) the linearly interpolated values?
[1 1 1 nan nan 2 2 nan 0]
[1 1 1 1.3 1.6 2 2 1 0]
Lets define first a simple helper function in order to make it more straightforward to handle indices and logical indices of NaNs:
import numpy as np def nan_helper(y): """Helper to handle indices and logical indices of NaNs. Input: - y, 1d numpy array with possible NaNs Output: - nans, logical indices of NaNs - index, a function, with signature indices= index(logical_indices), to convert logical indices of NaNs to 'equivalent' indices Example: >>> # linear interpolation of NaNs >>> nans, x= nan_helper(y) >>> y[nans]= np.interp(x(nans), x(~nans), y[~nans]) """ return np.isnan(y), lambda z: z.nonzero()
nan_helper(.) can now be utilized like:
>>> y= array([1, 1, 1, NaN, NaN, 2, 2, NaN, 0]) >>> >>> nans, x= nan_helper(y) >>> y[nans]= np.interp(x(nans), x(~nans), y[~nans]) >>> >>> print y.round(2) [ 1. 1. 1. 1.33 1.67 2. 2. 1. 0. ]
Although it may seem first a little bit overkill to specify a separate function to do just things like this:
>>> nans, x= np.isnan(y), lambda z: z.nonzero()
it will eventually pay dividends.
So, whenever you are working with NaNs related data, just encapsulate all the (new NaN related) functionality needed, under some specific helper function(s). Your code base will be more coherent and readable, because it follows easily understandable idioms.
Interpolation, indeed, is a nice context to see how NaN handling is done, but similar techniques are utilized in various other contexts as well.