Robin Kramer - 1 month ago
Python Question

Filling Array with subsequent value

In my dataframe I will end up with a column that has only a few non-NaN values. I want to use each non-NaN value as the grouping variable for all preceding rows that contain NaN. To simulate this, I made the following array:

import numpy as np
from pandas import Series
count = np.array([np.nan,np.nan,np.nan,3,np.nan,np.nan,6,np.nan,np.nan,np.nan,np.nan,np.nan,12])
count = Series(count)


For this array I was able to create a filling function:

def pad_expsamp_time(array):
    sect = np.zeros(array.size)  # create array filled with zeros
    inds = array.index[array.notnull()]  # indices of the non-NaN values
    rev_inds = inds[::-1]  # sort high to low
    # fill up to and including each index with that index; lower indices overwrite
    for i in rev_inds:
        sect[:i + 1] = i
    return Series(sect)


This function works only when the indices of the non-NaN values are equal to the values themselves. How can I fill the array when the indices are not equal to the content?




For example, what if array count is:

count = np.array([np.nan,np.nan,np.nan,1,np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan,3])


And the desired output is

count = np.array([1,1,1,1,2,2,2,3,3,3,3,3,3])


It is possible that there are NaNs at the end of the array. I would like these to stay NaNs, so that the dataframe will ignore them.

count = np.array([np.nan,np.nan,np.nan,1,np.nan,np.nan,2,np.nan,np.nan,3,np.nan,np.nan])
# Will become:
count = np.array([1,1,1,1,2,2,2,3,3,3,np.nan,np.nan])

Answer

Here's a vectorized approach -

# Pad the NaN mask with False at both ends so that the start and stop
# of each NaN run show up as rising and falling edges
mask = np.hstack((False,np.isnan(count),False))
start = np.flatnonzero(mask[1:] > mask[:-1])
stop = np.flatnonzero(mask[1:] < mask[:-1])
lens = stop - start

# Trailing NaNs give a stop index one past the end; clip it so the
# lookup stays in bounds (those entries then stay NaN)
stop = stop.clip(max=count.size-1)

# Assign values
count[mask[1:-1]] = count[stop].repeat(lens)
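
To see what the edge detection produces, here is a minimal sketch that wraps the listed code in a function and prints the intermediate arrays for the trailing-NaN input from the question. The name fill_nans_backward and the choice to work on a float copy instead of modifying count in place are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def fill_nans_backward(values):
    """Hypothetical helper: fill each NaN run with the next non-NaN value."""
    out = np.array(values, dtype=float)  # float copy, input is left untouched
    # Pad with False so runs touching either end still produce both edges
    mask = np.hstack((False, np.isnan(out), False))
    start = np.flatnonzero(mask[1:] > mask[:-1])  # first index of each NaN run
    stop = np.flatnonzero(mask[1:] < mask[:-1])   # index just after each NaN run
    lens = stop - start                           # length of each NaN run
    stop = stop.clip(max=out.size - 1)            # trailing run: keep index in bounds
    out[mask[1:-1]] = out[stop].repeat(lens)      # one fill value per NaN position
    return out, start, stop, lens

count = np.array([np.nan, np.nan, np.nan, 1, np.nan, np.nan, 2,
                  np.nan, np.nan, 3, np.nan, np.nan])
filled, start, stop, lens = fill_nans_backward(count)

print(start.tolist())  # [0, 4, 7, 10]
print(stop.tolist())   # [3, 6, 9, 11]  (last stop clipped from 12 to 11)
print(lens.tolist())   # [3, 2, 2, 2]
print(filled.tolist())
```

The last run ends at the end of the array, so its clipped stop index points at a NaN, and that NaN is what gets repeated into the trailing positions.
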

Sample runs -

Case #1 :

In [103]: count
Out[103]: 
array([ nan,  nan,  nan,   6.,  nan,  nan,   5.,  nan,  nan,  nan,  nan,
        nan,   2.])

In [104]:  # Listed code ...

In [105]: count
Out[105]: array([ 6.,  6.,  6.,  6.,  5.,  5.,  5.,  2.,  2.,  2.,  2.,  2.,  2.])

Case #2 :

In [118]: count
Out[118]: 
array([ nan,  nan,  nan,   1.,  nan,  nan,   2.,  nan,  nan,  nan,  nan,
        nan,   3.])

In [119]:   # Listed code ...

In [120]: count
Out[120]: array([ 1.,  1.,  1.,  1.,  2.,  2.,  2.,  3.,  3.,  3.,  3.,  3.,  3.])

Case #3 :

In [114]: count
Out[114]: 
array([ nan,  nan,  nan,   1.,  nan,  nan,   2.,  nan,  nan,   3.,  nan,
        nan])

In [115]:   # Listed code ...

In [116]: count
Out[116]: 
array([  1.,   1.,   1.,   1.,   2.,   2.,   2.,   3.,   3.,   3.,  nan,
        nan])
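
Since the question starts from a dataframe column, it may be simpler to stay in pandas. The sketch below uses Series.bfill(), which fills each NaN with the next valid value and leaves trailing NaNs alone. This is an alternative to the NumPy code above, not the same technique, and the dataframe with its value column is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe: "count" mimics the sparse column from the question
df = pd.DataFrame({
    "value": range(12),
    "count": [np.nan, np.nan, np.nan, 1, np.nan, np.nan, 2,
              np.nan, np.nan, 3, np.nan, np.nan],
})

# Backward fill: every NaN takes the next non-NaN value below it
filled = df["count"].bfill()
print(filled.tolist())
# [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, nan, nan]

# groupby drops NaN keys by default, so the trailing rows are ignored
sums = df.groupby(filled)["value"].sum()
print(sums.to_dict())
# {1.0: 6, 2.0: 15, 3.0: 24}
```
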