Robin Kramer - 9 months ago

Python Question

In my DataFrame I will end up with a column that has only a few non-NaN values. I want to use each non-NaN value as the grouping variable for all preceding rows that contain NaN. To simulate this, I made the following array:

```
import numpy as np
from pandas import Series

count = np.array([np.nan, np.nan, np.nan, 3, np.nan, np.nan, 6, np.nan, np.nan, np.nan, np.nan, np.nan, 12])
count = Series(count)
```

For this array I was able to create a filling function:

```
def pad_expsamp_time(array):
    sect = np.zeros(array.size)  # output array, initialised to zeros
    inds = array.index[array.notnull()]  # index labels of the non-NaN values
    rev_inds = inds[::-1]  # reverse so the highest index comes first
    # Fill every position before index i with i, starting from the highest
    # index, so that lower indices overwrite the start of the array.
    for i in rev_inds:
        sect[:i] = i
    return Series(sect)
```

This function works only when it can assume that the index of each non-NaN value equals the value itself. However, that will not always be the case.

For example, what if the array count is:

```
count = np.array([np.nan, np.nan, np.nan, 1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan, np.nan, 3])
```

And the desired output is:

```
count = np.array([1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3])
```

It is possible that there are NaNs at the end of the array. I would like these to stay NaNs, so that the dataframe will ignore them.

```
count = np.array([np.nan, np.nan, np.nan, 1, np.nan, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan])

# Will become:
count = np.array([1, 1, 1, 1, 2, 2, 2, 3, 3, 3, np.nan, np.nan])
```

Answer Source

Here's a vectorized approach -

```
# Append False on either side of the NaN mask, so that the start and
# stop of each NaN interval show up as rising and falling edges
mask = np.hstack((False,np.isnan(count),False))
start = np.flatnonzero(mask[1:] > mask[:-1])
stop = np.flatnonzero(mask[1:] < mask[:-1])
lens = stop - start
# Account for NaNs if any at the end of input that might throw off stop values
stop = stop.clip(max=count.size-1)
# Assign values
count[mask[1:-1]] = count[stop].repeat(lens)
```
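
For reuse, the steps above can be wrapped in a function. This is a minimal sketch, assuming the input is a one-dimensional array of floats; the name `backfill_nan_runs` is hypothetical, and unlike the listing above it works on a copy instead of modifying `count` in place.

```python
import numpy as np

def backfill_nan_runs(values):
    """Fill each run of NaNs with the first non-NaN value that follows it."""
    # Hypothetical helper: copy the input so the caller's array is untouched
    out = np.asarray(values, dtype=float).copy()
    # Pad the NaN mask with False so every run has a rising and a falling edge
    mask = np.hstack((False, np.isnan(out), False))
    start = np.flatnonzero(mask[1:] > mask[:-1])  # first NaN of each run
    stop = np.flatnonzero(mask[1:] < mask[:-1])   # first position after each run
    lens = stop - start
    # A trailing run stops one past the end; clip it so it reads the last
    # element, which is NaN, and the trailing NaNs stay NaN
    stop = stop.clip(max=out.size - 1)
    out[mask[1:-1]] = out[stop].repeat(lens)
    return out

count = np.array([np.nan, np.nan, np.nan, 1, np.nan, np.nan, 2,
                  np.nan, np.nan, 3, np.nan, np.nan])
filled = backfill_nan_runs(count)
print(filled)
```

Returning a copy keeps the original column available for comparison; if in-place modification is preferred, the `.copy()` call can be dropped.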

Sample runs -

Case #1 :

```
In [103]: count
Out[103]:
array([ nan, nan, nan, 6., nan, nan, 5., nan, nan, nan, nan,
nan, 2.])
In [104]: # Listed code ...
In [105]: count
Out[105]: array([ 6., 6., 6., 6., 5., 5., 5., 2., 2., 2., 2., 2., 2.])
```

Case #2 :

```
In [118]: count
Out[118]:
array([ nan, nan, nan, 1., nan, nan, 2., nan, nan, nan, nan,
nan, 3.])
In [119]: # Listed code ...
In [120]: count
Out[120]: array([ 1., 1., 1., 1., 2., 2., 2., 3., 3., 3., 3., 3., 3.])
```

Case #3 :

```
In [114]: count
Out[114]:
array([ nan, nan, nan, 1., nan, nan, 2., nan, nan, 3., nan,
nan])
In [115]: # Listed code ...
In [116]: count
Out[116]:
array([ 1., 1., 1., 1., 2., 2., 2., 3., 3., 3., nan,
nan])
```
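
Since the data lives in a pandas column, backward fill may be worth checking as well. This is a minimal sketch of an alternative, not part of the approach above. It assumes pandas is available and that the behaviour of `Series.bfill()` matches the desired grouping: each NaN takes the next valid value, and trailing NaNs have no later value to take, so they stay NaN.

```python
import numpy as np
import pandas as pd

count = pd.Series([np.nan, np.nan, np.nan, 1, np.nan, np.nan, 2,
                   np.nan, np.nan, 3, np.nan, np.nan])

# Backward fill: propagate each valid value to the NaNs that precede it
groups = count.bfill()
print(groups.tolist())
# [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, nan, nan]
```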