Mad Physicist - 1 year ago 113

Python Question

I have three 1D numpy arrays:

- A list of times at which some measurements occurred (`t`).

- A list of measurements that occurred at each of the times in `t` (`y`).

- A (shorter) list of times for some external changes that affected these measurements (`b`).

Here is an example:

```
import numpy as np

t = np.array([ 0.33856697,  1.69615293,  1.70257872,  2.32510279,
               2.37788203,  2.45102176,  2.87518307,  3.60941650,
               3.78275907,  4.37970516,  4.56480259,  5.33306546,
               6.00867792,  7.40217571,  7.46716989,  7.6791613 ,
               7.96938078,  8.41620336,  9.17116349, 10.87530965])

y = np.array([3.70209916, 6.31148802, 2.96578172, 3.90036915, 5.11728629,
              2.85788050, 4.50077811, 4.05113322, 3.55551093, 7.58624384,
              5.47249362, 5.00286872, 6.26664832, 7.08640263, 5.28350628,
              7.71646500, 3.75513591, 5.72849991, 5.60717179, 3.99436659])

b = np.array([1.7, 3.9, 9.5])
```

The elements of `b` split `t` into segments. I would like to apply an operation to each segment of `y`, so that I get `b.size + 1` results, one for each segment of `y`.

I am currently using a for loop and slicing to apply my test:

```
bias = 5
categories = np.digitize(t, b)
result = np.empty(b.size + 1, dtype=np.bool_)
for i in range(result.size):
    mask = (categories == i)
    result[i] = (np.count_nonzero(y[mask] > bias) / np.count_nonzero(mask)) > 0.5
```

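This seems extremely inefficient. Unfortunately, `np.where` does not seem to help much here. Is there a way to vectorize this operation and avoid the Python `for` loop?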

By the way, here is a plot of `y` vs. `t`, with `bias` and the regions delimited by `b` marked, to show why `result` is `array([False, False, True, False], dtype=bool)`.

Generated by:

```
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle

plt.ion()
f, a = plt.subplots()
a.plot(t, y, label='y vs t')
a.hlines(5, *a.get_xlim(), label='bias')
plt.tight_layout()
a.set_xlim(0, 11)
c = np.concatenate([[0], b, [11]])
for i in range(len(c) - 1):
    a.add_patch(Rectangle((c[i], 2.5), c[i+1] - c[i], 8 - 2.5, alpha=0.2,
                          color=('red' if i % 2 else 'green'), zorder=-i-5))
a.legend()
```


Answer Source

Shouldn't this produce the same result?

```
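# Start index of each segment in the sorted t, plus the end of the array.
# (Looking up t[-1] itself would return t.size - 1 and leave the last bin empty.)
split_points = np.r_[0, np.searchsorted(t, b), t.size]
# Number of y values above bias in each segment
numerator = np.add.reduceat(y > bias, split_points[:-1])
# Number of samples in each segment
denominator = np.diff(split_points)
result = (numerator / denominator) > 0.5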
```

A few notes: This approach relies on `t` being sorted. The bins defined by `b` are then contiguous blocks, so we need no mask to describe them, just their endpoints as indices into `t`. That's what `searchsorted` finds for us.
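To make that concrete, here is a small sketch using the `t` and `b` from the question. It assumes `t` is sorted in ascending order and relies on the default `side='left'` of `np.searchsorted`, which puts a time equal to a boundary into the segment on its right, the same convention as the default `np.digitize`. The names `starts` and `change_points` are only illustrative.

```python
import numpy as np

t = np.array([ 0.33856697,  1.69615293,  1.70257872,  2.32510279,
               2.37788203,  2.45102176,  2.87518307,  3.60941650,
               3.78275907,  4.37970516,  4.56480259,  5.33306546,
               6.00867792,  7.40217571,  7.46716989,  7.6791613 ,
               7.96938078,  8.41620336,  9.17116349, 10.87530965])
b = np.array([1.7, 3.9, 9.5])

# Index of the first element of t that is >= each boundary in b
starts = np.searchsorted(t, b)
print(starts.tolist())  # [2, 9, 19]

# The same grouping expressed as per-element labels, as in the question
categories = np.digitize(t, b)

# Because t is sorted, the label changes exactly at those start indices
change_points = np.flatnonzero(np.diff(categories)) + 1
print(change_points.tolist())  # [2, 9, 19]
```

So three integers describe the same partition that the question encodes with one boolean mask per segment.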

Since your criterion doesn't appear to depend on the group, we can compute one big mask for all of `y` in one go. Counting nonzeros in a boolean array is the same as summing it, because `True` is coerced to 1 and `False` to 0. The advantage in this case is that we can use `add.reduceat`, which takes the array and a list of split points and sums the blocks between the splits, which is precisely what we want.
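A toy illustration of those two points, with made-up data (the array `above` and the start indices are invented for the example, not taken from the question):

```python
import numpy as np

# Toy mask split into three blocks starting at indices 0, 2 and 5
above = np.array([True, False, True, True, False, False, True, True])
starts = np.array([0, 2, 5])

# Summing a boolean array counts its True values
print(np.count_nonzero(above), int(above.sum()))  # 5 5

# reduceat sums above[0:2], above[2:5] and above[5:] in a single call
counts = np.add.reduceat(above, starts)
print(counts.tolist())  # [1, 2, 2]
```

One caveat worth knowing: when a start index is not smaller than the next one (an empty segment), `reduceat` returns the single element at that index instead of zero, so empty segments need to be handled separately.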

To normalise we need the total number of elements in each bin, but because the bins are contiguous this is just the difference between the split points delimiting that bin, which is where we use `diff`.
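Putting the pieces together, here is a sketch that checks the vectorized version against the loop from the question on the sample data. The function names `majority_above` and `majority_above_loop` are hypothetical. The split points are built as `np.r_[0, np.searchsorted(t, b), t.size]` so that the final segment runs to the end of the array, and the sketch assumes `t` is sorted and no segment is empty.

```python
import numpy as np

def majority_above(t, y, b, bias):
    """Per segment of sorted t (split at b): is more than half of y above bias?

    Hypothetical helper; assumes t is sorted and every segment is non-empty.
    """
    split_points = np.r_[0, np.searchsorted(t, b), t.size]
    numerator = np.add.reduceat(y > bias, split_points[:-1])
    denominator = np.diff(split_points)  # number of samples per segment
    return (numerator / denominator) > 0.5

def majority_above_loop(t, y, b, bias):
    """Reference implementation: the masking loop from the question."""
    categories = np.digitize(t, b)
    result = np.empty(b.size + 1, dtype=np.bool_)
    for i in range(result.size):
        mask = categories == i
        result[i] = (np.count_nonzero(y[mask] > bias) / np.count_nonzero(mask)) > 0.5
    return result

t = np.array([ 0.33856697,  1.69615293,  1.70257872,  2.32510279,
               2.37788203,  2.45102176,  2.87518307,  3.60941650,
               3.78275907,  4.37970516,  4.56480259,  5.33306546,
               6.00867792,  7.40217571,  7.46716989,  7.6791613 ,
               7.96938078,  8.41620336,  9.17116349, 10.87530965])
y = np.array([3.70209916, 6.31148802, 2.96578172, 3.90036915, 5.11728629,
              2.85788050, 4.50077811, 4.05113322, 3.55551093, 7.58624384,
              5.47249362, 5.00286872, 6.26664832, 7.08640263, 5.28350628,
              7.71646500, 3.75513591, 5.72849991, 5.60717179, 3.99436659])
b = np.array([1.7, 3.9, 9.5])

# Segment sizes come straight from the differences of the split points
sizes = np.diff(np.r_[0, np.searchsorted(t, b), t.size])
print(sizes.tolist())  # [2, 7, 10, 1]

fast = majority_above(t, y, b, bias=5)
slow = majority_above_loop(t, y, b, bias=5)
print(fast.tolist())               # [False, False, True, False]
print(np.array_equal(fast, slow))  # True
```

The first segment is a tie (one of two values above `bias`), which is why the strict `> 0.5` yields `False` there.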
