Alex · 4 years ago · 121

Python Question

I wrote an adaptive color thresholding function in Python (because OpenCV's cv2.adaptiveThreshold didn't fit my needs) and it is way too slow. I've made it as efficient as I can, but it still takes almost 500 ms on a 1280x720 image.

I would greatly appreciate any suggestions that will make this function more efficient!

Here's what the function does: It uses a cross shape of one-pixel thickness as the structuring element. For each pixel in the image, it computes four separate averages: over the `ksize` pixels to its left, the `ksize` pixels to its right, the `ksize` pixels above it, and the `ksize` pixels below it. In 'floor' mode, a pixel is set to `true_value` if it exceeds both the left and right averages, or both the top and bottom averages, by more than the constant `C`; 'ceil' mode is the mirror image of that.

I compute those averages incrementally for all pixels at the same time using `numpy.roll()`: the image is rolled by every offset from 1 to `ksize` along each axis, the rolled copies are accumulated, and the sums are divided by `ksize`.

Here is the code, the relevant part is really just what happens inside the for-loop:

```
def bilateral_adaptive_threshold(img, ksize=20, C=0, mode='floor', true_value=255, false_value=0):
    mask = np.full(img.shape, false_value, dtype=np.int16)

    left_thresh = np.zeros_like(img, dtype=np.float32)  # Average of the pixels to the left of each pixel
    right_thresh = np.zeros_like(img, dtype=np.float32) # Average of the pixels to the right of each pixel
    up_thresh = np.zeros_like(img, dtype=np.float32)    # Average of the pixels above each pixel
    down_thresh = np.zeros_like(img, dtype=np.float32)  # Average of the pixels below each pixel

    for i in range(1, ksize+1):
        roll_left = np.roll(img, -i, axis=1)
        roll_right = np.roll(img, i, axis=1)
        roll_up = np.roll(img, -i, axis=0)
        roll_down = np.roll(img, i, axis=0)

        roll_left[:,-i:] = 0   # Zero out the values that wrapped around the edge
        roll_right[:,:i] = 0
        roll_up[-i:,:] = 0
        roll_down[:i,:] = 0

        left_thresh += roll_right  # Rolling right brings the left-hand neighbors into place
        right_thresh += roll_left
        up_thresh += roll_down
        down_thresh += roll_up

    left_thresh /= ksize
    right_thresh /= ksize
    up_thresh /= ksize
    down_thresh /= ksize

    if mode == 'floor':
        mask[((img > left_thresh+C) & (img > right_thresh+C)) | ((img > up_thresh+C) & (img > down_thresh+C))] = true_value
    elif mode == 'ceil':
        mask[((img < left_thresh-C) & (img < right_thresh-C)) | ((img < up_thresh-C) & (img < down_thresh-C))] = true_value
    else:
        raise ValueError("Unexpected mode value. Expected value is 'floor' or 'ceil'.")

    return mask
```
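As a tiny illustration of the rolling trick described above (this example is not from the original post): on a 1×5 "image" with `ksize=2`, the left-side sums are accumulated like this:

```python
import numpy as np

img = np.array([[1, 2, 3, 4, 5]], dtype=np.float32)
ksize = 2
left_sum = np.zeros_like(img)
for i in range(1, ksize + 1):
    roll_right = np.roll(img, i, axis=1)  # shift right: brings the i-th left neighbor into place
    roll_right[:, :i] = 0                 # zero the values that wrapped around the edge
    left_sum += roll_right

print(left_sum)  # [[0. 1. 3. 5. 7.]]
```

Dividing `left_sum` by `ksize` then gives the left-side average that the function compares each pixel against.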


Answer

As you hint in your question, the dominant part of the function is obtaining the 4 arrays of sums needed to calculate the averages -- here it accounts on average for 190 ms out of the 210 ms the whole function takes. So, let's focus on that.

First, necessary imports and a convenience timing function.

```
from timeit import default_timer as timer
import numpy as np
import cv2

## ===========================================================================

def time_fn(fn, img, ksize=20, iters=16):
    # Return the mean run time of fn(img, ksize) in milliseconds
    start = timer()
    for i in range(iters):
        fn(img, ksize)
    end = timer()
    return ((end - start) / iters) * 1000

## ===========================================================================

# Our test image
img = np.uint8(np.random.random((720, 1280)) * 256)
```

We can reduce your function in the following manner, so that it just calculates and returns the 4 arrays of sums. We can later use this to check that the optimized versions return the same result.

```
# Original code
def windowed_sum_v1(img, ksize=20):
    left_thresh = np.zeros_like(img, dtype=np.float32)
    right_thresh = np.zeros_like(img, dtype=np.float32)
    up_thresh = np.zeros_like(img, dtype=np.float32)
    down_thresh = np.zeros_like(img, dtype=np.float32)
    for i in range(1, ksize+1):
        roll_left = np.roll(img, -i, axis=1)
        roll_right = np.roll(img, i, axis=1)
        roll_up = np.roll(img, -i, axis=0)
        roll_down = np.roll(img, i, axis=0)
        roll_left[:,-i:] = 0
        roll_right[:,:i] = 0
        roll_up[-i:,:] = 0
        roll_down[:i,:] = 0
        left_thresh += roll_right
        right_thresh += roll_left
        up_thresh += roll_down
        down_thresh += roll_up
    return (left_thresh, right_thresh, up_thresh, down_thresh)
```

Now we can measure how much time this function takes on the local machine:

```
>>> print "V1: %f ms" % time_fn(windowed_sum_v1, img, 20, 16)
V1: 188.572077 ms
```

`numpy.roll` is bound to involve some overhead, but there's no need to dig into that here. Notice that after you roll the array, you zero out the rows or columns that spilled across the edge of the array, and only then add the result to the accumulator. Adding zeros doesn't change the result, so we may as well avoid it. Instead, we can just add progressively smaller and appropriately offset slices of the whole array, avoiding the `roll` and (somewhat) reducing the total number of additions needed.

```
# Summing up ROIs
def windowed_sum_v2(img, ksize=20):
    h, w = img.shape[:2]
    left_thresh = np.zeros_like(img, dtype=np.float32)
    right_thresh = np.zeros_like(img, dtype=np.float32)
    up_thresh = np.zeros_like(img, dtype=np.float32)
    down_thresh = np.zeros_like(img, dtype=np.float32)
    for i in range(1, ksize+1):
        left_thresh[:,i:] += img[:,:w-i]
        right_thresh[:,:w-i] += img[:,i:]
        up_thresh[i:,:] += img[:h-i,:]
        down_thresh[:h-i,:] += img[i:,:]
    return (left_thresh, right_thresh, up_thresh, down_thresh)
```

Let's test this and time it:

```
>>> print "Results equal (V1 vs V2): %s" % (np.array_equal(windowed_sum_v1(img), windowed_sum_v2(img)))
Results equal (V1 vs V2): True
>>> print "V2: %f ms" % time_fn(windowed_sum_v2, img, 20, 16)
V2: 110.861794 ms
```

This implementation takes only 60% of the time taken by the original. Can we do better?

We still have a loop there. It would be nice if we could replace the repeated additions with a single call to some optimized function. One such function is `cv2.filter2D`, which computes a correlation of the image with a kernel:

    dst(x,y) = sum over (x',y') of kernel(x',y') * src(x + x' - anchor.x, y + y' - anchor.y)

We can create a kernel such that the points we want to add have a weight of `1.0` and the point that the kernel is anchored on has a weight of `0.0`.

For example, when `ksize=8`, the 'left' kernel is the row `[1 1 1 1 1 1 1 1 0]` anchored on the trailing zero, the 'right' kernel is `[0 1 1 1 1 1 1 1 1]` anchored on the leading zero, and the 'up' and 'down' kernels are the corresponding column vectors.

The function would then be as follows:

```
# Using filter2D
def windowed_sum_v3(img, ksize=20):
    kernel_l = np.array([[1.0] * ksize + [0.0]])
    kernel_r = np.array([[0.0] + [1.0] * ksize])
    kernel_u = np.array([[1.0]] * ksize + [[0.0]])
    kernel_d = np.array([[0.0]] + [[1.0]] * ksize)

    left_thresh = cv2.filter2D(img, cv2.CV_32F, kernel_l, anchor=(ksize,0), borderType=cv2.BORDER_CONSTANT)
    right_thresh = cv2.filter2D(img, cv2.CV_32F, kernel_r, anchor=(0,0), borderType=cv2.BORDER_CONSTANT)
    up_thresh = cv2.filter2D(img, cv2.CV_32F, kernel_u, anchor=(0,ksize), borderType=cv2.BORDER_CONSTANT)
    down_thresh = cv2.filter2D(img, cv2.CV_32F, kernel_d, anchor=(0,0), borderType=cv2.BORDER_CONSTANT)
    return (left_thresh, right_thresh, up_thresh, down_thresh)
```

Again, let's test and time this function:

```
>>> print "Results equal (V1 vs V3): %s" % (np.array_equal(windowed_sum_v1(img), windowed_sum_v3(img)))
Results equal (V1 vs V3): True
>>> print "V3: %f ms" % time_fn(windowed_sum_v3, img, 20, 16)
V3: 46.652996 ms
```

We're down to 25% of the original time.

Let's get back to the full thresholding function you wrote. Instead of dividing the sums in a separate step to obtain the averages, we can scale the kernels so that `filter2D` returns the mean directly. This is only a small improvement (~3%).

Similarly, you can fold the addition or subtraction of `C` into the same call by passing an appropriate `delta` to `filter2D`. This again shaves off a few percent.

