I am writing a computer vision library from scratch in Python to work with a
rpi
greyscale
img
model B
rpi3
sobel
def sobel(img):
xKernel = np.array([[-1,0,1],[-2,0,2],[-1,0,1]])
yKernel = np.array([[-1,-2,-1],[0,0,0],[1,2,1]])
sobelled = np.zeros((img.shape[0]-2, img.shape[1]-2, 3), dtype="uint8")
for y in range(1, img.shape[0]-1):
for x in range(1, img.shape[1]-1):
gx = np.sum(np.multiply(img[y-1:y+2, x-1:x+2], xKernel))
gy = np.sum(np.multiply(img[y-1:y+2, x-1:x+2], yKernel))
g = abs(gx) + abs(gy) #math.sqrt(gx ** 2 + gy ** 2) (Slower)
g = g if g > 0 and g < 255 else (0 if g < 0 else 255)
sobelled[y-1][x-2] = g
return sobelled
greyscale
>15
squaring
adding
square rooting
gx
gy
sum
absolute
resolution
rpi
480x360
3280x2464
matrix convolutions
np.sum(np.multiply(...))
np.multiply
loops
numpy
C
3
matrix
Even though you're building out your own library, you really should absolutely use libraries for convolution, they will do the resulting operations in C or Fortran in the backend which will be much, much faster.
But to do it yourself if you'd like, use linear separable filters. Here's the idea:
Image:
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
Sobel x
kernel:
-1 0 1
-2 0 2
-1 0 1
Result:
8, 3, -7
At the first position of the convolution, you'll be computing 9 values. First off, why? You're never going to add the middle column, don't bother multiplying it. But that's not the point of linear separable filters. The idea is simple. When you place the kernel at the first position, you'll multiply the third column by [1, 2, 1]
. But then two steps later, you'll multiply the third column by [-1, -2, -1]
. What a waste! You've already calculated that, you just need to negate it now. And that's the idea with a linear separable filter. Notice that you can break up the filter into a matrix outer product of two vectors:
[1]
[2] * [-1, 0, 1]
[1]
Taking the outer product here yields the same matrix. So the idea here is to split the operation into two pieces. First multiply the whole image with the row vector, then the column vector. Taking the row vector
-1 0 1
across the image, we end up with
2 2 2
2 2 -3
2 -3 -3
And then passing the column vector through to be multiplied and summed, we again get
8, 3, -7
One other nifty trick that may or may not be helpful (depends on your tradeoffs between memory and efficiency):
Note that in the single row multiplication, you ignore the middle value, and just subtract the right from the left values. This means that effectively what you are doing is subtracting these two images:
3 4 5 1 2 3
4 5 1 - 2 3 4
5 1 2 3 4 5
If you cut the first two columns off of your image, you get the left matrix, and if you cut the last two columns off, you get the right matrix. So you can just compute this first part of the convolution simply as
result_h = img[:,2:] - img[:,:-2]
And then you can loop through for the remaining column of the sobel operator. Or, you can even proceed further and do the same thing we just did. This time for the vertical case, you simply need to add the first and third row and twice the second row; or, using numpy addition:
result_v = result_h[:-2] + result_h[2:] + 2*result_h[1:-1]
And you're done! I may add some timings here in the near-future. For some back of the envelope calculations (i.e. hasty Jupyter notebook timings on a 1000x1000 image):
new method (sums of the images): 8.18 ms ± 399 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
old method (double for-loop):7.32 s ± 207 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Yes, you read that right: 1000x speedup.
Here's some code comparing the two:
import numpy as np
def sobel_x_orig(img):
xKernel = np.array([[-1,0,1],[-2,0,2],[-1,0,1]])
sobelled = np.zeros((img.shape[0]-2, img.shape[1]-2))
for y in range(1, img.shape[0]-1):
for x in range(1, img.shape[1]-1):
sobelled[y-1, x-1] = np.sum(np.multiply(img[y-1:y+2, x-1:x+2], xKernel))
return sobelled
def sobel_x_new(img):
result_h = img[:,2:] - img[:,:-2]
result_v = result_h[:-2] + result_h[2:] + 2*result_h[1:-1]
return result_v
img = np.random.rand(1000, 1000)
sobel_new = sobel_x_new(img)
sobel_orig = sobel_x_orig(img)
assert (np.abs(sobel_new-sobel_orig) < 1e-12).all()
Of course, 1e-12
is some serious tolerance, but this is per element so it should be OK. But also I have a float
image, you'll of course have larger differences for uint8
images.
Note that you can do this for any linear separable filter! That includes Gaussian filters. Note also that in general, this requires a lot of operations. In C or Fortran or whatever, it's usually just implemented as two convolutions of the single row/column vectors because in the end, it needs to actually loop through every element of each matrix anyways; whether you're just adding them or multiplying them, so it's no faster in C to do it this way where you add the image values than if you just do the convolutions. But looping through numpy
arrays is super slow, so this method is way faster in Python.