Hara Kikiri Hara Kikiri - 3 months ago 29
C Question

Optimizing code that loops through pixels

I got a 200x200 image, as a byte array of pixels (3 bytes for each pixel, representing RGB values). I'd like to select all border points, defined as a point that is not white, and lies either on the border of the image or has a neighbouring pixel of a different color.

Wrote simple C code for this :

int i = 0, row = 0, column = 0, width3 = width*3;
char r,g,b;
while (i < length) {

r = pixels[i], g = pixels[i+1], b = pixels[i+2];

if (r != -1 || g != -1 || b != -1) { // Not white
// Check for border point
if (column == 0 || column == width-1 || row == 0 || row == height-1
|| r != pixels[i-3] || r != pixels[i+3] || r != pixels[i-width3] || r != pixels[i+width3]
|| g != pixels[i-2] || g != pixels[i+4] || g != pixels[i-width3+1] || g != pixels[i+width3+1]
|| b != pixels[i-1] || b != pixels[i+5] || b != pixels[i-width3+2] || b != pixels[i+width3+2]) {
// Border point

i += 3;

if (++column == width) {
column = 0;
// printf("new row");


Now I'd like to know how I can speed this up as much as possible.
Either I could use the GPU but the transfer of memory from and to the GPU is quite costly.

Since I'm totally new to any kind of optimisation techniques such as those used in openCV, I'd like to know if there's any way to make my snippet faster.

(for more context ; I want to interpret border points of each non-white object on the image as 'contours' of a visible object and then use Douglas-Peucker to approximate the contours as a polygon)

Answer Source

A few micro-optimizations:

  • reorganize the loop on the rows to only access pairs of pixels wholly inside the image, so that you needn't test the column and row indexes;

  • do not test left and right: if two pixels differ, a single comparison suffices for both;

  • test only for white in case you detect a border point (they are only a fraction of the image area);

Your 12 comparisons test (to be reduced to 6) might be efficient, as it uses shortcut logics (so that all tests are performed only in uniform areas). You may try and trade it for a branchless expression, which will always be executed in full, but avoids costly conditional branches: use r0 - r1 | g0 - g1 | b0 - b1, which is only zero for identical colors.

Or even better, load whole pixels at a time as an integer value, computing the appropriate offset, xor them and mask out the extra byte: (*(unsigned int*)pixels ^ *(unsigned int*)(pixels + 3)) >> 8.

If this is not enough , you can consider to use the vector instruction set (SSE/AVX), but this is yet another story.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download