fuz - 3 years ago 131

C Question

I have a struct of 8-bit pixel data:

```
struct __attribute__((aligned(4))) pixels {
    char r;
    char g;
    char b;
    char a;
};
```

I want to use SSE instructions to calculate certain things on these pixels (namely, a Paeth transformation). How can I load these pixels into an SSE register as 32-bit unsigned integers?


Answer Source

OK, using the SSE2 integer intrinsics from `<emmintrin.h>`, first load the pixel into the lower 32 bits of the register:

```
__m128i xmm0 = _mm_cvtsi32_si128(*(const int*)&pixel);
```

Then unpack those 8-bit values into 16-bit values in the lower 64 bits of the register, interleaving them with 0s:

```
xmm0 = _mm_unpacklo_epi8(xmm0, _mm_setzero_si128());
```

And again unpack those 16-bit values into 32-bit values:

```
xmm0 = _mm_unpacklo_epi16(xmm0, _mm_setzero_si128());
```

You should now have each pixel component as a 32-bit integer in its own one of the 4 lanes of the SSE register.
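Put together, the zero-extension steps above might look like the following minimal sketch. The `struct pixel` type and the function name `load_pixel_u32` are made up for illustration, and the load goes through `memcpy` rather than a pointer cast, which is an assumption on my part to stay clear of strict-aliasing issues.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical pixel type; unsigned char so components are in [0, 255]. */
struct pixel { unsigned char r, g, b, a; };

/* Widen the four 8-bit components into four zero-extended 32-bit lanes. */
__m128i load_pixel_u32(const struct pixel *p)
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);           /* alias-safe 32-bit load */

    const __m128i zero = _mm_setzero_si128();
    __m128i v = _mm_cvtsi32_si128(bits);     /* pixel in the low 32 bits */
    v = _mm_unpacklo_epi8(v, zero);          /* 8-bit  -> 16-bit */
    v = _mm_unpacklo_epi16(v, zero);         /* 16-bit -> 32-bit */
    return v;
}
```

On a little-endian x86 target, `r` ends up in lane 0 and `a` in lane 3.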

I just read that you want to get those values as 32-bit **signed** integers, though I wonder what sense a signed pixel in [-128,127] makes. But if your pixel values can indeed be negative, interleaving with zeros won't work, since it turns a negative 8-bit number into a positive 16-bit number (that is, it interprets your numbers as unsigned pixel values). A negative number has to be extended with `1`s instead of `0`s, but unfortunately that has to be decided dynamically on a component-by-component basis, which SSE is not that good at.

What you could do is compare the values against zero and use the resulting mask (which fortunately uses `1...1` for true and `0...0` for false) as the second unpack operand, instead of the zero register:

```
xmm0 = _mm_unpacklo_epi8(xmm0, _mm_cmplt_epi8(xmm0, _mm_setzero_si128()));
xmm0 = _mm_unpacklo_epi16(xmm0, _mm_cmplt_epi16(xmm0, _mm_setzero_si128()));
```
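As a self-contained sketch of the comparison-based sign extension, the two unpack steps can be wrapped in a small function. The name `widen_s8_cmp` and the `memcpy`-based load are illustrative assumptions, not part of the original snippet.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Sign-extend four signed 8-bit values into four 32-bit lanes, using the
   "less than zero" mask (all ones or all zeros) as the high half. */
__m128i widen_s8_cmp(const signed char src[4])
{
    int32_t bits;
    memcpy(&bits, src, sizeof bits);         /* alias-safe 32-bit load */

    const __m128i zero = _mm_setzero_si128();
    __m128i v = _mm_cvtsi32_si128(bits);
    v = _mm_unpacklo_epi8(v, _mm_cmplt_epi8(v, zero));    /* 8  -> 16 */
    v = _mm_unpacklo_epi16(v, _mm_cmplt_epi16(v, zero));  /* 16 -> 32 */
    return v;
}
```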

This will properly extend negative numbers with `1`s and positive ones with `0`s. But of course this additional overhead (probably 2-4 additional SSE instructions) is only necessary if your initial 8-bit pixel values can ever be negative, which I still doubt. If that really is the case, you should prefer `signed char` over `char`, as the latter has implementation-defined signedness (in the same way, you should use `unsigned char` if those are the common unsigned [0,255] pixel values).

As clarified, you don't need the signed 8-bit to 32-bit conversion, but for the sake of completeness: *harold* had another very good idea for SSE2-based sign extension that avoids the comparison-based version above. We first unpack the 8-bit values into the upper byte of the 32-bit values instead of the lower byte. Since we don't care about the lower parts, we just use the 8-bit values again, which frees us from the need for an extra zero register and an additional move:

```
xmm0 = _mm_unpacklo_epi8(xmm0, xmm0);
xmm0 = _mm_unpacklo_epi16(xmm0, xmm0);
```

Now we just need to perform an arithmetic right shift of the upper byte into the lower byte, which does the proper sign extension for negative values:

```
xmm0 = _mm_srai_epi32(xmm0, 24);
```

This should be more efficient in both instruction count and register use than my SSE2 version above.

And since it should be equal in instruction count for a single pixel (though 1 more instruction when amortized over many pixels) and more register-efficient (no extra zero register) compared to the zero extension above, it might even be used for zero-extending unsigned 8-bit values if registers are scarce, but then with a logical shift (`_mm_srli_epi32`) instead of an arithmetic shift.
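The shift-based variant can be sketched for both cases as follows. The function names and the small `load4` helper are hypothetical; the only difference between the two widening functions is the arithmetic versus logical shift.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 4 bytes into the low 32 bits of a register. */
__m128i load4(const void *src)
{
    int32_t bits;
    memcpy(&bits, src, sizeof bits);         /* alias-safe 32-bit load */
    return _mm_cvtsi32_si128(bits);
}

/* Sign-extend: duplicate each byte into the top byte of its 32-bit lane,
   then shift it down arithmetically. */
__m128i widen_s8_shift(const signed char src[4])
{
    __m128i v = load4(src);
    v = _mm_unpacklo_epi8(v, v);
    v = _mm_unpacklo_epi16(v, v);
    return _mm_srai_epi32(v, 24);
}

/* Zero-extend: same unpacking, but with a logical shift. */
__m128i widen_u8_shift(const unsigned char src[4])
{
    __m128i v = load4(src);
    v = _mm_unpacklo_epi8(v, v);
    v = _mm_unpacklo_epi16(v, v);
    return _mm_srli_epi32(v, 24);
}
```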

Thanks to *harold*'s comment, there is an even better option for the first 8-to-32 transformation if you have SSE4 support (SSE4.1, to be precise, with intrinsics in `<smmintrin.h>`). It has instructions that do the complete conversion from 4 packed 8-bit values in the lower 32 bits of the register into 4 32-bit values filling the whole register, for both signed and unsigned 8-bit values:

```
xmm0 = _mm_cvtepu8_epi32(xmm0); //or _mm_cvtepi8_epi32 for signed 8-bit values
```
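A minimal sketch of the SSE4.1 route is shown here. It assumes GCC or Clang: the `target("sse4.1")` function attribute is a compiler extension that enables the instruction set for just these functions (otherwise you would compile with `-msse4.1`), and the function names are made up for illustration. The caller is responsible for making sure the CPU actually supports SSE4.1.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics */
#include <stdint.h>
#include <string.h>

/* Zero-extend four unsigned 8-bit values to 32 bits in one instruction. */
__attribute__((target("sse4.1")))
void widen_u8_sse41(const unsigned char src[4], int32_t dst[4])
{
    int32_t bits;
    memcpy(&bits, src, sizeof bits);         /* alias-safe 32-bit load */
    __m128i v = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(bits));
    _mm_storeu_si128((__m128i *)dst, v);
}

/* Sign-extend four signed 8-bit values to 32 bits in one instruction. */
__attribute__((target("sse4.1")))
void widen_s8_sse41(const signed char src[4], int32_t dst[4])
{
    int32_t bits;
    memcpy(&bits, src, sizeof bits);
    __m128i v = _mm_cvtepi8_epi32(_mm_cvtsi32_si128(bits));
    _mm_storeu_si128((__m128i *)dst, v);
}
```

With GCC or Clang, a runtime check such as `__builtin_cpu_supports("sse4.1")` can guard the call, falling back to one of the SSE2 versions otherwise.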

As for the follow-up question of reversing this transformation: first we pack the signed 32-bit integers into signed 16-bit integers with saturation:

```
xmm0 = _mm_packs_epi32(xmm0, xmm0);
```

Then we pack those 16-bit values into unsigned 8-bit values using saturation:

```
xmm0 = _mm_packus_epi16(xmm0, xmm0);
```

We can then finally take our pixel from the lower 32 bits of the register:

```
*(int*)&pixel = _mm_cvtsi128_si32(xmm0);
```

Due to the saturation, this whole process will automatically map any negative values to `0` and any values greater than `255` to `255`, which is usually what you want when working with color pixels.
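The saturating round trip back to 8-bit components could be sketched like this. The function name `pack_saturate_u8` is hypothetical, and the store uses `memcpy` instead of a pointer cast as an assumption to avoid aliasing problems.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Narrow four signed 32-bit lanes to four unsigned 8-bit values,
   clamping each value to [0, 255]. */
void pack_saturate_u8(__m128i v, unsigned char dst[4])
{
    v = _mm_packs_epi32(v, v);    /* 32 -> 16 bits, signed saturation   */
    v = _mm_packus_epi16(v, v);   /* 16 -> 8 bits, unsigned saturation  */

    int32_t bits = _mm_cvtsi128_si32(v);
    memcpy(dst, &bits, sizeof bits);         /* alias-safe 32-bit store */
}
```

For example, lanes holding -5, 300, 128 and 100000 come out as 0, 255, 128 and 255.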

If you actually need truncation instead of saturation when packing the 32-bit values back into `unsigned char`s, then you will need to do this yourself, since SSE only provides saturating packing instructions. But this can be achieved by doing a simple:

```
xmm0 = _mm_and_si128(xmm0, _mm_set1_epi32(0xFF));
```

right before the above packing procedure. This should amount to just 2 additional SSE instructions, or only 1 additional instruction when amortized over many pixels.
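For comparison, a truncating version might look like the sketch below: it masks each lane down to its low byte first, so the saturating packs never have anything to clamp. The function name `pack_truncate_u8` is an illustrative assumption.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Narrow four 32-bit lanes to four 8-bit values, keeping only the low
   byte of each lane (wrap-around instead of clamping). */
void pack_truncate_u8(__m128i v, unsigned char dst[4])
{
    v = _mm_and_si128(v, _mm_set1_epi32(0xFF));  /* keep low 8 bits */
    v = _mm_packs_epi32(v, v);    /* values are in [0, 255]: no clamping */
    v = _mm_packus_epi16(v, v);

    int32_t bits = _mm_cvtsi128_si32(v);
    memcpy(dst, &bits, sizeof bits);             /* alias-safe store */
}
```

Here lanes holding -1, 256, 0x1234 and 300 come out as 255, 0, 0x34 and 44.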
