Kirill Lykov - 8 months ago 52

C Question

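In SSE there is a function `_mm_cvtepi32_ps(__m128i input)`, which converts a vector of 32-bit signed integers to floats. I need the same conversion with the input interpreted as unsigned, but there is no `_mm_cvtepu32_ps`.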

To illustrate the difference in results:

`unsigned int a = 2480160505; // 10010011 11010100 00111110 11111001`

`float a1 = a; // 01001111 00010011 11010100 00111111`

`float a2 = (signed int)a; // 11001110 11011000 01010111 10000010`
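The two results can be reproduced in plain scalar C. This is only a sketch: the helper names `float_bits`, `from_unsigned` and `from_signed_bits` are made up for illustration, and it assumes a 32-bit IEEE-754 `float` with the default round-to-nearest mode.

```c
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 bit pattern of a float (assumes float is 32 bits wide). */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Per-lane behaviour wanted from an unsigned conversion. */
static float from_unsigned(uint32_t x)
{
    return (float)x;
}

/* Per-lane behaviour of _mm_cvtepi32_ps: the same 32 bits read as signed. */
static float from_signed_bits(uint32_t x)
{
    int32_t s;
    memcpy(&s, &x, sizeof s);   /* reinterpret the bits without relying on cast behaviour */
    return (float)s;
}
```

For `a = 2480160505`, `float_bits(from_unsigned(a))` is `0x4F13D43F` and `float_bits(from_signed_bits(a))` is `0xCED85782`, which are the two bit patterns listed in the question.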

Answer Source

This functionality exists in AVX-512, but if you can't wait until then, the only thing I can suggest is to convert the `unsigned int` input values into pairs of smaller values, convert these, and then add them together again, e.g.

```
#include <emmintrin.h> // SSE2 intrinsics

static inline __m128 _mm_cvtepu32_ps(const __m128i v)
{
    __m128i v2 = _mm_srli_epi32(v, 1);  // v2 = v / 2
    __m128i v1 = _mm_sub_epi32(v, v2);  // v1 = v - (v / 2)
    __m128 v2f = _mm_cvtepi32_ps(v2);
    __m128 v1f = _mm_cvtepi32_ps(v1);
    return _mm_add_ps(v2f, v1f);        // return v2 + v1
}
```

**UPDATE**

As noted by @wim in his answer, the above solution fails for an input value of `UINT_MAX`: `v1` becomes `0x80000000`, which `_mm_cvtepi32_ps` treats as a negative number. Here is a more robust, but slightly less efficient solution, which should work for the full `uint32_t` input range:

```
#include <emmintrin.h> // SSE2 intrinsics

static inline __m128 _mm_cvtepu32_ps(const __m128i v)
{
    __m128i v2 = _mm_srli_epi32(v, 1);                 // v2 = v / 2
    __m128i v1 = _mm_and_si128(v, _mm_set1_epi32(1));  // v1 = v & 1
    __m128 v2f = _mm_cvtepi32_ps(v2);
    __m128 v1f = _mm_cvtepi32_ps(v1);
    return _mm_add_ps(_mm_add_ps(v2f, v2f), v1f);      // return 2 * v2 + v1
}
```
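A minimal usage sketch for the second version, assuming an x86 target with SSE2. The names `cvtepu32_ps_sse2` and `convert4_u32_to_f32` are hypothetical; the conversion is renamed here only so that it cannot clash with the AVX-512 intrinsic of the same name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical name: same halve-and-recombine idea as the second version above. */
static inline __m128 cvtepu32_ps_sse2(__m128i v)
{
    __m128i hi = _mm_srli_epi32(v, 1);                 /* v >> 1, always fits in int32 */
    __m128i lo = _mm_and_si128(v, _mm_set1_epi32(1));  /* lowest bit of v */
    __m128 hif = _mm_cvtepi32_ps(hi);
    __m128 lof = _mm_cvtepi32_ps(lo);
    return _mm_add_ps(_mm_add_ps(hif, hif), lof);      /* 2 * (v >> 1) + (v & 1) */
}

/* Hypothetical helper: convert four unsigned values from memory (no alignment needed). */
static void convert4_u32_to_f32(const uint32_t in[4], float out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_ps(out, cvtepu32_ps_sse2(v));
}
```

With the input `{0, 1, 2480160505, 4294967295}` this yields `0.0f`, `1.0f`, `2480160512.0f` and `4294967296.0f`, so both the value from the question and `UINT_MAX` come out as expected. Note that the result is rounded twice (once in the conversion, once in the final add), so for some inputs it may differ in the last bit from a direct scalar `(float)` conversion.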