user2764478 - 1 month ago
C Question

Load 16 bit integers in AVX2 vector?

I am new to AVX programming. I want to load a vector with 16 short int (16-bit) values, but I'm unable to do so.

Here is my attempt. It gives the following error:

incompatible types when initializing type ‘__m256’ using type ‘int’
__m256 result = _mm256_load_epi16((__m256*)&int_array);

#include <stdio.h>
#include <stdint.h>
#include <immintrin.h>

int main() {
    int i;

    short int int_array[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};

    __m256 result = _mm256_load_epi16((__m256*)&int_array);

    short int* res = (short int*)&result;
    printf("%d %d %d %d %d %d %d %d\n", res[0], res[1], res[2], res[3], res[4], res[5], res[6], res[7]);

    return 0;
}

__m256i integer_vector = _mm256_load_si256((__m256i*)int_array);

Three problems:

  • you ignored your compiler's implicit-declaration warning for _mm256_load_epi16, which doesn't exist. An undeclared function is assumed to return int, which is why the compiler complains about initializing a __m256 from an int.
  • int_array already decays to a pointer to the first element. &int_array is a pointer to the whole array: the same address, but a different type, so the & is unnecessary. Pass int_array itself.
  • __m256 is a vector of 8 floats. You want __m256i. (The intrinsics distinguish between integer, float, and double vectors. This matches the asm instructions: using the result of an integer vector operation as an input to an FP vector operation (and vice versa) can cause extra bypass-delay latency. This stops you from casually / accidentally using an FP shuffle on integer data. It's still worth it sometimes, which is why functions like __m128 _mm_castsi128_ps(__m128i) exist.)
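As an illustration of when crossing the integer/FP boundary is still worth it: plain AVX (without AVX2) has no 256-bit integer AND, so one option is to cast to __m256 and use the FP bitwise AND. This is only a sketch. The function name and_u16x16 is made up for this example, and the target attribute is a GCC/Clang extension used here so the snippet builds without -mavx; on other compilers you would drop it and enable AVX on the command line.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] & b[i] for 16 uint16_t elements,
   using only AVX (no AVX2). The cast intrinsics only change the C type;
   they are a reinterpretation of the same 256 bits, not a conversion. */
__attribute__((target("avx")))   /* GCC/Clang-specific assumption */
void and_u16x16(const uint16_t *a, const uint16_t *b, uint16_t *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Bitwise AND does not care what the bits mean, so the FP version
       gives the same result as an integer AND would. */
    __m256 r = _mm256_and_ps(_mm256_castsi256_ps(va),
                             _mm256_castsi256_ps(vb));

    _mm256_storeu_si256((__m256i *)out, _mm256_castps_si256(r));
}
```

Without the explicit casts this would not compile, which is the point of keeping the types distinct.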

There aren't separate intrinsics for loads/stores with different integer element sizes. This is why you always have to write those annoying casts to (__m256i*). (AVX512 intrinsics will take void* args, a much better design IMO.)
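A minimal sketch of what that looks like for 16-bit elements: the load and store are the generic si256 ones, and the element width only shows up in the arithmetic intrinsic. The function name add_i16x16 is hypothetical, and the target attribute is a GCC/Clang extension assumed here so the snippet builds without -mavx2. It uses the unaligned loadu/storeu forms because _mm256_load_si256 requires a 32-byte-aligned address, which a plain short int array is not guaranteed to have.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for 16 int16_t elements. */
__attribute__((target("avx2")))   /* GCC/Clang-specific assumption */
void add_i16x16(const int16_t *a, const int16_t *b, int16_t *out)
{
    /* Same load intrinsic for any integer element size; only the
       pointer cast to __m256i* is needed. */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* The epi16 suffix is where the 16-bit element width is chosen. */
    __m256i sum = _mm256_add_epi16(va, vb);

    _mm256_storeu_si256((__m256i *)out, sum);
}
```

If you want the aligned load instead, declare the array with 32-byte alignment (for example with C11 _Alignas(32)).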

Intel's intrinsics finder will help you find the functions you need. See also the relevant tag wikis, which have guides and other good stuff.

Fourth problem:

short int* res = (short int*)&result; is a bad idea. Don't alias scalar pointers onto vectors. Going the other way, pointing a vector pointer at an array, is OK, because __m256i is defined with a "may alias" attribute. But dereferencing (short int*)&result violates strict aliasing, which is Undefined Behaviour in C and C++, so it may not do what you want (in theory or in practice).

Store to a temporary array, use an extract intrinsic such as _mm_extract_epi16 (on a 128-bit half of the vector), or use a union for type-punning.
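A rough sketch of those three options, each reading element 2 of a 16 x int16_t vector. All function and type names here are made up for illustration, and the target attribute is a GCC/Clang extension assumed so the snippet builds without -mavx. The helpers take a pointer and load inside, rather than taking a __m256i by value, to keep the example independent of the caller's compile flags.

```c
#include <immintrin.h>
#include <stdint.h>

/* Option 1: store the vector to a temporary array, then index normally. */
__attribute__((target("avx")))   /* GCC/Clang-specific assumption */
int16_t elem2_via_store(const int16_t *src)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    int16_t tmp[16];
    _mm256_storeu_si256((__m256i *)tmp, v);
    return tmp[2];
}

/* Option 2: extract intrinsic. _mm_extract_epi16 takes a 128-bit vector
   and a compile-time-constant index, so take the low half first.
   It returns the element zero-extended to int, hence the cast back. */
__attribute__((target("avx")))
int16_t elem2_via_extract(const int16_t *src)
{
    __m256i v  = _mm256_loadu_si256((const __m256i *)src);
    __m128i lo = _mm256_castsi256_si128(v);   /* elements 0..7 */
    return (int16_t)_mm_extract_epi16(lo, 2);
}

/* Option 3: union type-punning (well-defined in C; in C++ it relies on
   a compiler extension). */
union vec_i16x16 {
    __m256i v;
    int16_t e[16];
};

__attribute__((target("avx")))
int16_t elem2_via_union(const int16_t *src)
{
    union vec_i16x16 u;
    u.v = _mm256_loadu_si256((const __m256i *)src);
    return u.e[2];
}
```

For printing a whole vector, the temporary-array version is usually the simplest; compilers tend to optimize the store and reload well.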