Ravi Patel - 1 year ago
C++ Question

int64_t pointer cast to AVX2 intrinsic __m256i

Hello, I have a strange problem with AVX2 intrinsics. I create a pointer to a __m256i vector by casting its address to int64_t*, then assign a value by dereferencing the pointer. The strange thing is that the value isn't observed in the vector variable unless I run a few cout statements after the assignment. The pointer and the vector have the same memory address, and dereferencing the pointer produces the correct value, but reading the vector does not. What am I missing?

// Vector variable
__m256i R_A0to3 = _mm256_set1_epi32(0xFFFFFFFF);

int64_t *ptr = NULL;
for(int m=0; m<4; m++){
    // Cast pointer to vector type
    ptr = (int64_t*)&R_A0to3;

    cout<<"ptr_ADDRESS: "<<ptr<<endl;
    cout<<"&R_A0to3_ADDRESS: "<<&R_A0to3<<endl;

    // Write one 64-bit element through the cast pointer
    ptr[m] = (int64_t) m_array[m];

    // Generic function that prints out the register
    print_mm256_reg<int64_t>(R_A0to3, "R_A0to3");
    cout<<"m_array: "<< m_array[m]<<std::ends;

    // Additional print statements
    cout<<"ptr[m]: "<< ptr[m]<<std::endl;
    cout<<"ptr[0]: "<< ptr[0]<<std::endl;
    cout<<"ptr[1]: "<< ptr[1]<<std::endl;
    cout<<"ptr[2]: "<< ptr[2]<<std::endl;
    cout<<"ptr[3]: "<< ptr[3]<<std::endl;
    print_mm256_reg<int64_t>(R_A0to3, "R_A0to3");
}

Output without the additional print statements:
ptr_ADDRESS 0x7ffd9313e880
&R_A0to3_ADDRESS 0x7ffd9313e880
m_array: 8
printing reg - R_C0to3 -1| -1| -1| -1|
printing reg - R_D0to3 -1| -1| -1| -1|

Output with Additional print statements:
ptr_ADDRESS 0x7ffd36359e20
&R_A0to3_ADDRESS 0x7ffd36359e20
printing reg - R_A0to3 -1| -1| -1| -1|
m_array: 8

ptr[0]: 8
ptr[1]: -1
ptr[2]: -1
ptr[3]: -1
printing reg - R_A0to3 8| -1| -1| -1|

Answer

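I suggest using the _mm256_extract_epi64 and _mm256_insert_epi64 intrinsics when you only need occasional access to individual elements. Note that their element index must be a compile-time constant, so a loop variable cannot be passed directly. If you need to access all elements of the vector, consider using _mm256_store_si256 and _mm256_lddqu_si256 to store it to an array and load it back. Writing to a __m256i object through an int64_t* violates the strict aliasing rules, so the compiler is free to keep the vector in a register and never observe the store. That is the likely reason the result changes once extra print statements are added. The intrinsics avoid relying on that undefined behavior, and they make it clearer which machine instructions are generated (and thus what the performance will be).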
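Here is a minimal sketch of both suggestions, assuming GCC or Clang on x86-64 and a CPU that supports AVX2. The helper names set_lanes_via_store and set_lane0_via_insert and the AVX2_FN macro are made up for illustration and are not part of the question's code. The macro is only there so the sketch builds without -mavx2; if you already compile with that flag you can drop it.

```cpp
#include <immintrin.h>
#include <cassert>   // only needed if you add assert-based checks
#include <cstdint>

// Assumption: GCC/Clang. The target attribute enables AVX2 for these
// functions only, so the file builds even without -mavx2.
#if defined(__GNUC__)
#define AVX2_FN __attribute__((target("avx2")))
#else
#define AVX2_FN
#endif

// Hypothetical helper: start from an all-ones vector (as in the question),
// overwrite all four 64-bit lanes through a store/modify/load round trip,
// then copy the resulting lanes to `out`.
AVX2_FN void set_lanes_via_store(const std::int64_t (&m_array)[4],
                                 std::int64_t (&out)[4]) {
    __m256i v = _mm256_set1_epi32(-1);

    // _mm256_store_si256 requires a 32-byte aligned destination.
    alignas(32) std::int64_t lanes[4];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), v);  // vector -> memory

    for (int m = 0; m < 4; ++m) {
        lanes[m] = m_array[m];  // ordinary scalar writes, no aliasing tricks
    }

    // Load the modified lanes back into the vector.
    v = _mm256_lddqu_si256(reinterpret_cast<const __m256i*>(lanes));

    // Unaligned store, because `out` has no alignment guarantee.
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), v);
}

// Hypothetical helper: replace only lane 0 and read every lane back.
// The lane index must be a compile-time constant, hence the unrolled reads.
AVX2_FN void set_lane0_via_insert(std::int64_t value, std::int64_t (&out)[4]) {
    __m256i v = _mm256_set1_epi32(-1);
    v = _mm256_insert_epi64(v, value, 0);

    out[0] = _mm256_extract_epi64(v, 0);
    out[1] = _mm256_extract_epi64(v, 1);
    out[2] = _mm256_extract_epi64(v, 2);
    out[3] = _mm256_extract_epi64(v, 3);
}
```

Calling set_lane0_via_insert(8, out) should leave out as 8, -1, -1, -1, which is the register content the question expected after the first loop iteration. Casting the array pointer to __m256i* for the load and store is the usual idiom with these intrinsics; it is the opposite direction (reading or writing a __m256i through an int64_t*) that causes trouble.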