C++ Question

Does std::vector<Simd_wrapper> have contiguous data in memory?

#include <vector>
#include <x86intrin.h>

class Wrapper {
public:
    // some functions operating on value_
    __m128i value_;
};

int main() {
    std::vector<Wrapper> a;
    a.resize(100);
}


Would the value_ attribute of the Wrapper objects in the vector a always occupy contiguous memory, without any gaps between the __m128i values?

I mean:

[128 bit for 1st Wrapper][no gap here][128bit for 2nd Wrapper] ...


So far, this seems to be true for g++ on the Intel CPU I am using, and for gcc on Godbolt.

Since there is only a single __m128i member in the Wrapper object, does that mean the compiler never needs to add any kind of padding in memory? (Memory layout of vector of POD objects)
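
One quick way to check this for a particular compiler is to measure the byte distance between consecutive elements at run time. This is a minimal sketch added for illustration, not part of the original test code:

#include <iostream>
#include <vector>
#include <x86intrin.h>

class Wrapper {
public:
    __m128i value_;
};

int main() {
    std::vector<Wrapper> a;
    a.resize(100);
    // With no padding, consecutive value_ members are exactly 16 bytes apart.
    const char* p0 = reinterpret_cast<const char*>(&a[0].value_);
    const char* p1 = reinterpret_cast<const char*>(&a[1].value_);
    std::cout << "stride = " << (p1 - p0) << " bytes\n";  // prints 16 when the layout is gap-free
}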

Test code 1:

#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

int main()
{
    static constexpr size_t N = 1000;
    std::vector<__m128i> a;
    a.resize(N);
    //__m128i a[1000];
    // Reinterpret the vector's storage as a flat array of 32-bit integers.
    uint32_t* ptr_a = reinterpret_cast<uint32_t*>(a.data());
    for (size_t i = 0; i < 4*N; ++i)
        ptr_a[i] = i;
    for (size_t i = 1; i < N; ++i){
        a[i-1] = _mm_and_si128(a[i], a[i-1]);
    }
    for (size_t i = 0; i < 4*N; ++i)
        std::cout << ptr_a[i];
}


Warning:

warning: ignoring attributes on template argument
'__m128i {aka __vector(2) long long int}'
[-Wignored-attributes]


Assembly (gcc, Godbolt):

.L9:
add rax, 16
movdqa xmm1, XMMWORD PTR [rax]
pand xmm0, xmm1
movaps XMMWORD PTR [rax-16], xmm0
cmp rax, rdx
movdqa xmm0, xmm1
jne .L9


I guess this means the data is contiguous, because the loop just adds 16 bytes to the address it reads from on every iteration. It is using pand to do the bitwise AND.

Test code 2:

#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

class Wrapper {
public:
    __m128i value_;
    inline Wrapper& operator &= (const Wrapper& rhs)
    {
        value_ = _mm_and_si128(value_, rhs.value_);
        return *this;  // without this return, flowing off the end of the operator is undefined behaviour
    }
}; // Wrapper

int main()
{
    static constexpr size_t N = 1000;
    std::vector<Wrapper> a;
    a.resize(N);
    //__m128i a[1000];
    uint32_t* ptr_a = reinterpret_cast<uint32_t*>(a.data());
    for (size_t i = 0; i < 4*N; ++i) ptr_a[i] = i;
    for (size_t i = 1; i < N; ++i){
        a[i-1] &= a[i];
        //std::cout << ptr_a[i];
    }
    for (size_t i = 0; i < 4*N; ++i)
        std::cout << ptr_a[i];
}


Assembly (gcc, Godbolt):

.L9:
add rdx, 2
add rax, 32
movdqa xmm1, XMMWORD PTR [rax-16]
pand xmm0, xmm1
movaps XMMWORD PTR [rax-32], xmm0
movdqa xmm0, XMMWORD PTR [rax]
pand xmm1, xmm0
movaps XMMWORD PTR [rax-16], xmm1
cmp rdx, 999
jne .L9


Looks like there is no padding here either: rax increases by 32 in each step, which is 2 x 16 because the compiler has unrolled the loop by two. That extra add rdx, 2 is definitely not as good as the loop from test code 1.

Test auto-vectorization

#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

int main()
{
    static constexpr size_t N = 1000;
    std::vector<__m128i> a;
    a.resize(N);
    //__m128i a[1000];
    uint32_t* ptr_a = reinterpret_cast<uint32_t*>(a.data());
    for (size_t i = 0; i < 4*N; ++i)
        ptr_a[i] = i;
    for (size_t i = 1; i < N; ++i){
        a[i-1] = _mm_and_si128(a[i], a[i-1]);
    }
    for (size_t i = 0; i < 4*N; ++i)
        std::cout << ptr_a[i];
}


Assembly (Godbolt):

.L21:
movdqu xmm0, XMMWORD PTR [r10+rax]
add rdi, 1
pand xmm0, XMMWORD PTR [r8+rax]
movaps XMMWORD PTR [r8+rax], xmm0
add rax, 16
cmp rsi, rdi
ja .L21


... I just don't know whether this is always true for Intel CPUs and g++ / Intel C++ / (insert compiler name here) ...

Answer

It is safe to assume there is no padding in practice, unless you're compiling for a non-standard ABI.

All compilers targeting the same ABI must make the same choice about struct/class sizes / layouts, and all the standard ABIs / calling conventions will have no padding in your struct. (i.e. x86-32 and x86-64 System V and Windows, see the tag wiki for links). Your experiments with one compiler confirm it for all compilers targeting the same platform/ABI.
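
If you want a hard guarantee in your own codebase, a minimal sketch (my addition, not part of the original answer) is to assert the expected size and alignment, so that a build on an unusual ABI fails loudly instead of silently changing the layout:

#include <x86intrin.h>

class Wrapper {
public:
    __m128i value_;
};

// On the standard x86 ABIs these always hold; an ABI that inserted padding
// or changed alignment would produce a compile error instead of a silent layout change.
static_assert(sizeof(Wrapper) == sizeof(__m128i), "Wrapper must be exactly one __m128i wide");
static_assert(alignof(Wrapper) == alignof(__m128i), "Wrapper must keep __m128i's alignment");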

Note that the scope of this question is limited to x86 compilers that support Intel's intrinsics and the __m128i type, which means we have much stronger guarantees than what you get from just the ISO C++ standard without any implementation-specific stuff.


As @zneak points out, you can static_assert(std::is_standard_layout<Wrapper>::value) in the class def to remind people not to add any virtual methods, which would add a vtable pointer to each instance.
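
A minimal sketch of that suggestion (the assertion is placed right after the class body, where the type is complete; the exact wording of the message is my own):

#include <type_traits>
#include <x86intrin.h>

class Wrapper {
public:
    // ... functions operating on value_ ...
    __m128i value_;
};

// Compile-time reminder: adding a virtual function (and with it a vtable pointer),
// or anything else that breaks standard layout, will make this fail to build.
static_assert(std::is_standard_layout<Wrapper>::value,
              "Wrapper must remain standard-layout");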