Heiko Bloch Heiko Bloch - 2 months ago 6
C++ Question

Array of non-contiguous objects

#include <iostream>
#include <cstring>
// This struct is not guaranteed to occupy contiguous storage
// in the sense of the C++ Object model (§1.8.5):
struct separated {
int i;
separated(int a, int b){i=a; i2=b;}
~separated(){i=i2=-1;} // nontrivial destructor --> not trivially copyable
private: int i2; // different access control --> not standard layout
};
int main() {
static_assert(not std::is_standard_layout<separated>::value,"sl");
static_assert(not std::is_trivial<separated>::value,"tr");
separated a[2]={{1,2},{3,4}};
std::memset(&a[0],0,sizeof(a[0]));
std::cout<<a[1].i;
// No guarantee that the previous line outputs 3.
}
// compiled with Debian clang version 3.5.0-10, C++14-standard
// (outputs 3)



  1. What is the rationale behind weakening standard guarantees to the point that this program may show undefined behaviour?

  2. The standard says:
    "An object of array type contains a contiguously allocated non-empty set of N subobjects of type T."
    If objects of type T do not occupy contiguous storage, how can an array of such objects do?



edit: removed possibly distracting explanatory text

Answer

1. This is an instance of Occam's razor as adopted by the dragons that actually write compilers: Do not give more guarantees than needed to solve the problem, because otherwise your workload will double without compensation. Sophisticated classes adapted to fancy hardware or to historic hardware were part of the problem. (hinting by BaummitAugen and M.M)

2.

a) It is not that objects of type T either always or never occupy contiguous storage. There may be different memory layouts for the same type within a single binary. (experimental result, not derived from standard, not contradicting the standard)

b) 'contiguously allocated' or 'stored contiguously' may simply mean &a[n]==&a[0] + n (§23.3.2.1), which is a statement about subobject addresses that would not imply that the array resides within a single sequence of contiguous bytes. However, the standard is not very clear in this respect, and the other interpretation (element offset==sizeof(T)) is equally compatible with the wording. The latter interpretation would imply that one could force otherwise possibly non-contiguous objects into a contiguous layout by declaring them T t[1]; instead of T t;


Now why is this question relevant? As is discussed in Can technically objects occupy non-contiguous bytes of storage?, non-contiguous objects actually exist. Furthermore, memseting a subobject naively may invalidate the containing object, even for perfectly contiguous, trivially copyable objects:

#include <iostream>
#include <cstring>
struct A {
  private: int a;
  public: short i;
};
struct B :  A {
  short i;
};
int main()
{
   static_assert(std::is_trivial<A>::value , "A not trivial.");
   static_assert(not std::is_standard_layout<A>::value , "sl.");
   static_assert(std::is_trivial<B>::value , "B not trivial.");
   B object;
   object.i=1;
   std::cout<< object.B::i;
   std::memset((void*)&(A&)object ,0,sizeof(A));
   std::cout<<object.B::i;
}
// outputs 10 with g++/clang++, c++11, Debian 8, amd64     

Therefore, it is conceivable that the memset in the question post might zero a[1].i, such that the program would output 0 instead of 3.

There are few occasions where one would use memset-like functions with C++-objects at all. (Normally, destructors of subobjects will fail blatantly if you do that.) But sometimes one wishes to scrub the contents of an 'almost-POD'-class in its destructor, and this might be the exception.