Zed Zed - 7 months ago 21
C Question

Alignment considerations and dynamic allocation

Say I have a structure with a 64-bit value. Naturally, on 64-bit systems, I would want that value aligned on a 64-bit boundary for fast reads and writes.

When this structure is placed on the stack or stored in the data section, I'm hoping the compiler will do its best to align the structure so that this value is 64-bit aligned.

However, when overlaying this structure onto memory obtained from malloc and friends, I believe they don't guarantee any alignment, so chances are my 64-bit value isn't aligned.

Even if I use my own allocators, or aligned_malloc or an alternative, I'm unsure how to deal with this properly. I don't know how the compiler has chosen to pack my structure, so I don't know how to ensure alignment.

Of course, there may be many 8-, 16-, 64-, 128-bit, etc. values in this structure, and I would like to satisfy all of their alignment requirements.

Answer Source

When you dynamically allocate memory via malloc, it is guaranteed that the returned memory address will satisfy the target platform's minimum alignment requirements for all built-in types. This works for structs, too: alignment requirements are interpreted recursively, such that the struct's alignment is the largest alignment required for any of its members.

In effect, the language standard guarantees that your code will work correctly on the target platform. That is a pretty reasonable guarantee, and it means that you don't have to worry about any of this in the general case.

Quoting the (draft) C99 language standard, §7.20.3 ("Memory management functions"):

The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

(emphasis mine)
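To make the guarantee concrete, here is a minimal C11 sketch that checks the address returned by malloc against the struct's own alignment requirement. The `struct record` layout and the helper names `is_aligned` and `malloc_record_is_aligned` are made up for illustration and are not from the question.

```c
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct for illustration; not taken from the question. */
struct record {
    uint8_t  tag;
    uint64_t value;   /* the 64-bit member the question worries about */
    uint16_t flags;
};

/* Returns 1 if the address p is a multiple of `alignment` bytes, else 0. */
static int is_aligned(const void *p, size_t alignment)
{
    return (uintptr_t)p % alignment == 0;
}

/*
 * Allocates one record with plain malloc and checks that both the struct
 * and its 64-bit member sit on suitably aligned addresses.
 * Returns 1 if aligned, 0 if not, -1 if the allocation failed.
 */
static int malloc_record_is_aligned(void)
{
    struct record *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;

    int ok = is_aligned(r, alignof(struct record))
          && is_aligned(&r->value, alignof(uint64_t));

    free(r);
    return ok;
}
```

On a conforming implementation `malloc_record_is_aligned()` should never return 0, because the pointer is aligned for any object type and the compiler places `value` at a suitably aligned offset inside the struct.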

The only time you have to worry about alignment is if you want a stricter alignment than is required. For example, your platform might work just fine with 32-bit alignment, but perform better with 64-bit alignment. In this case, the language standards don't guarantee that 64-bit alignment will be used for dynamically-allocated memory, since 32-bit alignment is sufficient. If you want 64-bit alignment, you will either need to set a compiler option that enforces such alignment, or you will need to call an aligned allocation function such as C11's aligned_alloc, POSIX's posix_memalign, or Microsoft's _aligned_malloc.
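As a sketch of the "ask for stricter alignment" route, the following uses the C11 function `aligned_alloc` to request 8-byte (64-bit) alignment explicitly, whatever malloc would have given. The helper name `alloc_u64_array` is hypothetical, and the code assumes a C11 library that provides `aligned_alloc` (Microsoft's C runtime does not; it offers `_aligned_malloc` instead).

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocates `count` uint64_t values on an 8-byte boundary.
 * Returns NULL if count is zero, the size would overflow, or the
 * allocation fails. Release the result with free().
 */
static uint64_t *alloc_u64_array(size_t count)
{
    if (count == 0 || count > SIZE_MAX / sizeof(uint64_t))
        return NULL;

    /* The size passed to aligned_alloc should be a multiple of the
       alignment; count * sizeof(uint64_t) is a multiple of 8 wherever
       uint64_t occupies 8 bytes. */
    return aligned_alloc(8, count * sizeof(uint64_t));
}
```

On most 64-bit platforms malloc already returns memory aligned to 8 bytes or more, so this only matters where the default guarantee is weaker than what you want.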

Virtually all implementations will by default add padding to the fields of a struct to keep each of them aligned. The compiler generally doesn't "pack" your struct unless you use some kind of implementation-specific option to request that it does so. However, all of this is compiler-dependent, so you will need to check your compiler's documentation to be sure of what it will do.
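If you want to see what your compiler actually did rather than guess, `offsetof` and `sizeof` will tell you. The sketch below uses a made-up `struct mixed` and hypothetical helper names; the exact offsets it reports depend on your compiler and ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct mixing member sizes; not from the question. */
struct mixed {
    uint8_t  a;
    uint64_t b;
    uint16_t c;
};

/* Bytes of padding the compiler inserted: the total size of the struct
   minus the sizes of its members. */
static size_t mixed_padding(void)
{
    return sizeof(struct mixed)
         - (sizeof(uint8_t) + sizeof(uint64_t) + sizeof(uint16_t));
}

/* Prints the offset of each member, the struct size, and the padding. */
static void print_mixed_layout(void)
{
    printf("a at %zu, b at %zu, c at %zu, size %zu, padding %zu\n",
           offsetof(struct mixed, a),
           offsetof(struct mixed, b),
           offsetof(struct mixed, c),
           sizeof(struct mixed),
           mixed_padding());
}
```

With default (unpacked) layout, the offset of `b` is a multiple of `_Alignof(uint64_t)`, which is why a correctly aligned struct pointer also gives you a correctly aligned 64-bit member.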

The most common case where people start worrying about alignment beyond that which is minimally guaranteed is when they are writing SIMD code, in particular, on x86, where 128-bit alignment often yields a performance improvement. However, if you are letting the compiler generate the SIMD code, it very likely already knows to do the necessary alignment (assuming you have specified the correct options for your target platform), and you don't need to do anything special. If you are using intrinsics to force the generation of certain instructions, then you should also be using the provided types, like __m128, which are already annotated to ensure the appropriate alignment. You can probably add similar annotations to your own typedefs, if you'd like to have them aligned to a stricter rule; again, see your compiler documentation for details on how to achieve this.
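One portable way to annotate your own type, assuming a C11 compiler, is `alignas` from `<stdalign.h>`. C11 does not allow `alignas` directly on a typedef, so this sketch puts it on a struct member, which raises the alignment of the whole struct. The type `vec4f` and the helper `local_vec_is_aligned` are hypothetical names used only for illustration.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical type: four floats aligned to 16 bytes, similar in spirit
   to __m128. The alignas on the member makes the whole struct 16-byte
   aligned. */
typedef struct {
    alignas(16) float lane[4];
} vec4f;

_Static_assert(alignof(vec4f) == 16, "vec4f should be 16-byte aligned");

/* Returns 1 if a vec4f with automatic storage lands on a 16-byte
   boundary, which the compiler arranges because of the annotation. */
static int local_vec_is_aligned(void)
{
    vec4f v = { .lane = { 1.0f, 2.0f, 3.0f, 4.0f } };
    return (uintptr_t)&v % 16 == 0;
}
```

The annotation covers objects the compiler places itself (locals, globals, struct members). For heap memory you still have to make sure the allocator's guarantee is at least as strict as the type's alignment.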

For example, this is what the documentation for Microsoft's C compiler has to say:

malloc is guaranteed to return memory that's suitably aligned for storing any object that has a fundamental alignment and that could fit in the amount of memory that's allocated. A fundamental alignment is an alignment that's less than or equal to the largest alignment that's supported by the implementation without an alignment specification. (In Visual C++, this is the alignment that's required for a double, or 8 bytes. In code that targets 64-bit platforms, it’s 16 bytes.) For example, a four-byte allocation would be aligned on a boundary that supports any four-byte or smaller object.

Visual C++ permits types that have extended alignment, which are also known as over-aligned types. For example, the SSE types __m128 and __m256, and types that are declared by using __declspec(align(n)) where n is greater than 8, have extended alignment. Memory alignment on a boundary that's suitable for an object that requires extended alignment is not guaranteed by malloc. To allocate memory for over-aligned types, use _aligned_malloc and related functions.
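If the same code has to build with both Visual C++ and a C11 toolchain, a thin wrapper keeps the platform difference in one place. This is only a sketch: `xaligned_alloc` and `xaligned_free` are hypothetical names, and the non-Microsoft branch assumes the C library provides C11 `aligned_alloc`.

```c
#include <stddef.h>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Allocates at least `size` bytes aligned to `alignment`, which must be
   a power of two. Returns NULL on failure. */
static void *xaligned_alloc(size_t alignment, size_t size)
{
#if defined(_MSC_VER)
    /* Note the argument order: size first, then alignment. */
    return _aligned_malloc(size, alignment);
#else
    /* aligned_alloc wants a size that is a multiple of the alignment,
       so round up and reject requests that would overflow. */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    if (rounded < size)
        return NULL;
    return aligned_alloc(alignment, rounded);
#endif
}

/* Releases memory obtained from xaligned_alloc. */
static void xaligned_free(void *p)
{
#if defined(_MSC_VER)
    _aligned_free(p);   /* _aligned_malloc memory must not go to free() */
#else
    free(p);
#endif
}
```

The important detail on the Microsoft side is the pairing: memory from `_aligned_malloc` has to be released with `_aligned_free`, not `free`.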