Todd Freed - 10 months ago
C Question

C overcoming aliasing restrictions (unions?)

Assume I have a sample source file, test.c, which I am compiling like so:

$ gcc -O3 -Wall test.c

test.c looks something like this:

/// CMP128(x, y)
// arguments
// x - any pointer to a 128-bit int
// y - any pointer to a 128-bit int
// returns -1, 0, or 1 if x is less than, equal to, or greater than y
#define CMP128(x, y) // magic goes here

// example usages

uint8_t A[16];
uint16_t B[8];
uint32_t C[4];
uint64_t D[2];
struct in6_addr E;
uint8_t* F;

// use CMP128 on any combination of pointers to 128-bit ints, i.e.

CMP128(A, B);
CMP128(&C[0], &D[0]);
CMP128(&E, F);

// and so on

Let's also say I accept the restriction that if you pass in two overlapping pointers, you get undefined results.

I've tried something like this (imagine these macros are properly formatted with backslash-escaped newlines at the end of each line):

#define CMP128(x, y) ({
uint64_t* a = (void*)x;
uint64_t* b = (void*)y;

// compare a[0] with b[0], a[1] with b[1]

But when I dereference a in the macro (a[0] < b[0]), I get "dereferencing breaks strict-aliasing rules" errors from gcc.

I had thought that you were supposed to use unions to properly refer to a single place in memory in two different ways, so next I tried something like this:

#define CMP128(x, y) ({
union {
typeof(x) a;
typeof(y) b;
uint64_t* c;
} d = { .a = (x) }
, e = { .b = (y) };

// compare d.c[0] with e.c[0], etc

Except that I get the exact same errors from the compiler about strict-aliasing rules.

So: is there some way to do this without breaking strict-aliasing, short of actually COPYING the memory?

(may_alias doesn't count, it just allows you to bypass the strict-aliasing rules)

EDIT: use memcmp to do this. I got caught up on the aliasing rules and didn't think of it.

Answer

The compiler is correct: the aliasing rules are determined by the so-called 'effective type' of the object (i.e. the memory location) you're accessing, regardless of any pointer magic. In this case, type-punning the pointers with a union is no different from an explicit cast. The cast is actually preferable, as the standard does not guarantee that arbitrary pointer types have compatible representations, i.e. with the union you're unnecessarily depending on implementation-defined behaviour.

If you want to conform to the standard, you need to copy the data to new variables or use a union during the declaration of the original variables.
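Both options can be sketched as follows. This is only an illustration: the names cmp128_copy, cmp128_union and union u128 are hypothetical, and the code assumes the layout implied by the question's macro (two host-order 64-bit words, most significant word first).

```c
#include <stdint.h>
#include <string.h>

/* Option 1: copy the bytes into properly typed locals.
 * memcpy may read any object, so no aliasing rule is broken.
 * Assumes each pointer refers to at least 16 readable bytes. */
int cmp128_copy(const void *x, const void *y)
{
    uint64_t a[2], b[2];

    memcpy(a, x, sizeof a);
    memcpy(b, y, sizeof b);

    /* Word 0 is treated as the most significant half (assumption). */
    if (a[0] != b[0])
        return a[0] < b[0] ? -1 : 1;
    if (a[1] != b[1])
        return a[1] < b[1] ? -1 : 1;
    return 0;
}

/* Option 2: declare the original variables as a union, so every
 * view of the 16 bytes is a member of the same object. */
union u128 {
    uint8_t  u8[16];
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

int cmp128_union(const union u128 *x, const union u128 *y)
{
    if (x->u64[0] != y->u64[0])
        return x->u64[0] < y->u64[0] ? -1 : 1;
    if (x->u64[1] != y->u64[1])
        return x->u64[1] < y->u64[1] ? -1 : 1;
    return 0;
}
```

With optimization enabled, compilers typically turn a fixed-size 16-byte memcpy into plain loads, so the copy in option 1 is usually not a real cost. Option 2 only helps if you control the declarations; it does not apply to a struct in6_addr handed to you by another API.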

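If your 128-bit integers are stored in a single consistent byte order (i.e. not mixed-endian), you can also compare them byte by byte: access through pointers of character type is an exception to the aliasing rule. For big-endian storage, memcmp() gives the right ordering directly, because it compares bytes as unsigned char starting at the lowest address. For little-endian storage the most significant byte comes last, so do the byte-wise comparison yourself, starting from the final byte; negating memcmp()'s return value is not enough, since it still decides on the first differing byte from the low end. Note that memcmp() only guarantees the sign of its result, not exactly -1 or 1.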
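A minimal sketch of the byte-wise approach, assuming each argument points to 16 bytes holding one integer in a single byte order. The names cmp128_be and cmp128_le are hypothetical. Note that for little-endian data the scan has to start at the last byte, where the most significant bits live.

```c
#include <stddef.h>
#include <string.h>

/* Big-endian storage: the first byte is the most significant, so the
 * byte order memcmp uses matches numeric order. memcmp only promises
 * the sign of its result, so normalize it to -1, 0 or 1. */
int cmp128_be(const void *x, const void *y)
{
    int r = memcmp(x, y, 16);
    return (r > 0) - (r < 0);
}

/* Little-endian storage: the last byte is the most significant, so
 * walk from index 15 down to 0. Reading through unsigned char is
 * permitted for any object type. */
int cmp128_le(const void *x, const void *y)
{
    const unsigned char *a = x;
    const unsigned char *b = y;

    for (size_t i = 16; i-- > 0; ) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}
```

A struct in6_addr holds its address in network (big-endian) byte order, so cmp128_be would be the variant to use for the E and F examples from the question.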