underscore_d - 2 months ago
C Question

union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'?

Background



Discussions on the mostly undefined or implementation-defined nature of type-punning via a union typically quote the following passages, here via @ecatmur ( http://stackoverflow.com/a/31557852/2757035 ), on an exemption for standard-layout structs having a "common initial sequence" of member types:


C11 (6.5.2.3 Structure and union members; Semantics):


[...] if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.


C++03 ([class.mem]/16):


If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.


Other versions of the two standards have similar language; since C++11
the terminology used is standard-layout rather than POD.
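As a concrete illustration of the exemption both quotes describe, here is a minimal C sketch of the classic "tagged union" idiom: every member struct begins with the same int kind field, so code can always read kind through one designated member, whichever struct was actually stored. All names here (shape, any_shape, make_rect, shape_kind and so on) are hypothetical, invented for this sketch; note that every read goes through the union object, with the union's declaration in view.

```c
/* Hypothetical kinds and shape types, invented for this sketch. */
enum shape_kind { SHAPE_CIRCLE = 1, SHAPE_RECT = 2 };

/* Each struct starts with 'int kind', so they all share a common
   initial sequence consisting of that one member. */
struct any_shape { int kind; };
struct circle    { int kind; double radius; };
struct rect      { int kind; double w, h; };

union shape {
    struct any_shape any;
    struct circle    c;
    struct rect      r;
};

union shape make_rect(double w, double h)
{
    union shape s;
    s.r.kind = SHAPE_RECT;   /* the union now contains 'r' */
    s.r.w = w;
    s.r.h = h;
    return s;
}

union shape make_circle(double radius)
{
    union shape s;
    s.c.kind = SHAPE_CIRCLE; /* the union now contains 'c' */
    s.c.radius = radius;
    return s;
}

/* Reads 'kind' through 'any', which is generally NOT the member that was
   written. This relies on the common initial sequence exemption: 'kind'
   is common to all members and the read goes through the union object. */
int shape_kind(const union shape *s)
{
    return s->any.kind;
}

double shape_area(const union shape *s)
{
    switch (shape_kind(s)) {
    case SHAPE_CIRCLE: return 3.141592653589793 * s->c.radius * s->c.radius;
    case SHAPE_RECT:   return s->r.w * s->r.h;
    default:           return 0.0;
    }
}
```

For example, make_rect(2.0, 3.0) yields a union whose kind reads back as SHAPE_RECT through the any member, and whose area is 6.0.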


Since no reinterpretation is required, this isn't really type-punning, just name substitution applied to union member accesses. A proposal for C++17 (the infamous P0137R1) makes this explicit using language like 'the access is as if the other struct member was nominated'.

But please note the clause "anywhere that a declaration of the completed type of the union is visible". It exists in C11 but nowhere in the C++ drafts for 2003, 2011, or 2014 (all nearly identical, except that later versions replace "POD" with the newer term "standard-layout"). In any case, the 'visible declaration of union type' bit is totally absent from the corresponding section of any C++ standard.

@loop and @Mints97, here - http://stackoverflow.com/a/28528989/2757035 - show that this line was also absent in C89, first appearing in C99 and remaining in C since then (though, again, never filtering through to C++).

Standards discussions around this



[snipped - see my answer]

Questions



From this, then, my questions were:


  • What does this mean? What counts as a 'visible declaration'? Was this clause intended to narrow, or to widen, the range of contexts in which such 'punning' has defined behaviour?

  • Are we to assume that this omission in C++ is very deliberate?

  • What is the reason for C++ differing from C? Did C++ just 'inherit' this from C89 and then either decide - or worse, forget - to update alongside C99?

  • If the difference is intentional, then what are the benefits or drawbacks of the two different treatments in C vs C++?

  • What, if any, interesting ramifications does it have at compile- or runtime? For example, @ecatmur, in a comment replying to my pointing this out on his original answer (link as above), speculated as follows.




I'd imagine it permits more aggressive optimization; C can assume that function arguments S* s and T* t do not alias even if they share a common initial sequence as long as no union { S; T; } is in view, while C++ can make that assumption only at link time. Might be worth asking a separate question about that difference.
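To make ecatmur's speculation concrete, here is a minimal C sketch of the kind of function he describes, assuming a translation unit in which no union containing both structs is declared. The names S, T and store_and_reload are hypothetical. Whether a particular compiler actually folds the reload is an assumption about its optimiser, not something this sketch demonstrates; with two distinct objects the result is 1 either way.

```c
/* Hypothetical structs sharing a common initial sequence (one int).
   Deliberately, no union containing both is declared in this file. */
struct S { int x; double d; };
struct T { int x; char c; };

/* With no 'union { struct S; struct T; }' in view, a compiler following
   ecatmur's reading of the C99+ wording may assume *s and *t do not
   overlap, and so may compile the final read as the constant 1. */
int store_and_reload(struct S *s, struct T *t)
{
    s->x = 1;
    t->x = 2;        /* assumed not to modify s->x */
    return s->x;     /* may be folded to 'return 1;' */
}
```

If both pointers instead referred to members of one union object, the function would return 2 only if the compiler accounted for the overlap; since no union is in view in this translation unit, on ecatmur's reading the C99 wording would not require it to.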


Well, here I am, asking! I'm very interested in any thoughts about this, especially: other relevant parts of either Standard, quotes from committee members or other esteemed commentators, and insights from developers who might have noticed a practical difference due to this (assuming any compiler even bothers to enforce C's added clause). The aim is to generate a useful catalogue of relevant facts about this C clause and its (intentional or not) omission from C++. So, let's go!

Answer

I've found my way through the labyrinth to some great sources on this, and I think that - thanks to efforts of far more perseverant people - I've finally got a proper summary of it. I'm posting this as an answer because it seems to explain both the intention of the C clause and C++'s omission thereof. This will evolve over time if I discover further supporting material for it.

Of course, I'll welcome clarifications/suggestions on how to improve this answer. Or if anyone has a better one, I'll accept that. If I've interpreted any of this wrongly, tell me! This is my first time trying to sum up a very complex situation, which seems ill-defined even to many language architects.

Finally, some concrete commentary

Anyway, through vaguely related threads, I found @tab's following answer and much appreciated the contained links to (illuminating, if not conclusive) GCC and Working Group defect reports: http://stackoverflow.com/a/19807355

The GCC link contains some interesting discussion and reveals a sizeable amount of Committee and vendor confusion/conflicting interpretations - around union member structs, punning, and aliasing - spanning C and C++.

At the end of that, we're then linked to the main event - another Bugzilla thread, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892, containing an extremely useful discussion. In particular, we find our way to the first of two pivotal documents:

Origin of the added line in C99

C proposal N685 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n685.htm) is the origin of the added clause regarding visibility of a union type declaration. Through what some claim (see GCC thread #2) is a total misinterpretation of the "common initial sequence" allowance, N685 was indeed intended to relax the aliasing rules for "common initial sequence" structs within any TU aware of some union containing instances of said struct types:

The proposed solution is to require that a union declaration be visible if aliases through a common initial sequence (like the above) are possible. Therefore the following TU provides this kind of aliasing if desired:

union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
     pst2->m1 = 2;
     pst3->m1 = 0;   /* might be an alias for pst2->m1 */
     return pst2->m1;
}

Judging by the GCC discussion and the comments below, this proposal, which seems to mandate speculatively allowing aliasing for any struct type that has an instance within some union visible to the TU, has received great derision and has rarely been implemented, as evidenced by ecatmur. It's obvious how difficult this would be to honour without crippling many optimisations, and for little benefit, as few coders would want this guarantee (if I did, I'd just turn on -fno-strict-aliasing). It's more likely to just catch people out and interact spuriously with other union declarations.

Omission of the line from C++

Following on from this and a comment I made elsewhere, @Potatoswatter - http://stackoverflow.com/a/19805106 - states that:

The visibility part was purposely omitted from C++ because it's widely considered to be ludicrous and unimplementable.

In other words, it looks like C++ deliberately avoided adopting this added clause, likely due to its widely perceived absurdity. On asking for an "on the record" citation of this, Potatoswatter provided the following key info about the thread's participants:

The folks in that discussion are essentially "on the record" there. Andrew Pinski is a hardcore GCC backend guy. Martin Sebor is an active C committee member. Jonathan Wakely is an active C++ committee member and language/library implementer. That page is more authoritative, clear, and complete than anything I could write.

Potatoswatter, in the same SO thread linked above, concludes that C++ deliberately excluded this line, leaving no special treatment (or, at best, implementation-defined treatment) for pointers into the common initial sequence. Whether their treatment will in future be specifically defined, versus any other pointers, remains to be seen; refer to my final section on C. At present, it is not.

What does this mean for C++? (and, in practical terms, C implementations)

So, with the nefarious line from N685... 'cast aside'... we're back to assuming pointers into the common initial sequence are not special in terms of aliasing, or at best implementation-defined. Still, it's worth confirming what this paragraph in C++ means without it. Well, the second GCC thread above links to another gem:

C++ defect 1719: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1719. This proposal has reached DRWP status: "A DR issue whose resolution is reflected in the current Working Paper. The Working Paper is a draft for a future version of the Standard". This is either post-C++14 or at least later than the final draft I have here (N3797), and it puts forward a significant, and in my opinion illuminating, rewrite of this paragraph's wording, as follows. The {comments in braces} are mine:

In a standard-layout union with an active member {"active" indicates a union instance, not just type} (9.5 [class.union]) of struct type T1, it is permitted to read {formerly "inspect"} a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2. [Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (7.1.6.1 [dcl.type.cv]). —end note]

This clarifies the meaning of the previous wording: I read it as saying any specifically allowed 'punning' (reading an inactive union member) of member structs with common initial sequences must be done via an instance of that union, rather than via any vague concept of the union's type. This much clearer wording seems to rule out any other interpretation à la N685. C would do well to adopt this, I'd say. Hey, speaking of which, see below!
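Read that way, the safe pattern is to make every access go through the union object itself rather than through independent struct pointers. Here is a minimal C sketch of that, reusing the tag1/tag2/utag types from N685's example (the function name write_st1_read_st2 is hypothetical). It is written in C for consistency with the rest of this page, but it is the same access pattern that CWG 1719's wording permits for C++.

```c
/* Types modelled on N685's example: both structs begin with 'int m1',
   so m1 lies in their common initial sequence. */
struct tag1 { int m1; double d2; };
struct tag2 { int m1; char c2; };

union utag {
    struct tag1 st1;
    struct tag2 st2;
};

/* Every access is a member access on the union object itself, so the
   final read is the "inspect the common initial part" case: after the
   stores the union contains st1, and m1 is common to st1 and st2. */
int write_st1_read_st2(union utag *u)
{
    u->st1.m1 = 2;       /* the union now contains st1 */
    u->st1.d2 = 0.5;
    return u->st2.m1;    /* reads the same int through st2's member name */
}
```

Contrast this with N685's similar_func, which receives two unrelated struct pointers and so gives the compiler no union object to reason about.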

The upshot is that, as nicely demonstrated by @ecatmur and in the GCC tickets, this leaves such union member structs (by definition in C++, and in practice in C) subject to the same strict aliasing rules as any other two officially unrelated pointers. The explicit guarantee of being able to read the common initial sequence of inactive union member structs is now more clearly defined, without the vague and unimaginably tedious-to-enforce "visibility" requirement attempted by N685 for C. By this definition, the main compilers have been behaving as intended for C++. As for C?

Possible reversal of this line in C / clarification in C++

It's also well worth noting that C committee member Martin Sebor is looking to get this fixed in that fine language, too:

Martin Sebor 2015-04-27 14:57:16 UTC If one of you can explain the problem with it I'm willing to write up a paper and submit it to WG14 and request to have the standard changed.

Martin Sebor 2015-05-13 16:02:41 UTC I had a chance to discuss this issue with Clark Nelson last week. Clark has worked on improving the aliasing parts of the C specification in the past, for example in N1520 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm). He agreed that like the issues pointed out in N1520, this is also an outstanding problem that would be worth for WG14 to revisit and fix.

Potatoswatter inspiringly concludes:

The C and C++ committees (via Martin and Clark) will try to find a consensus and hammer out wording so the standard can finally say what it means.

We can only hope!

Well, did I do it right? It's been quite a while since I wrote an essay, and I am extremely tired today (the two are at least partially related). All thoughts are welcome!
