David Bowling - 28 days ago
C Question

Is it acceptable to cast from (int) to (unsigned)(intN_t) to preserve bit patterns?

Suppose that we define:

short x = -1;
unsigned short y = (unsigned short) x;


According to the C99 standard:


Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type. (ISO/IEC 9899:1999 6.3.1.3/2)


So, assuming two bytes for short and a two's complement representation, the bit patterns of these two integers are:

x = 1111 1111 1111 1111 (value of -1),
y = 1111 1111 1111 1111 (value of 65535).


Since -1 is not in the value range of unsigned short, whose maximum representable value is 65535, one more than that maximum (65536) is added to -1 to get 65535, which is in range. Thus the bits remain unchanged in casting from short to unsigned short, though the represented value is changed.
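To make the arithmetic concrete, here is a minimal sketch that compares the cast with the rule from 6.3.1.3/2 written out by hand. The function names are hypothetical, and the hand-written version assumes that unsigned short is narrower than long long.

```c
#include <assert.h>
#include <limits.h>

/* The conversion as the compiler performs it. */
unsigned short convert_by_cast(short x)
{
    return (unsigned short)x;
}

/* The same rule spelled out: add (maximum + 1) until the value is in range.
   Assumption: unsigned short is narrower than long long, so the sum fits. */
unsigned short convert_by_rule(short x)
{
    long long v = x;
    const long long modulus = (long long)USHRT_MAX + 1;

    while (v < 0)
        v += modulus;
    return (unsigned short)v; /* already in range, so the value is unchanged */
}

/* Small self-check; call it from main(). */
void check_conversion(void)
{
    assert(convert_by_cast(-1) == USHRT_MAX);
    assert(convert_by_rule(-1) == convert_by_cast(-1));
}
```

Both functions work on values only, so the result for -1 is USHRT_MAX whatever representation the implementation uses for negative numbers.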

But, the standard also says that representations may be two's complement, one's complement, or sign and magnitude. "Which of these applies is implementation-defined,...." (ISO/IEC 9899:1999 6.2.6.2/2)

On a system using one's complement, x would be represented as 1111 1111 1111 1110 before casting, and on a system using sign and magnitude representation, x would be represented as 1000 0000 0000 0001. Both of these bit patterns represent a value of -1, which is not in the value range of unsigned short, so 65536 would be added to -1 in each case to bring the values into range. After the cast, both of these bit patterns would be 1111 1111 1111 1111.
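Since one's complement and sign and magnitude machines are hard to come by, the three 16-bit patterns can be simulated with unsigned arithmetic. This is only a sketch: the helper names are hypothetical, and m is assumed to be a magnitude between 1 and 32767.

```c
#include <assert.h>

/* 16-bit pattern that would hold -m under each representation. */
unsigned twos_complement_neg(unsigned m)
{
    return (0u - m) & 0xFFFFu;      /* invert and add one */
}

unsigned ones_complement_neg(unsigned m)
{
    return ~m & 0xFFFFu;            /* invert every bit */
}

unsigned sign_magnitude_neg(unsigned m)
{
    return (0x8000u | m) & 0xFFFFu; /* set the sign bit, keep the magnitude */
}

/* Small self-check for m = 1; call it from main(). */
void check_patterns(void)
{
    assert(twos_complement_neg(1) == 0xFFFFu); /* 1111 1111 1111 1111 */
    assert(ones_complement_neg(1) == 0xFFFEu); /* 1111 1111 1111 1110 */
    assert(sign_magnitude_neg(1) == 0x8001u);  /* 1000 0000 0000 0001 */
}
```

Only the two's complement pattern equals 0xFFFF, the pattern of 65535, which is the result of converting -1 to a 16-bit unsigned short.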

So, preservation of the bit pattern in casting from int to unsigned int is implementation-dependent.

It seems like the ability to cast an int to unsigned int while preserving the bit pattern would be a handy tool for doing bit-shifting operations on negative numbers, and I have seen it advocated as a technique for just that. But this technique does not appear to be guaranteed to work by the standard.
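The shifting technique I have in mind looks roughly like the sketch below. The function name is hypothetical, and the range check assumes unsigned int has no padding bits. Right-shifting a negative signed value gives an implementation-defined result (6.5.7/5), which is why the cast comes first.

```c
#include <assert.h>
#include <limits.h>

/* Shift a possibly negative int right with zero fill by converting to
   unsigned first. n must be smaller than the width of unsigned int. */
unsigned logical_shift_right(int v, unsigned n)
{
    assert(n < sizeof(unsigned) * CHAR_BIT);
    return (unsigned)v >> n;
}

/* Small self-check; call it from main(). */
void check_shift(void)
{
    /* (unsigned)-1 is UINT_MAX by the value-conversion rule. */
    assert(logical_shift_right(-1, 4) == UINT_MAX >> 4);
    assert(logical_shift_right(16, 2) == 4u);
}
```

The shift itself is well defined on every implementation. What varies is whether the unsigned operand has the same bits as the original negative int.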

The standard guarantees that the optional types intN_t be two's complement (ISO/IEC 9899:1999 7.18.1.1/1), so one solution would be to cast first to intN_t, and then to unsigned.

unsigned short y = (unsigned short)(int16_t) x;
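A rough way to check the idea, assuming the implementation provides int16_t and uint16_t and that unsigned short is 16 bits wide, is to compare the object representations with memcmp. The names to_bits and same_representation are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The double cast from above, wrapped in a function. */
unsigned short to_bits(short x)
{
    return (unsigned short)(int16_t)x;
}

/* Returns 1 if the two objects hold identical bytes. */
int same_representation(int16_t s, uint16_t u)
{
    return memcmp(&s, &u, sizeof s) == 0;
}

/* Small self-check; call it from main(). */
void check_double_cast(void)
{
    int16_t s = -1;
    uint16_t u = (uint16_t)s;

    assert(to_bits(-1) == 65535); /* assumes a 16-bit unsigned short */
    assert(same_representation(s, u));
}
```

Note that this compares the int16_t intermediate with the unsigned result. It says nothing about the representation of the original short.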


Am I reading the standard correctly here, or am I misunderstanding something about the details of the conversion from signed to unsigned types? Are two's complement implementations prevalent enough that the assumption of bit-pattern preservation under casting from int to unsigned is reasonable? If not, is first casting from int to intN_t the best alternative?

Answer

If you specifically want to preserve bit-patterns above anything else, this seems like an excellent use case for going through a union rather than cast operators:

union S2US { short from; unsigned short to; };

...
short value = ...
unsigned short bits = (union S2US){ .from = value }.to;
...

As explained in footnote 95 of C11 (under section 6.5.2.3), "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6". Reinterpretation does not involve manipulating the data in any way, so depending on the member types, the extracted value is not guaranteed to have any direct arithmetic relationship to the inserted one, but it is guaranteed to have the exact same memory representation.

Since the sizes of the signed and unsigned versions of an integer type are the same (6.2.5 p6), and all members of a union must begin their storage at the same location (6.7.2.1 p16), a union that only contains signed and unsigned integers of the same width must copy all of the bits faithfully from one to the other, in either direction.
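For completeness, here is a self-contained sketch of the union approach with a byte-wise check of the result. The helper names bits_of and representation_matches are hypothetical, and the compound literal requires C99 or later.

```c
#include <assert.h>
#include <string.h>

union S2US { short from; unsigned short to; };

/* Reinterpret the object representation of a short as an unsigned short. */
unsigned short bits_of(short value)
{
    return (union S2US){ .from = value }.to;
}

/* Returns 1 if the extracted bits match the bytes of the original value. */
int representation_matches(short value)
{
    unsigned short bits = bits_of(value);
    return memcmp(&value, &bits, sizeof value) == 0;
}

/* Small self-check; call it from main(). */
void check_union(void)
{
    assert(representation_matches(-1));
    assert(representation_matches(12345));
    assert(bits_of(0) == 0);
}
```

Unlike the cast, the check passes for -1 regardless of how the implementation represents negative numbers, because no value conversion takes place.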
