David Bowling - 29 days ago
C Question

How can an (int) be converted to (unsigned int) while preserving the original bit pattern?

Suppose that we define:

short x = -1;
unsigned short y = (unsigned short) x;


According to the C99 standard:


Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type. (ISO/IEC 9899:1999 6.3.1.3/2)


So, assuming two bytes for short and a two's complement representation, the bit patterns of these two integers are:

x = 1111 1111 1111 1111 (value of -1),
y = 1111 1111 1111 1111 (value of 65535).


Since -1 is not in the value range of unsigned short, and the maximum value that can be represented in an unsigned short is 65535, 65536 is added to -1 to get 65535, which is in range. Thus the bits remain unchanged by the cast from short to unsigned short, though the represented value changes.
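To make the arithmetic concrete, here is a minimal sketch that performs the 6.3.1.3/2 adjustment by hand. The helper name wrap_to_ushort is hypothetical, and the sketch assumes that long long is wider than unsigned short (checked by the #if below).

```c
#include <limits.h>

#if USHRT_MAX >= LLONG_MAX
#error "this sketch assumes long long is wider than unsigned short"
#endif

/* Hypothetical helper: mimics the short -> unsigned short conversion by
   adding or subtracting (USHRT_MAX + 1) until the value is in range. */
unsigned short wrap_to_ushort(short x)
{
    long long v = x;  /* work in a wider signed type */
    const long long modulus = (long long)USHRT_MAX + 1;

    while (v < 0)
        v += modulus;
    while (v > USHRT_MAX)  /* never taken for a short argument; kept to mirror the wording */
        v -= modulus;
    return (unsigned short)v;  /* already in range, so the value is unchanged */
}
```

For any short x, wrap_to_ushort(x) should compare equal to (unsigned short)x, because the cast is defined purely in terms of values and never in terms of the stored bits.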

But, the standard also says that representations may be two's complement, one's complement, or sign and magnitude. "Which of these applies is implementation-defined,...." (ISO/IEC 9899:1999 6.2.6.2/2)

On a system using one's complement, x would be represented as 1111 1111 1111 1110 before casting, and on a system using sign and magnitude representation, x would be represented as 1000 0000 0000 0001. Both of these bit patterns represent a value of -1, which is not in the value range of unsigned short, so 65536 would be added to -1 in each case to bring the values into range. After the cast, both of these bit patterns would be 1111 1111 1111 1111.
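Since one's complement and sign and magnitude machines are hard to come by, the patterns above can be simulated instead. The following is only a sketch: the three helper names are hypothetical, the 16-bit width is the one assumed in the question, and the inputs are assumed to lie between -32767 and 32767.

```c
/* Hypothetical helpers: each returns the 16-bit pattern that a machine using
   the named representation would store for v, where -32767 <= v <= 32767.
   The patterns are built with unsigned arithmetic, so the results do not
   depend on how the host represents negative numbers. */

unsigned twos_complement_bits(int v)
{
    /* int -> unsigned conversion is modular, so the low 16 bits are the
       two's complement pattern. */
    return (unsigned)v & 0xFFFFu;
}

unsigned ones_complement_bits(int v)
{
    /* Negative values: invert every bit of the magnitude. */
    return v >= 0 ? (unsigned)v : (~(unsigned)(-v)) & 0xFFFFu;
}

unsigned sign_magnitude_bits(int v)
{
    /* Negative values: set the sign bit and keep the magnitude. */
    return v >= 0 ? (unsigned)v : (0x8000u | (unsigned)(-v));
}
```

For v = -1 these return 0xFFFF, 0xFFFE and 0x8001 respectively. The converted value 65535 is stored as 0xFFFF in a 16-bit unsigned short, so only the two's complement pattern survives the cast unchanged.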

So, preservation of the bit pattern in casting from int to unsigned int is implementation dependent.

It seems like the ability to cast an int to unsigned int while preserving the bit pattern would be a handy tool for doing bit-shifting operations on negative numbers, and I have seen it advocated as a technique for just that. But this technique does not appear to be guaranteed to work by the standard.

Am I reading the standard correctly here, or am I misunderstanding something about the details of the conversion from signed to unsigned types? Are two's complement implementations prevalent enough that the assumption of bit-pattern preservation under casting from int to unsigned is reasonable? If not, is there a better way to preserve bit patterns under a conversion from int to unsigned int?

Edit



My original goal was to find a way to cast an int to unsigned int in such a way that the bit pattern is preserved. I was thinking that a cast from int to intN_t could help accomplish this:

unsigned short y = (unsigned short)(int16_t) x;


but of course this idea was wrong! At best this would only enforce two's complement representation before casting to unsigned, so that the final bit pattern would be two's complement. I am tempted to just delete the question, yet I am still interested in ways to cast from int to unsigned int that preserve bit patterns, and @Leushenko has provided a really neat solution to this problem using unions. But, I have changed the title of the question to reflect the original intention, and I have edited the closing questions.
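A small sketch of why the intermediate cast does not help: both conversions work on values, so for any x that fits in int16_t the two paths give the same result. The function names are hypothetical, and the sketch assumes the implementation provides int16_t (it is an optional type).

```c
#include <stdint.h>

/* Direct value conversion, as in the original snippet. */
unsigned short convert_direct(short x)
{
    return (unsigned short)x;
}

/* Conversion with the intermediate int16_t cast. For x in the range of
   int16_t the first cast preserves the value, so the second cast sees the
   same value as convert_direct does. */
unsigned short convert_via_int16(short x)
{
    return (unsigned short)(int16_t)x;
}
```

Neither function looks at the object representation of x, which is why the extra cast cannot preserve a one's complement or sign and magnitude bit pattern.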

Answer

If you specifically want to preserve bit-patterns above anything else, this seems like an excellent use case for going through a union rather than cast operators:

union S2US { short from; unsigned short to; };

...
short value = ...
unsigned short bits = (union S2US){ .from = value }.to;
...

As explained in footnote 95 of the C11 standard (under section 6.5.2.3), "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6". Reinterpretation does not manipulate the data in any way. Depending on the member types, the extracted value is therefore not guaranteed to have any direct arithmetic relationship to the inserted one, but it is guaranteed to have the exact same memory representation.

Since the signed and unsigned versions of an integer type occupy the same amount of storage (6.2.5 p6), and all members of a union must begin their storage at the same location (6.7.2.1 p16), a union that contains only the signed and unsigned versions of the same integer type must present all of the bits unchanged when a value is written through one member and read through the other, in either direction.
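For reference, here is a self-contained sketch of the union approach wrapped in a function, next to a memcpy version. The memcpy variant is an alternative that is not part of the answer above; it copies the object representation byte by byte and should give the same result. Both function names are hypothetical.

```c
#include <string.h>

union S2US { short from; unsigned short to; };

/* Reinterpret the bits of a short as an unsigned short through the union. */
unsigned short bits_via_union(short value)
{
    return (union S2US){ .from = value }.to;
}

/* Alternative (not from the answer): copy the object representation with
   memcpy. short and unsigned short have the same size (6.2.5 p6). */
unsigned short bits_via_memcpy(short value)
{
    unsigned short bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On a two's complement implementation both functions return the same result as a plain cast. On other representations they would return the stored pattern rather than the converted value.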
