Benjamin Vedder - 1 month ago
C Question

Portable way to serialize float as 32-bit integer

I have been struggling with finding a portable way to serialize 32-bit float variables in C and C++ to be sent to and from microcontrollers. I want the format to be well-defined enough so that serialization/de-serialization can be done from other languages as well without too much effort. Related questions are:

Portability of binary serialization of double/float type in C++

Serialize double and float with C

c++ portable conversion of long to double

I know that in most cases a type-punning union or memcpy will work just fine because the float representation is the same, but I would prefer to have a bit more control and peace of mind. What I came up with so far is the following:

#include <math.h>
#include <stdbool.h>
#include <stdint.h>

void serialize_float32(uint8_t* buffer, float number, int32_t *index) {
    int e = 0;
    float sig = frexpf(number, &e);
    float sig_abs = fabsf(sig);
    uint32_t sig_i = 0;

    // Exponent range -126 to +127. This addition makes the bits look similar
    // to actual floats on most systems, so a direct memcpy can be done if you
    // are lazy or know what you are doing. Otherwise, this manual encoding and
    // decoding hopefully is portable and safe.
    // For zero, frexpf sets e to 0 and the offset is skipped, so zero is
    // encoded as all-zero bits instead of colliding with 0.5.
    if (sig_abs > 0.0f) {
        e += 126;
        sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
    }

    uint32_t res = ((uint32_t)(e & 0xFF) << 23) | (sig_i & 0x7FFFFF);
    if (sig < 0) {
        res |= UINT32_C(1) << 31; // sign bit (1 << 31 overflows a signed int)
    }

    // Big-endian byte order: most significant byte first.
    buffer[(*index)++] = (res >> 24) & 0xFF;
    buffer[(*index)++] = (res >> 16) & 0xFF;
    buffer[(*index)++] = (res >> 8) & 0xFF;
    buffer[(*index)++] = res & 0xFF;
}


and

float deserialize_float32(const uint8_t *buffer, int32_t *index) {
    uint32_t res = ((uint32_t) buffer[*index]) << 24 |
                   ((uint32_t) buffer[*index + 1]) << 16 |
                   ((uint32_t) buffer[*index + 2]) << 8 |
                   ((uint32_t) buffer[*index + 3]);
    *index += 4;

    int e = (res >> 23) & 0xFF;
    uint32_t sig_i = res & 0x7FFFFF;
    bool neg = (res >> 31) != 0;

    // All-zero exponent and significand bits mean zero. Test the raw exponent
    // before removing the offset; after it, 0.5 would also have e == 0.
    float sig = 0.0f;
    if (e != 0 || sig_i != 0) {
        sig = (float)sig_i / (8388608.0f * 2.0f) + 0.5f;
    }
    e -= 126;

    if (neg) {
        sig = -sig;
    }

    return ldexpf(sig, e);
}


The frexp and ldexp functions seem to be made for this purpose, but in case they aren't available I tried to implement them manually as well using functions that are common:

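float frexpf_slow(float f, int *e) {
    if (f == 0.0f) {
        *e = 0;
        return 0.0f;
    }

    *e = (int)ceilf(log2f(fabsf(f)));
    // Divide by 2^*e in two steps: for |f| above 2^127, *e can reach 128 and
    // powf(2.0f, 128.0f) alone would overflow to infinity.
    float res = f / powf(2.0f, (float)(*e / 2)) / powf(2.0f, (float)(*e - *e / 2));

    // Make sure that the magnitude is in [0.5, 1) so that no overflow occurs
    // during serialization. For exact powers of two (or when log2f rounds
    // down) res lands in [1, 2) and must be halved; subtracting 0.5 is only
    // right when res is exactly 1. If log2f rounds up, res drops below 0.5.
    if (fabsf(res) >= 1.0f) {
        res *= 0.5f;
        *e += 1;
    } else if (fabsf(res) < 0.5f) {
        res *= 2.0f;
        *e -= 1;
    }

    return res;
}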


and

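float ldexpf_slow(float f, int e) {
    // Multiply in two steps so that 2^e cannot overflow to infinity when e is
    // 128, which happens for values of 2^127 and above.
    return f * powf(2.0f, (float)(e / 2)) * powf(2.0f, (float)(e - e / 2));
}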
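As an alternative to frexpf_slow and ldexpf_slow that does not depend on how log2f and powf round, the fallbacks can be sketched with nothing but repeated multiplication by 2 and 0.5. The names frexpf_loop and ldexpf_loop are hypothetical, and the sketch assumes a binary floating-point format (so scaling by 2 is exact) and trades speed for exactness: up to roughly 150 iterations per call.

```c
#include <math.h>

/* Hypothetical fallbacks (not standard functions). Multiplying a finite
 * float by 2 or 0.5 is exact as long as the result neither overflows nor
 * becomes subnormal. */

/* Like frexpf: returns m with 0.5 <= |m| < 1 and stores n in *e so that
 * f == m * 2^n. Zero, infinities and NaN are returned unchanged, *e = 0. */
float frexpf_loop(float f, int *e) {
    int n = 0;
    if (f == 0.0f || !isfinite(f)) {
        *e = 0;
        return f;
    }
    while (fabsf(f) < 0.5f) {   /* small and subnormal values: scale up */
        f *= 2.0f;
        n--;
    }
    while (fabsf(f) >= 1.0f) {  /* large values: scale down */
        f *= 0.5f;
        n++;
    }
    *e = n;
    return f;
}

/* Like ldexpf: returns f * 2^e. A result that ends up subnormal may be
 * rounded more than once and can then differ from ldexpf in the last bit.
 * For |f| in [0.5, 1) and e >= -126, as in deserialize_float32, only the
 * final step can leave the normal range, so at most one rounding occurs. */
float ldexpf_loop(float f, int e) {
    while (e > 0) {
        f *= 2.0f;
        e--;
    }
    while (e < 0) {
        f *= 0.5f;
        e++;
    }
    return f;
}
```

Finding the exponent by scaling instead of by a logarithm means the result never needs the after-the-fact correction that frexpf_slow performs.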


One thing I have been considering is whether to use 8388608 (2^23) or 8388607 (2^23 - 1) as the multiplier. The documentation says that frexp returns values whose magnitude is in the range [0.5, 1), and after some experimentation it seems that 8388608 gives results that are bit-accurate with actual floats, and I could not find any corner case where this overflows. That might not be true with a different compiler or system, though. If this can become a problem, a smaller multiplier that reduces the accuracy a bit is fine with me as well. I know that this does not handle Inf or NaN, but for now that is not a requirement.
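The choice of 2^23 can be checked directly, assuming float is IEEE 754 single precision (24-bit significand) and that frexpf returns a significand s with 0.5 ≤ |s| < 1, as the C standard specifies for finite nonzero input. Every float in [0.5, 1) is an integer multiple of 2^-24, which gives:

```latex
\[
0.5 \le |s| < 1
\;\Longrightarrow\;
0 \le (|s| - 0.5) \cdot 2 \cdot 2^{23} < 0.5 \cdot 2 \cdot 2^{23} = 2^{23}
\]
\[
|s| = k \cdot 2^{-24},\; 2^{23} \le k < 2^{24}
\;\Longrightarrow\;
\mathit{sig\_i} = (|s| - 0.5) \cdot 2^{24} = k - 2^{23} \in \{0, \dots, 2^{23} - 1\}
\]
```

In float arithmetic, |s| - 0.5 is exact (Sterbenz lemma) and the multiplications are by powers of two, so serialize_float32 computes this integer exactly and it always fits in 23 bits. With 2^23 - 1 the product is generally not an integer, so the cast would truncate.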

So, finally, my question is: Does this look like a reasonable approach, or am I just making a complicated solution that still has portability issues?

Answer

You seem to have a bug in serialize_float32: the last 4 lines should read as follows (the question's code above already uses this form):

buffer[(*index)++] = (res >> 24) & 0xFF;
buffer[(*index)++] = (res >> 16) & 0xFF;
buffer[(*index)++] = (res >> 8) & 0xFF;
buffer[(*index)++] = res & 0xFF;

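Your method will not work for infinities and NaNs, but the offset of 126 is not the reason: it is the right offset for frexpf's [0.5, 1) significand, because IEEE 754 single precision stores a 1.xxx significand with an exponent bias of 127. The problem is that frexpf does not produce a usable significand and exponent for them (with IEEE arithmetic it returns them unchanged and the stored exponent is unspecified), so they are not mapped to their IEEE bit patterns, and for an infinity the conversion to uint32_t is undefined behavior. Many subnormal values are mis-encoded as well: for those below 2^-127, frexpf reports an exponent below -126, so e + 126 is negative and wraps when masked with 0xFF. Note that you can validate the method by extensive testing: there are only about 4 billion 32-bit patterns, so trying all of them should not take very long.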
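A sketch of such an exhaustive test follows. To stay self-contained it embeds a copy of serialize_float32 and deserialize_float32 as given in the question, with zero encoded as all-zero bits so that 0.0f and 0.5f get distinct encodings. The harness name count_roundtrip_failures is hypothetical, and it assumes float and uint32_t have the same size.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static void serialize_float32(uint8_t *buffer, float number, int32_t *index) {
    int e = 0;
    float sig = frexpf(number, &e);
    float sig_abs = fabsf(sig);
    uint32_t sig_i = 0;
    if (sig_abs > 0.0f) {        /* zero keeps e == 0 and sig_i == 0 */
        e += 126;
        sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
    }
    uint32_t res = ((uint32_t)(e & 0xFF) << 23) | (sig_i & 0x7FFFFF);
    if (sig < 0.0f) {
        res |= UINT32_C(1) << 31;
    }
    buffer[(*index)++] = (uint8_t)(res >> 24);
    buffer[(*index)++] = (uint8_t)(res >> 16);
    buffer[(*index)++] = (uint8_t)(res >> 8);
    buffer[(*index)++] = (uint8_t)res;
}

static float deserialize_float32(const uint8_t *buffer, int32_t *index) {
    uint32_t res = ((uint32_t)buffer[*index] << 24) |
                   ((uint32_t)buffer[*index + 1] << 16) |
                   ((uint32_t)buffer[*index + 2] << 8) |
                   (uint32_t)buffer[*index + 3];
    *index += 4;
    int e = (int)((res >> 23) & 0xFF);
    uint32_t sig_i = res & 0x7FFFFF;
    bool neg = (res >> 31) != 0;
    float sig = 0.0f;
    if (e != 0 || sig_i != 0) {  /* all-zero bits decode to zero */
        sig = (float)sig_i / 16777216.0f + 0.5f;
    }
    e -= 126;
    return ldexpf(neg ? -sig : sig, e);
}

/* Round-trips every finite float whose bit pattern lies in [first, last]
 * and counts values that do not come back equal. Infinities and NaNs are
 * skipped. The loop stops on equality, so last == UINT32_MAX is safe. */
unsigned long count_roundtrip_failures(uint32_t first, uint32_t last) {
    unsigned long failures = 0;
    uint32_t bits = first;
    for (;;) {
        float in;
        memcpy(&in, &bits, sizeof in);
        if (isfinite(in)) {
            uint8_t buf[4];
            int32_t write_index = 0, read_index = 0;
            serialize_float32(buf, in, &write_index);
            float out = deserialize_float32(buf, &read_index);
            if (out != in) {
                failures++;
            }
        }
        if (bits == last) {
            break;
        }
        bits++;
    }
    return failures;
}
```

Calling count_roundtrip_failures(0, UINT32_MAX) covers every pattern. Over the full range it will also flag subnormal values, which this encoding does not fully support, and -0.0f passes only because it compares equal to 0.0f even though it comes back as +0.0f.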

The actual representation of float values in memory may differ between architectures, but IEEE 754 (or more precisely IEC 60559) is by far the most common today. You can check whether your particular C implementation claims conformance by testing whether __STDC_IEC_559__ is defined; in C++, std::numeric_limits<float>::is_iec559 serves the same purpose. Note however that even if you can assume IEEE 754, you must handle potentially different endianness between the systems. You cannot assume the endianness of floats to be the same as that of integers on the same platform.
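Beyond __STDC_IEC_559__, a cheap runtime sanity check can confirm on a given platform both the format and that float bytes are stored in the same order as uint32_t bytes. The function name float_matches_ieee_single is hypothetical, and a few sample values cannot prove full IEEE 754 conformance; the check only catches an obviously different layout.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true if float is 32 bits wide and a few sample values have their
 * IEEE 754 single-precision bit patterns when their bytes are read back as
 * a uint32_t on this platform. */
bool float_matches_ieee_single(void) {
    static const float samples[] = {1.0f, -2.0f, 0.15625f};
    static const uint32_t expected[] = {0x3F800000u, 0xC0000000u, 0x3E200000u};
    if (sizeof(float) != sizeof(uint32_t)) {
        return false;
    }
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t bits;
        memcpy(&bits, &samples[i], sizeof bits);  /* no aliasing violation */
        if (bits != expected[i]) {
            return false;
        }
    }
    return true;
}
```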

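Note also that the simple cast would be incorrect: uint32_t res = *(uint32_t *)&number; violates the strict aliasing rule. You should either use a union (reading a member other than the one last written is allowed in C, but is undefined behavior in C++) or use memcpy(&res, &number, sizeof(res));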
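If the targets are assumed (or checked) to use IEEE 754 single precision stored in the same byte order as uint32_t, the memcpy approach combined with explicit big-endian byte handling gives a wire format that other languages can read as a big-endian IEEE 754 float, for example with struct.unpack('>f', ...) in Python. This is a sketch under that assumption; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Copies the float's bits into a uint32_t (avoiding the strict aliasing
 * problem) and writes them most significant byte first. */
void serialize_float32_memcpy(uint8_t *buffer, float number, int32_t *index) {
    uint32_t res;
    memcpy(&res, &number, sizeof res);
    buffer[(*index)++] = (uint8_t)(res >> 24);
    buffer[(*index)++] = (uint8_t)(res >> 16);
    buffer[(*index)++] = (uint8_t)(res >> 8);
    buffer[(*index)++] = (uint8_t)res;
}

/* Reassembles the big-endian bytes into a uint32_t, then copies the bits
 * back into a float. */
float deserialize_float32_memcpy(const uint8_t *buffer, int32_t *index) {
    uint32_t res = ((uint32_t)buffer[*index] << 24) |
                   ((uint32_t)buffer[*index + 1] << 16) |
                   ((uint32_t)buffer[*index + 2] << 8) |
                   (uint32_t)buffer[*index + 3];
    float number;
    *index += 4;
    memcpy(&number, &res, sizeof number);
    return number;
}
```

Because the byte order is fixed by the shifts rather than by the host, only the float format itself has to match; infinities, NaNs, subnormals and -0.0f are carried through unchanged.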